function [Xclean, yclean, pclean, ynan, p] = CleanUpData_lag(X, y,...
dataset, X_people_row, lag)
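% Clean a dataset and lag its features by `lag` days (rows), per person.
% Each row i of a person's data is paired with the features of the
% previous `lag` rows (most recent first) and kept only if none of those
% rows, nor y(i), contains NaN. Missing data must be denoted by NaN and
% all inputs must be numeric. The whole dataset passed in is cleaned.
% ynan holds y with unusable rows set to NaN, stacked person by person in
% the order of unique(X_people_row); p returns X_people_row unchanged.
% The `dataset` argument is not used in the body of this function.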
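%
% Example (an illustrative sketch, not from the original author; the
% input values below are made up and y is assumed to be coded 0/1):
%   X = (1:25)';                 % one feature, 25 days, no NaNs
%   y = mod((1:25)', 2);         % alternating 1/0 behavior
%   people = ones(25,1);         % a single person with ID 1
%   [Xc, yc, pc] = CleanUpData_lag(X, y, [], people, 2);
%   % Each row of Xc holds [X(i-1,:) X(i-2,:)] and yc holds y(i),
%   % for i = 3:25, so size(Xc) should be [23 2] and Xc(1,:) is [2 1].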
% p is returned unchanged; the remaining outputs start empty and are
% grown one person at a time
p = X_people_row;
Xclean = [];
yclean = [];
pclean = [];
ynan = [];
unique_people = unique(X_people_row);
for per = 1:length(unique_people)
    person = unique_people(per);
    % Initialize the current person's cleaned dataset
    current_Xclean = [];
    current_yclean = [];
    current_pclean = [];
    current_X = X(X_people_row == person,:);
    current_y = y(X_people_row == person,:);
    % Copy of this person's y that stays NaN on rows failing the checks
    current_ynan = NaN(size(current_y));
    N = size(current_X,1);
    k = 0;
    j = 0;
    % Iterate through the dataset starting at one more than the lag
    for i = (lag+1):N
        % cond(lag_i) is true when the row lag_i days back has no NaN
        cond = false(lag,1);
        for lag_i = 1:lag
            cond(lag_i) = ~any(isnan(current_X(i-lag_i,:)));
        end
        % If there are no NaNs in the lagged days, and there is not
        % a NaN in the current behavior, move forward
        if all(cond) && ~isnan(current_y(i,1))
            j = j + 1;
            k = k + 1;
            % Concatenate the lagged rows, most recent day first
            current_X_placeholder = [];
            for lag_j = 1:lag
                current_X_placeholder = [current_X_placeholder ...
                    current_X(i-lag_j,:)];
            end
            current_Xclean(j,:) = current_X_placeholder;
            current_pclean(j,1) = person;
            current_yclean(j,1) = current_y(i,1);
            current_ynan(i,1) = current_y(i,1);
        end
    end
    ynan = [ynan; current_ynan];
    % Calculate the class imbalance on the cleaned dataset as the number
    % of days with the behavior over the total number of usable days
    % (this reads as a proportion only if y is coded 0/1)
    imbal = sum(current_yclean) / length(current_yclean);
    % Get the share of the majority class
    max_imbal = max([imbal, (1 - imbal)]);
    % If the majority class exceeds 90% of usable days, skip this person
    if max_imbal > 0.9
        continue
    end
    % If there are fewer than 20 data points, also skip this person
    if k < 20
        continue
    end
    Xclean = [Xclean; current_Xclean];
    yclean = [yclean; current_yclean];
    pclean = [pclean; current_pclean];
end
end