How do I extract data from multiple CSV files and put it into one common table? [MATLAB]

Asked: 2015-11-02 22:59:44

Tags: matlab csv

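I have five different .csv files that I need to extract data from. After extracting the data from each file, I want to put it all into one table.

My .csv directory: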

2015.csv
2014.csv
2013.csv
2012.csv
2011.csv

My attempt at this is:

csvfiles = dir('.../*.csv');
dataArray = {}
table = table(dataArray{1:end-1}, 'VariableNames', {'WEEK','2015','2014', '2013', '2012', '2011'});

for file = csvfiles
      delimiter = ',';
      startRow = 2;
      formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';

      fileID = fopen(file,'r'); % 
      dataArray = {dataArray, textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false)};
      fclose(fileID);
      extracting_data = file.column1 + file.column3
end

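However, not only does fopen receive an invalid argument (file), but I am also unsure how to extract the data and store it in the table. I can make the fopen call valid by passing file(1).name, but then textscan() throws the error: Invalid file identifier. Use fopen to generate a valid file identifier.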
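Two things appear to be behind these errors. dir returns a column struct array and for iterates over columns, so `for file = csvfiles` runs once with file bound to the whole array. The name field also holds only the bare file name, so fopen returns -1 unless the data folder happens to be the current folder, and textscan then rejects that identifier. A minimal sketch of both effects, assuming a hypothetical temporary folder dataDir created only for the demonstration:

```matlab
% Sketch: reproduce both problems on a throwaway folder.
% dataDir and the file contents are made up for this demo.
dataDir = tempname;
mkdir(dataDir);
demoNames = {'2014.csv', '2015.csv'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(dataDir, demoNames{k}), 'w');
    fprintf(fid, 'a,b\n1,2\n');
    fclose(fid);
end

listing = dir(fullfile(dataDir, '*.csv'));   % N-by-1 struct array

% for iterates over columns, so an N-by-1 struct array yields ONE pass
% in which f is the entire array, not a single file.
count = 0;
for f = listing
    count = count + 1;
end

% listing(1).name has no folder part, so this normally fails with -1.
badID = fopen(listing(1).name, 'r');
if badID >= 0
    fclose(badID);                           % only if a same-named file was found elsewhere
end

% Joining folder and name gives fopen a path it can resolve.
fullPath = fullfile(dataDir, listing(1).name);
goodID = fopen(fullPath, 'r');
if goodID < 0
    error('failed to open file %s', fullPath);
end
fclose(goodID);

rmdir(dataDir, 's');                         % remove the demo folder
```

Looping with an index (for k = 1:numel(listing)) and checking the value returned by fopen before calling textscan avoids both problems.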

Basically, my goals are:

1. Open each file in the directory.
2. Extract all necessary data from the known columns
3. Put all 52 values inside that file into their own column (one column per file).
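The three steps above can be sketched on made-up data. This is only an illustration under stated assumptions: the folder dataDir, the three-column layout (week, a, b), the three-row files standing in for 52 weeks, and the choice to add columns 2 and 3 are all hypothetical.

```matlab
% Sketch of the three steps using two small made-up files.
dataDir = tempname;                          % hypothetical demo folder
mkdir(dataDir);
demoNames = {'2014.csv', '2015.csv'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(dataDir, demoNames{k}), 'w');
    fprintf(fid, 'week,a,b\n');              % header row
    % fprintf walks the matrix column by column: rows 1,10,k / 2,20,k / 3,30,k
    fprintf(fid, '%d,%d,%d\n', [1 2 3; 10 20 30; k k k]);
    fclose(fid);
end

% Step 1: open each file in the directory.
files = dir(fullfile(dataDir, '*.csv'));
nWeeks = 3;                                  % 52 in the real data
values = zeros(nWeeks, numel(files));

for k = 1:numel(files)
    fid = fopen(fullfile(dataDir, files(k).name), 'r');
    cols = textscan(fid, '%f%f%f', 'Delimiter', ',', 'HeaderLines', 1);
    fclose(fid);

    % Step 2: extract the needed columns (here: a + b).
    values(:, k) = cols{2} + cols{3};
end

% Step 3: one column per file, named after the file it came from.
% The 'Y' prefix keeps the names valid MATLAB identifiers.
colNames = strcat('Y', strrep({files.name}, '.csv', ''));
result = array2table(values, 'VariableNames', colNames);

rmdir(dataDir, 's');                         % remove the demo folder
```

Deriving the column names from files(k).name keeps each column tied to its file regardless of the order in which dir lists them.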

Edit 1: Here is an update to my code. I changed the data structure of dataArray to a matrix.

csvfiles = dir('/.../Data/*.csv');
data_matrix = zeros(52, 5);    % Create empty matrix and format the matrix like the table.
iter = 0;

for file = 1:numel(csvfiles)
    iter = iter + 1;
    delimiter = ',';
    startRow = 2;
    formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';

    fileID = fopen(csvfiles(file).name,'r');
    data = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
    fclose(fileID);

    extracted_data = file.column1 + file.column2;    % I make sure to use the column headers.

    % Add all 52 values from the extracted data to a single column.
    data_matrix[:,iter] = influenza_a;
end


%% Create output table.
Influenza = table(data_matrix{1:end-1}, 'VariableNames',{'WEEK','2015','2014','2013','2012','2011'});

1 answer:

Answer 0: (score: 1)

I have explained the changes I made to the code in comments %1) through %5); with these changes everything should match.

%1) The path is needed again later, so it is put in a separate variable. Also,
%use only two dots ('..') in a path, not three.
directory='/Users/user/folder/folder/MATLAB/folder/Data';
csvfiles = dir(fullfile(directory,'*.csv'));

%2) numel(csvfiles) to avoid unnecessary constants.
data_matrix = zeros(52, numel(csvfiles));    % Create empty matrix and format the matrix like the table.
iter = 0;

for file = 1:numel(csvfiles)
    iter = iter + 1;
    delimiter = ',';
    startRow = 2;
    formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';
    %3) Use absolute path here, otherwise file is not found
    filename= fullfile(directory,csvfiles(file).name);
    fileID = fopen(filename,'r');
    %4) Inserted error check
    if fileID<0
        error('failed to open file %s',filename)
    end
    data = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
    fclose(fileID);

    % textscan returns a cell array, so column k is data{k}, not file.columnk.
    extracted_data = data{1} + data{2};

    % Add all 52 values from the extracted data to a single column.
    %5) Indexing was wrong, should be right this way:
    data_matrix(:,file) = extracted_data;
end


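%% Create output table.
% data_matrix is numeric, so brace indexing does not work; use array2table.
% Column names come from the file names, so they follow the order returned
% by dir. WEEK is assumed to run from 1 to 52.
years = strcat('Y', strrep({csvfiles.name}, '.csv', ''));
Influenza = array2table([(1:52)', data_matrix], 'VariableNames', [{'WEEK'}, years]);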
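If the goal is to address columns by their header names, as the question's file.column1 suggests, readtable may be a simpler alternative to the fopen/textscan/fclose sequence. This swaps in a different function from the one used in the answer. A minimal sketch, assuming a hypothetical demo file whose headers are literally column1 and column2:

```matlab
% Sketch: read one CSV with readtable and use the header names directly.
csvPath = [tempname '.csv'];                 % hypothetical demo file
fid = fopen(csvPath, 'w');
fprintf(fid, 'column1,column2\n1,2\n3,4\n');
fclose(fid);

T = readtable(csvPath);                      % header row becomes the variable names
extracted = T.column1 + T.column2;           % element-wise sum of the two columns

delete(csvPath);                             % remove the demo file
```

Inside the loop, extracted would take the place of the textscan result, with the real header names substituted for column1 and column2.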