Question

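I am a research chemist, and I have taken measurements in which I record 'signal intensity' vs 'mass-to-charge (m/z)'. I repeated this experiment 15 times, each time changing a specific parameter (collision energy). As a result I have 15 CSV files, and I would like to align/join them over the same m/z range with the same interval spacing. Because of instrument threshold rules, some m/z values were not recorded, so my files cannot simply be exported to Excel and copied/pasted side by side. The data looks somewhat like the tables posted below: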


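Dataset 1:  x  |  y          Dataset 2:   x  | y
           ---------                    ---------
            0.0   5                      0.0   2
            0.5   3                      0.5   6
            2.0   7                      1.0   9
            3.0   1                      2.5   1
                                         3.0   4

Using MATLAB, I started with this code:

%% Create a table for the set m/z range with an interval of 0.1 Da
mzrange = 50:0.1:620;
mzrange = round(mzrange', 1);   % column vector; rounding keeps the keys equal to m/z values read from file
mzrange = array2table(mzrange,'VariableNames',{'XThompson'});   % must match the key name in the data tables

I then manually import one X/Y CSV (X title = XThompson, Y title = YCounts) to align it with the specified m/z range:

%% Join/merge the two tables using a common Key variable 'XThompson' (m/z value)
mzspectrum = outerjoin(mzrange,ReserpineCE00,'MergeKeys',true);

% Replace all NaN values with zero
mzspectrum.YCounts(isnan(mzspectrum.YCounts)) = 0;

At this point I am stuck, because repeating this process with a separate file overwrites my YCounts column. The title of the YCounts column does not matter to me, since I can change it later, but I would like the table to keep growing, with one counts column per file.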
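For comparison (the PS also asks about Python), the single-file step above, building an evenly spaced 0.1 Da m/z grid, joining one spectrum onto it and replacing missing counts with zero, can be sketched in plain Python without extra packages. This is a minimal sketch, assuming each spectrum has already been read into a list of (m/z, counts) pairs; `make_grid` and `align_to_grid` are hypothetical helper names. Rounding both the grid and the recorded m/z values to one decimal is a deliberate choice, so that a computed value such as 50 + 3*0.1 compares equal to a 50.3 read from a file.

```python
def make_grid(lo, hi, step=0.1, ndigits=1):
    """Evenly spaced m/z values from lo to hi (inclusive), rounded to ndigits.

    Each value is computed as lo + k*step from an integer k and then rounded,
    so grid values compare equal to file m/z values rounded the same way.
    """
    n = int(round((hi - lo) / step))
    return [round(lo + k * step, ndigits) for k in range(n + 1)]


def align_to_grid(points, grid, ndigits=1):
    """Put (mz, counts) pairs onto the grid, using 0 where nothing was recorded.

    This plays the role of outerjoin(..., 'MergeKeys', true) followed by the
    NaN -> 0 replacement, except that recorded m/z values lying off the grid
    are dropped here instead of being kept as extra rows.
    """
    recorded = {round(mz, ndigits): counts for mz, counts in points}
    return [(mz, recorded.get(mz, 0)) for mz in grid]


# Dataset 1 from the question, on a 0.5-wide grid so the output stays short
dataset1 = [(0.0, 5), (0.5, 3), (2.0, 7), (3.0, 1)]
print(align_to_grid(dataset1, make_grid(0.0, 3.0, step=0.5)))
# [(0.0, 5), (0.5, 3), (1.0, 0), (1.5, 0), (2.0, 7), (2.5, 0), (3.0, 1)]
```

For the range in the question, `make_grid(50, 620)` yields 5701 points from 50.0 to 620.0.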

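How can I do this in MATLAB in a way that is at least semi-automated? I previously posted a question describing a similar situation, but it turned out not to do what I need. I have to admit that I am not a programmer by nature, so I have been struggling with this.

PS: Is this problem better handled in MATLAB or in Python?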

Answer 1

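I don't know or use MATLAB, so my answer is based purely on Python. I think Python and MATLAB should be equally suited to reading CSV files and generating a master table.

Please treat this answer as a pointer to how the problem could be solved in Python.

In Python, one would usually use the pandas package for this. The package provides "high-performance, easy-to-use data structures and data analysis tools" and can natively read a large number of file formats, including CSV files. A master table from two CSV files, "foo.csv" and "bar.csv", can be generated, for example, as follows: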

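import pandas as pd

# Read each CSV file into a DataFrame
df = pd.read_csv('foo.csv')
df2 = pd.read_csv('bar.csv')

# Append the rows of df2 below those of df (columns are matched by name)
master_table = pd.concat([df, df2])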
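Note that `pd.concat` stacks the rows of the two files, so an m/z value recorded in both files ends up on two separate rows. To get one row per m/z with one column per file, the tables have to be joined on the m/z column instead (in pandas, `merge(..., on='XThompson', how='outer')` followed by `fillna(0)`). As a dependency-free sketch of that join, assuming each CSV has one header row followed by m/z,counts rows, the hypothetical helpers `read_spectrum` and `master_table` below read the files with the standard `csv` module and build the wide table:

```python
import csv
import io


def read_spectrum(f):
    """Read one CSV (a header row, then m/z,counts rows) into {mz: counts}."""
    reader = csv.reader(f)
    next(reader)  # skip the header row, e.g. "XThompson,YCounts"
    return {float(row[0]): float(row[1]) for row in reader if row}


def master_table(spectra, fill=0.0):
    """Outer-join several {mz: counts} dicts on m/z.

    Returns one row per m/z value seen in any file:
    [mz, counts_file1, counts_file2, ...]. An m/z value missing from a file
    gets `fill` in that file's column.
    """
    all_mz = sorted(set().union(*spectra))
    return [[mz] + [s.get(mz, fill) for s in spectra] for mz in all_mz]


# In-memory stand-ins for foo.csv and bar.csv (the two datasets from the question).
# For real files: with open("foo.csv", newline="") as f: foo = read_spectrum(f)
foo = read_spectrum(io.StringIO("XThompson,YCounts\n0.0,5\n0.5,3\n2.0,7\n3.0,1\n"))
bar = read_spectrum(io.StringIO("XThompson,YCounts\n0.0,2\n0.5,6\n1.0,9\n2.5,1\n3.0,4\n"))
for row in master_table([foo, bar]):
    print(row)
```

With 15 files, the same call takes a list of 15 spectra, e.g. one `read_spectrum` per file name.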

pandas also allows data to be grouped and structured in many ways. The pandas documentation gives a good description of its various features.

pandas can be installed with the Python package installer, pip. On Linux or OS X:

sudo pip install pandas

Answer 2

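The counts from the different analyses should have different names before you join them, i.e. YCounts_1, YCounts_2, and YCounts_3 for analyses 1, 2, and 3 in their respective datasets. However, the m/z name (i.e. XThompson) should be the same in every dataset, because it is the key used to join them. The following code works in MATLAB.

This step is not needed (it only recreates the tables from the question); I copied dataset2 to create dataset3 for illustration. You can import your own data with readtable, e.g. imported_data = readtable('filename');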

  dataset1 = table([0.0; 0.5; 2.0; 3.0], [5; 3; 7; 1], 'VariableNames', {'XThompson', 'YCounts_1'});
  dataset2 = table([0.0; 0.5; 1.0; 2.5; 3.0], [2; 6; 9; 1; 4], 'VariableNames', {'XThompson', 'YCounts_2'});
  dataset3 = table([0.0; 0.5; 1.0; 2.5; 3.0], [2; 6; 9; 1; 4], 'VariableNames', {'XThompson', 'YCounts_3'});

Merge the tables with outerjoin. If you have many datasets, you can use a loop.
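The loop suggested here can be sketched in Python, the other language the question asks about. This is a hedged illustration rather than a translation of MATLAB's `outerjoin`: each table is assumed to be a plain dict `{mz: {column_name: value}}`, and `outer_join` and `join_all` are hypothetical names. The key idea carries over directly: every dataset gets its own counts column (`YCounts_1`, `YCounts_2`, ...), while the m/z key is shared, so later files cannot overwrite earlier counts.

```python
def outer_join(left, right):
    """Outer-join two tables keyed by m/z (XThompson).

    Each table is a dict {mz: {column_name: value}}. Every m/z value from either
    table appears in the result; a column missing for some m/z is simply absent,
    which is the counterpart of the NaN that outerjoin inserts.
    """
    return {mz: {**left.get(mz, {}), **right.get(mz, {})}
            for mz in left.keys() | right.keys()}


def join_all(datasets):
    """Join any number of [(mz, counts), ...] datasets in a loop.

    Dataset k gets its own column name YCounts_k; only the m/z key is shared.
    """
    combined = {}
    for k, points in enumerate(datasets, start=1):
        table = {mz: {f"YCounts_{k}": y} for mz, y in points}
        combined = outer_join(combined, table)
    return combined


dataset1 = [(0.0, 5), (0.5, 3), (2.0, 7), (3.0, 1)]
dataset2 = [(0.0, 2), (0.5, 6), (1.0, 9), (2.5, 1), (3.0, 4)]
combined = join_all([dataset1, dataset2, dataset2])  # dataset3 is a copy of dataset2
for mz in sorted(combined):
    print(mz, combined[mz])
```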

  combined_dataset = outerjoin(dataset1,dataset2, 'MergeKeys', true);

Add dataset3 to combined_dataset:

  combined_dataset = outerjoin(combined_dataset,dataset3, 'MergeKeys', true);

You can use writetable to export the combined data as an Excel sheet:

  writetable(combined_dataset, 'joined_icp_ms_data.xlsx');
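If you are working in Python without pandas, writing an `.xlsx` file needs an extra package, but a CSV file opens directly in Excel. A minimal sketch, assuming the combined data is a dict `{mz: {column_name: value}}`; `write_wide_csv` is a hypothetical helper, and missing values are written as 0 rather than left empty:

```python
import csv
import io


def write_wide_csv(combined, columns, f, fill=0):
    """Write {mz: {column: value}} as CSV with XThompson first, then `columns`.

    Rows are sorted by m/z; a value missing for some m/z is written as `fill`
    (0 here, matching the NaN -> 0 replacement used in the question).
    """
    writer = csv.writer(f)
    writer.writerow(["XThompson", *columns])
    for mz in sorted(combined):
        values = combined[mz]
        writer.writerow([mz, *(values.get(c, fill) for c in columns)])


combined = {0.0: {"YCounts_1": 5, "YCounts_2": 2}, 1.0: {"YCounts_2": 9}}
buf = io.StringIO()  # for a real file: open("joined_icp_ms_data.csv", "w", newline="")
write_wide_csv(combined, ["YCounts_1", "YCounts_2"], buf)
print(buf.getvalue())
```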

Answer 3

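I managed to put together a solution to my problem by learning from everyone's input and taking an online MATLAB course. I am not a natural coder, so my script is not as elegant as those of the geniuses here, but hopefully it is good enough for other non-programming scientists to use.

Here is what worked for me:

% Read a directory of *.csv files and correct the x-axis to evenly spaced intervals (0.1 units).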

% Create a matrix with the input x range then convert it to a table
prompt = 'Input recorded min/max data range separated by space \n(ex. 1 to 100 = 1 100): ';
inputrange = input(prompt,'s');
min_max = str2num(inputrange)
datarange = (min_max(1):0.1:min_max(2))';
datarange = array2table(datarange,'VariableNames',{'XAxis'});

files = dir('*.csv');
for q=1:length(files);

    % Extract each XY pair from the csvread cell and convert it to an array, then back to a table.
    % csvread offsets are zero-based: (2,1) starts reading at row 3, column 2.
    data{q} = csvread(files(q).name,2,1);
    data1 = data(q);
    data2 = cell2mat(data1);
    data3 = array2table(data2,'VariableNames',{'XAxis','YAxis'});

    % Join the datarange table and the intensity table to obtain an evenly spaced m/z range
    data3 = outerjoin(datarange,data3,'MergeKeys',true);
    data3.YAxis(isnan(data3.YAxis)) = 0;
    data3.XAxis = round(data3.XAxis,1);

    % Remove duplicate values
    data4 = sortrows(data3,[1 -2]);
    [~, idx] = unique(data4.XAxis);
    data4 = data4(idx,:);

    % Save the file as the same name in CSV without underscores or dashes
    filename = files(q).name;
    filename = strrep(filename,'_','');
    filename = strrep(filename,'-','');
    filename = strrep(filename,'.csv','');
    writetable(data4,filename,'FileType','text');
    clear data data1 data2 data3 data4 filename

end
clear
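For readers who prefer Python, the same per-file workflow (read every *.csv in a folder, round m/z to 0.1, keep the highest intensity where several values land on the same rounded m/z, fill the rest of the grid with zeros, and save under the cleaned-up file name) can be sketched with the standard library. The helper names are hypothetical, and `read_xy` assumes the layout that `csvread(files(q).name,2,1)` implies (two header rows, m/z and counts in the second and third columns); check this against your own files.

```python
import csv
import glob
import os


def read_xy(path, skip_rows=2, skip_cols=1):
    """Read (m/z, counts) pairs from one CSV file.

    The defaults mirror csvread(name, 2, 1): skip the first 2 rows and the first
    column (zero-based offsets). Adjust them to your instrument's export format.
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))[skip_rows:]
    return [(float(r[skip_cols]), float(r[skip_cols + 1])) for r in rows if r]


def bin_max(points, grid, ndigits=1):
    """Round m/z to ndigits and keep the largest counts per rounded value.

    Grid points without a recorded signal get 0. Starting every bin at 0 and
    taking the maximum mimics sortrows(..., [1 -2]) + unique on the zero-filled,
    rounded table: the highest Y wins for each X. Off-grid m/z values are dropped.
    """
    best = {}
    for mz, counts in points:
        key = round(mz, ndigits)
        best[key] = max(best.get(key, 0), counts)
    return [(mz, best.get(mz, 0)) for mz in grid]


def clean_name(path):
    """Mirror the strrep calls: remove '_' and '-' and the '.csv' extension."""
    name = os.path.basename(path)
    return name.replace("_", "").replace("-", "").replace(".csv", "")


def process_folder(folder, grid):
    """Align every *.csv file in folder to grid; write XAxis,YAxis under the cleaned name."""
    written = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        out_path = os.path.join(folder, clean_name(path))
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["XAxis", "YAxis"])
            writer.writerows(bin_max(read_xy(path), grid))
        written.append(out_path)
    return written
```

The grid has to be rounded the same way as the data, e.g. `grid = [round(50 + k * 0.1, 1) for k in range(5701)]` for 50 to 620 Da, followed by `process_folder(".", grid)`. As in the MATLAB script, the output names have no `.csv` extension, so a second run does not pick them up again.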

How to import multiple CSV files and then make a master table?
