Question

I have a .csv file containing a table, which I import as follows:

mydata = readtable('datafile1.csv');

The table has 2549 rows and 28 columns. Here is part of the table as an example, with all of the columns but only some of the rows:

ID          subject A                   B   C                   D       E       F       G       H   I   J    K       L  M   N                   O    P      Q   R       S   T   U   V                   W   X   Y
'sbj05100'  'sbj05' 6.22316646575928    85  -2.31806182861328   339 14  100022  'tf'    48401   100 2   2   'no'    'h' 339 322.507000000000    339 'sbj05' 100 100021  286 1   419 1.95000000000000    2   1   662
'sbj05102'  'sbj05' 7.60787820816040    65  3.00547647476196    405 17  102012  'tf'    59201   102 1   2   'yes'   'h' 405 385.367000000000    405 'sbj05' 102 102011  283 1   283 1.89000000000000    1   1   364
'sbj05104'  'sbj05' -3.71897959709167   81  3.80262303352356    429 19  104012  'tf'    66401   104 1   2   'yes'   'h' 429 408.228000000000    429 'sbj05' 104 104011  266 1   266 2.19000000000000    2   1   244
'sbj09152'  'sbj09' 0.181026369333267   88  -0.0696721449494362 87  4   152042  'tf'    12401   152 4   2   'no'    'l' 87  82.8280000000000    87  'sbj09' 152 152041  297 1   297 1.25000000000000    1   1   354
'sbj09157'  'sbj09' 0.309507131576538   116 0.226024463772774   51  2   157042  'tf'    5201    157 4   2   'no'    'l' 51  48.4870000000000    51  'sbj09' 157 157041  273 1   273 1.45000000000000    1   1   279
'sbj10151'  'sbj10' 6.99367523193359    90  4.86872243881226    345 20  151022  'tf'    70001   151 2   2   'no'    'h' 345 328.224000000000    345 'sbj10' 151 151021  198 1   198 3                   1   1   310
'sbj10167'  'sbj10' 2.25431561470032    152 -0.200379326939583  129 7   167012  'tf'    23201   167 1   2   'yes'   'h' 129 122.675000000000    129 'sbj10' 167 167011  110 1   110 2.32000000000000    2   1   276
'sbj10168'  'sbj10' 3.22731518745422    147 4.72183227539062    93  3   168042  'tf'    8801    168 4   2   'no'    'l' 93  88.3230000000000    93  'sbj10' 168 168041  179 1   179 2.38000000000000    2   1   132

I need to compute the mean of columns B and C for each subject (column subject) and each condition (column I). What I would like to obtain is:

for sbj05 column B --> cond 1 = (65+81)/2
                       cond2 = 85
          column C --> cond 1 = (3.005476475+3.802623034)/2
                       cond2 = -2.3180618
and so on...

I tried following this link: "in matlab, calculate mean in a part of one column where another column satisfies a condition".

[R, I, J] = unique(mydata(:,2));
% count the repeating entries: now we have integer indices!
counts = accumarray(J, 1, size(R));
% sum the 2nd column for all entries
sums   = accumarray(J, mydata(:,4), size(R)); %for column B
% compute means
means  = sums./counts;

But I get this error:

Undefined function 'accumarray' for input arguments of type 'table'.

Any suggestions?

Answer 1

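Conveniently, MATLAB has a function for computing statistics on tables, grouped by one or more variables. So instead of accumarray, you may want to use grpstats (part of the Statistics and Machine Learning Toolbox):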

meanPerSubjectAndCondition = grpstats(mydata,{'subject','I'},'mean','DataVars',{'B','C'})

This computes the mean of the table variables B and C for each combination of subject and I.
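As for the error in the question: indexing a table with parentheses, as in mydata(:,2), returns another table, and accumarray does not accept tables. Dot notation (mydata.subject, mydata.B) returns the underlying arrays instead. If grpstats is not available, the sketch below shows two alternatives that need no toolbox, as far as I know. It is only an illustration: the table sample and the names means1, means2, condition, meanB and meanC are made up here, and the data are the first five rows from the question, with the condition values taken as the question describes them.

```matlab
% Small sample table built from the first five rows shown in the question.
subject = {'sbj05'; 'sbj05'; 'sbj05'; 'sbj09'; 'sbj09'};
I = [2; 1; 1; 4; 4];                       % condition, as described in the question
B = [85; 65; 81; 88; 116];
C = [-2.31806182861328; 3.00547647476196; 3.80262303352356; ...
     -0.0696721449494362; 0.226024463772774];
sample = table(subject, I, B, C);

% Option 1: varfun with grouping variables.
% The result contains subject, I, GroupCount, mean_B and mean_C.
means1 = varfun(@mean, sample, ...
    'InputVariables', {'B', 'C'}, ...
    'GroupingVariables', {'subject', 'I'});

% Option 2: keep the accumarray idea, but pass plain arrays obtained with
% dot notation. findgroups assigns one group number per (subject, I) pair.
[G, subj, condition] = findgroups(sample.subject, sample.I);
meanB = accumarray(G, sample.B, [], @mean);
meanC = accumarray(G, sample.C, [], @mean);
means2 = table(subj, condition, meanB, meanC);
```

For sbj05 this gives a mean of B of 73 for condition 1, i.e. (65+81)/2, and 85 for condition 2, which matches the result asked for. On the real data, replace sample with mydata.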
