Question

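I want to use np.cumsum() on a CSV file to get a running total of a data column for each of 57 different IDs, which are given in a separate column. My file looks like this: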

station_id     year           Value
210018         1910            1
210018         1911            6
210018         1912            3
210019         1910            2
210019         1911            4
210019         1912            7

I want my output to look like this:

station_id     year           Value
210018         1910            1
210018         1911            7
210018         1912            10
210019         1910            2
210019         1911            6
210019         1912            13

I am currently using this code, where df is my initial DataFrame:

df.groupby(['station_id']).apply(lambda x: np.cumsum(['Value']))

which returns:

TypeError: cannot perform accumulate with flexible type

Any help would be appreciated.

Answer 1

np.cumsum(['Value']) always raises

TypeError: cannot perform accumulate with flexible type

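(np.cumsum expects a numeric array as its first argument, not a list of strings. Note also that the lambda never uses its argument x, so the group's data is never passed to np.cumsum at all.) Instead, use: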

values = df.groupby(['station_id'])['Value'].cumsum()
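If you would rather keep np.cumsum as in the question, the lambda has to use the group it receives. A minimal sketch, assuming the sample data from the question is rebuilt by hand (in practice df would come from your CSV); the name running is illustrative only:

```python
import numpy as np
import pandas as pd

# Sample data from the question, built by hand for illustration.
df = pd.DataFrame({
    "station_id": [210018, 210018, 210018, 210019, 210019, 210019],
    "year": [1910, 1911, 1912, 1910, 1911, 1912],
    "Value": [1, 6, 3, 2, 4, 7],
})

# s is the 'Value' column of one station's group, so np.cumsum now
# receives numbers instead of the string list ['Value'].
# transform() puts the per-group results back in the original row order.
running = df.groupby("station_id")["Value"].transform(
    lambda s: np.cumsum(s.to_numpy())
)
print(running.tolist())  # [1, 7, 10, 2, 6, 13]
```

The built-in groupby cumsum() shown above does the same thing without a Python-level lambda, so it is usually the simpler choice.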

或者，您可以直接修改df['Value']：

In [75]: df['Value'] = df.groupby(['station_id'])['Value'].cumsum()

In [76]: df
Out[76]: 
   station_id  year  Value
0      210018  1910      1
1      210018  1911      7
2      210018  1912     10
3      210019  1910      2
4      210019  1911      6
5      210019  1912     13
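Since the data starts out as a file, here is a sketch of the whole pipeline. It assumes the file is whitespace-separated as displayed in the question (drop sep=r"\s+" if it is really comma-separated), and it sorts by year first, because cumsum accumulates in row order. The in-memory raw string stands in for the file, and the column name CumValue is hypothetical; writing to a new column keeps the original values.

```python
import io

import pandas as pd

# Stand-in for the CSV file; with a real file you would pass its path
# (e.g. a hypothetical "stations.csv") instead of io.StringIO(raw).
raw = """station_id     year           Value
210018         1910            1
210018         1911            6
210018         1912            3
210019         1910            2
210019         1911            4
210019         1912            7
"""

# sep=r"\s+" treats any run of spaces as one delimiter.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# cumsum follows row order, so sort each station's rows by year first.
df = df.sort_values(["station_id", "year"])

# Store the running total per station in a new (hypothetical) column.
df["CumValue"] = df.groupby("station_id")["Value"].cumsum()
print(df)
```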

Using np.cumsum on a grouped CSV file
