Question

I have a DataFrame that looks like this:

        Geo          Age     2010   2011   2012
0      toronto    -1 ~ 7       2      1     5
1      toronto     0 ~ 4       5      3     4
2      toronto     5 ~ 9       4      5     5
3      bc         -1 ~ 7       1      3     2
4      bc          0 ~ 4       2      3     1
5      bc          5 ~ 9       3      1     1
6      mt         -1 ~ 7       4      3     4
7      mt          0 ~ 4       2      2     1
8      mt          5 ~ 9       6      6     6

I want to delete the -1 ~ 7 row for each city, but add its values to that city's 0 ~ 4 row before deleting it.

Desired output:

        Geo          Age     2010   2011   2012
1      toronto     0 ~ 4       7      4     9
2      toronto     5 ~ 9       4      5     5
4      bc          0 ~ 4       3      6     3
5      bc          5 ~ 9       3      1     1
7      mt          0 ~ 4       6      5     5
8      mt          5 ~ 9       6      6     6

I don't care about the index; I will reset it afterwards.

Thanks!

Answer 1

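Assuming your df is ordered so that each -1 ~ 7 row sits directly above its 0 ~ 4 row, you can use a combination of np.where and shift, then filter: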

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Geo'] = ['toronto','toronto','toronto']
df['Age'] = ['-1 ~ 7','0 ~ 4','5 ~ 9']
df['2010'] = [2,5,4]


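# shift(1) pairs each row with the one above it; the first row gets NaN,
# so the column is converted to float
df['2010'] = np.where(df['Age']=='0 ~ 4',df['2010']+df['2010'].shift(1),df['2010'])
# drop the '-1 ~ 7' rows now that their values have been added
df = df[~(df['Age']=='-1 ~ 7')]
print(df)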

    Geo     Age     2010
1   toronto 0 ~ 4   7.0
2   toronto 5 ~ 9   4.0
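
The snippet above covers one city and one year column. Here is a minimal sketch of how the same shift idea might extend to the full table from the question. It assumes each -1 ~ 7 row sits directly above the 0 ~ 4 row of the same city. The names year_cols, prev, is_target and result are illustrative. Grouping by Geo and passing fill_value=0 are additions to the original answer: the first keeps the shift from crossing city boundaries, the second keeps the columns as integers.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Geo': ['toronto'] * 3 + ['bc'] * 3 + ['mt'] * 3,
    'Age': ['-1 ~ 7', '0 ~ 4', '5 ~ 9'] * 3,
    '2010': [2, 5, 4, 1, 2, 3, 4, 2, 6],
    '2011': [1, 3, 5, 3, 3, 1, 3, 2, 6],
    '2012': [5, 4, 5, 2, 1, 1, 4, 1, 6],
})

year_cols = ['2010', '2011', '2012']

# Value of the previous row within the same city; 0 where there is none
prev = df.groupby('Geo')[year_cols].shift(1, fill_value=0)

# Add the previous row's values only to the '0 ~ 4' rows
is_target = df['Age'] == '0 ~ 4'
df.loc[is_target, year_cols] = (
    df.loc[is_target, year_cols] + prev.loc[is_target, year_cols]
)

# Drop the '-1 ~ 7' rows and renumber the index
result = df[df['Age'] != '-1 ~ 7'].reset_index(drop=True)
print(result)
```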

Answer 2

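Create a helper key that separates the 5 ~ 9 rows from the rest, then group by Geo and that key:

# True for the '5 ~ 9' rows, False for the '-1 ~ 7' and '0 ~ 4' rows
s=df.Age=='5 ~ 9'
# within each (Geo, s) group, keep the last Age label and sum the year columns
yourdf=df.groupby([df.Geo,s]).agg({'Age':'last','2010':'sum','2011':'sum','2012':'sum'})
yourdf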
                 Age  2010  2011  2012
Geo     Age                           
bc      False  0 ~ 4     3     6     3
        True   5 ~ 9     3     1     1
mt      False  0 ~ 4     6     5     5
        True   5 ~ 9     6     6     6
toronto False  0 ~ 4     7     4     9
        True   5 ~ 9     4     5     5
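
A related variation, sketched here as an alternative and not taken from the answer itself: relabel each -1 ~ 7 row as 0 ~ 4 and let a plain groupby sum merge the two rows. It does not depend on row order, and it returns a flat frame instead of a MultiIndex. The names merged and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Geo': ['toronto'] * 3 + ['bc'] * 3 + ['mt'] * 3,
    'Age': ['-1 ~ 7', '0 ~ 4', '5 ~ 9'] * 3,
    '2010': [2, 5, 4, 1, 2, 3, 4, 2, 6],
    '2011': [1, 3, 5, 3, 3, 1, 3, 2, 6],
    '2012': [5, 4, 5, 2, 1, 1, 4, 1, 6],
})

# Relabel the rows to be folded in so they share a group with '0 ~ 4'
merged = df.assign(Age=df['Age'].replace({'-1 ~ 7': '0 ~ 4'}))

# sort=False keeps the cities in their original order;
# as_index=False keeps Geo and Age as ordinary columns
result = merged.groupby(['Geo', 'Age'], sort=False, as_index=False).sum()
print(result)
```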

Answer 3

If you want to delete rows based on a value, just filter the DataFrame:

new_df = df[df.Age != '-1 ~ 7']
new_df
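
For reference, a small sketch of what this filter does on the question's sample data. The filter only removes the -1 ~ 7 rows. Their values would need to be added to the 0 ~ 4 rows beforehand to reach the desired output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Geo': ['toronto'] * 3 + ['bc'] * 3 + ['mt'] * 3,
    'Age': ['-1 ~ 7', '0 ~ 4', '5 ~ 9'] * 3,
    '2010': [2, 5, 4, 1, 2, 3, 4, 2, 6],
    '2011': [1, 3, 5, 3, 3, 1, 3, 2, 6],
    '2012': [5, 4, 5, 2, 1, 1, 4, 1, 6],
})

# Boolean filter: keep every row whose Age is not '-1 ~ 7'
new_df = df[df.Age != '-1 ~ 7']

# The remaining rows keep their original values and index labels
print(new_df)
```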

Answer 4

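Let's try this. The loop assumes a default integer index, with each -1 ~ 7 row directly above its 0 ~ 4 row:

cols = ['2010', '2011', '2012']
age_ind = df.index[df['Age'] == '0 ~ 4'].tolist()

for i in age_ind:
    for col in cols:
        # add the value from the '-1 ~ 7' row just above (index i - 1)
        df.at[i, col] = df.at[i, col] + df.at[i - 1, col]

# drop returns a new frame, so assign the result back
df = df.drop(df[df.Age == '-1 ~ 7'].index)
df

Output:

    Geo       Age       2010  2011 2012
1   toronto     0 ~ 4     7   4   9
2   toronto     5 ~ 9     4   5   5
4   bc          0 ~ 4     3   6   3
5   bc          5 ~ 9     3   1   1
7   mt          0 ~ 4     6   5   5
8   mt          5 ~ 9     6   6   6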
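
The same pairing can also be written without an explicit loop. The sketch below is an illustration, not the answer's original code. It assumes every city has exactly one -1 ~ 7 row and one 0 ~ 4 row, and that both sets of rows list the cities in the same order, so the two blocks can be added position by position. The names source, target and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Geo': ['toronto'] * 3 + ['bc'] * 3 + ['mt'] * 3,
    'Age': ['-1 ~ 7', '0 ~ 4', '5 ~ 9'] * 3,
    '2010': [2, 5, 4, 1, 2, 3, 4, 2, 6],
    '2011': [1, 3, 5, 3, 3, 1, 3, 2, 6],
    '2012': [5, 4, 5, 2, 1, 1, 4, 1, 6],
})

cols = ['2010', '2011', '2012']
source = df['Age'] == '-1 ~ 7'   # rows to fold in and then drop
target = df['Age'] == '0 ~ 4'    # rows that receive the values

# to_numpy() drops the index, so the addition is purely positional
df.loc[target, cols] = (
    df.loc[target, cols].to_numpy() + df.loc[source, cols].to_numpy()
)

# Remove the source rows and renumber the index
result = df[~source].reset_index(drop=True)
print(result)
```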

Add the values from one row to the next row and delete the first row in a pandas DataFrame
