I have a dataset like this:
time(secs) setup
40 setup1
30 setup1
20 setup1
10 setup2
20 setup2
10 setup1
30 setup1
30 setup2
40 setup2
10 setup3
20 setup3
I want to sum the rows for each run of consecutive, identical setup values in a pandas DataFrame, like this:
time(secs) setup
90 setup1
30 setup2
40 setup1
70 setup2
30 setup3
But when I use the groupby() function:
df.groupby(['setup']).sum()
the result I get is:
setup time
setup1 130
setup2 100
setup3 30
Please help me solve this problem...
Thanks!!!
Answer 0: (score: 1)
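Group by a helper Series and aggregate with sum and first. The helper is built by comparing the setup column with its shifted copy using Series.ne (!=) and then applying cumsum, so every run of consecutive equal values gets its own group number:
df1 = (df.groupby(df['setup'].ne(df['setup'].shift()).cumsum(), as_index=False)
         .agg({'time(secs)':'sum', 'setup':'first'}))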
print (df1)
time(secs) setup
0 90 setup1
1 30 setup2
2 40 setup1
3 70 setup2
4 30 setup3
print (df['setup'].ne(df['setup'].shift()).cumsum())
0 1
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4
9 5
10 5
Name: setup, dtype: int32
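To make the helper Series less opaque, here is a minimal sketch that splits the one-liner into its three steps. The variable names previous, is_new_run, run_id and steps are illustrative only and do not appear in the answer; the setup values are copied from the question.

```python
import pandas as pd

# The setup column from the question, rebuilt by hand.
setup = pd.Series(
    ["setup1", "setup1", "setup1", "setup2", "setup2",
     "setup1", "setup1", "setup2", "setup2", "setup3", "setup3"],
    name="setup",
)

# Step 1: value of the row above (NaN for the first row).
previous = setup.shift()

# Step 2: True wherever a row differs from the row above,
# i.e. at the start of every new run (the first row is always True).
is_new_run = setup.ne(previous)

# Step 3: a running count of the True values numbers the runs 1, 2, 3, ...
run_id = is_new_run.cumsum()

# Side-by-side view of the intermediate results.
steps = pd.DataFrame({
    "setup": setup,
    "previous": previous,
    "is_new_run": is_new_run,
    "run_id": run_id,
})
print(steps)
```

Because the counter only increases at a change of value, the second block of setup1 rows gets a different number from the first, which is what keeps the two blocks apart in the later groupby.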
Details:
df['groups'] = df['setup'].ne(df['setup'].shift()).cumsum()
print (df)
time(secs) setup groups
0 40 setup1 1
1 30 setup1 1
2 20 setup1 1
3 10 setup2 2
4 20 setup2 2
5 10 setup1 3
6 30 setup1 3
7 30 setup2 4
8 40 setup2 4
9 10 setup3 5
10 20 setup3 5
df1 = (df.groupby('groups')
.agg({'time(secs)':'sum', 'setup':'first'})
.reset_index(drop=True))
A similar solution using the new column:
df1 = (df.groupby(['groups', 'setup'])['time(secs)'].sum()
.reset_index(level=0, drop=True)
.reset_index())
print (df1)
time(secs) setup
0 90 setup1
1 30 setup2
2 40 setup1
3 70 setup2
4 30 setup3
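For anyone who wants to run the whole thing from scratch, the following is a self-contained sketch that rebuilds the question's data and wraps the approach in a small function. The function name sum_consecutive_runs and its parameters are hypothetical, chosen here for illustration. The helper Series is renamed to run_id (also an assumption) so that it cannot clash with the existing setup column.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "time(secs)": [40, 30, 20, 10, 20, 10, 30, 30, 40, 10, 20],
    "setup": ["setup1", "setup1", "setup1", "setup2", "setup2",
              "setup1", "setup1", "setup2", "setup2", "setup3", "setup3"],
})


def sum_consecutive_runs(frame, key, value):
    """Sum `value` over each unbroken run of equal `key` values."""
    # Number the runs: the counter increases whenever `key` changes.
    run_id = frame[key].ne(frame[key].shift()).cumsum().rename("run_id")
    return (
        frame.groupby(run_id)
        .agg({value: "sum", key: "first"})
        .reset_index(drop=True)  # drop the run numbers, keep a plain 0..n-1 index
    )


result = sum_consecutive_runs(df, key="setup", value="time(secs)")
print(result)

# For comparison: a plain groupby merges all rows with the same label,
# regardless of where they occur.
merged = df.groupby("setup")["time(secs)"].sum()
print(merged)
```

The plain groupby at the end reproduces the totals from the question (130, 100, 30), while the run-based version keeps the two setup1 blocks and the two setup2 blocks separate.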