I have a pandas DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3',
                          'astr6', 'jmtr2', 'astr2', 'mhhk',
                          'astr5', 'mmsk', 'astr6', 'astr1',
                          'mstr1', 'mhhk', 'mstr2', 'mhhk'],
                   'a2': np.random.randn(16)})
df
a1 a2
0 astr1 -0.490416
1 jmtr2 0.651627
2 astr2 0.784004
3 mmsk3 -1.595870
4 astr6 1.228631
5 jmtr2 -1.644518
6 astr2 -0.311709
7 mhhk -1.284221
8 astr5 -0.356339
9 mmsk -0.071046
10 astr6 1.620838
11 astr1 -0.717384
12 mstr1 0.830618
13 mhhk -0.020226
14 mstr2 -0.056465
15 mhhk -0.160234
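What I want to do now is merge the rows whose a1 values share the same first four letters. At the same time, the a2 values of the merged rows should be summed.
Like this: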
a1 a2
0 astr $sum of astr$
1 jmtr $sum of jmtr$
2 mmsk $sum of mmsk$
3 mhhk $sum of mhhk$
4 mstr $sum of mstr$
Answer 0 (score: 4)
I think you need the first 4 characters of a1, which indexing with str gives you:
print (df.a1.str[:4])
0 astr
1 jmtr
2 astr
3 mmsk
4 astr
5 jmtr
6 astr
7 mhhk
8 astr
9 mmsk
10 astr
11 astr
12 mstr
13 mhhk
14 mstr
15 mhhk
Name: a1, dtype: object
print (df.a2.groupby(df.a1.str[:4]).sum().reset_index())
a1 a2
0 astr 1.112200
1 jmtr -1.559358
2 mhhk 1.113222
3 mmsk -0.023918
4 mstr -2.526466
Here groupby is combined with indexing with str, and the groups are aggregated with sum.
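As a reproducible sketch of the same approach, the example below swaps the random a2 values for a small set of fixed numbers (hypothetical values, chosen only so the sums can be checked by hand) and uses fewer rows than the question. It also passes sort=False to groupby, which is an addition not in the answer above: it keeps the groups in order of first appearance, which seems closer to the row order shown in the question's desired output.

```python
import pandas as pd

# Hypothetical fixed values instead of np.random.randn, so the sums are reproducible.
df = pd.DataFrame({
    'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3', 'mhhk', 'mmsk', 'mstr1', 'mhhk'],
    'a2': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# First four characters of each a1 value, used as the grouping key.
key = df['a1'].str[:4]

# sort=False keeps groups in order of first appearance instead of sorting the keys.
result = df.groupby(key, sort=False)['a2'].sum().reset_index()
print(result)
```

Without sort=False, groupby sorts the group keys, which is why mhhk comes before mmsk in the answer's output.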