I have a problem: I have data (about 300k rows):
Date Column 1 Column 2 Column 3 Value
1/1/2017 A Jonas Station1 8
1/1/2017 A Greg Station1 5
1/1/2017 A Anton Station1 1
1/1/2017 B Jonas Station1 4
1/1/2017 B Greg Station1 4
1/1/2017 B Mick Station2 8
1/1/2017 B Anton Station3 7
1/1/2017 C Jonas Station4 2
1/1/2017 C Greg Station1 7
I need to compute the following for each Date, Column 2 and Column 3 combination. Here is an example for 'Greg':
Date Column 1 Column 2 Column 3 Value
1/1/2017 A Greg Station1 5
1/1/2017 B Greg Station1 4
1/1/2017 C Greg Station1 7
Sum all C values and divide by Count(A, B); for this example, 7/2 = 3.5. Then add the value 3.5 to each A and B row:
Date Column 1 Column 2 Column 3 Value
1/1/2017 A Greg Station1 5+3.5 = 8.5
1/1/2017 B Greg Station1 4+3.5 = 7.5
How can I do this in pandas / Python? Update: I have about 100 items like 'Greg', about 500 days, and about 1k for Column1, Column2.
Answer 0 (score: 1)
Use:
import pandas as pd

# keep only the A and B rows
df1 = df[df['Column 1'].isin(['A','B'])]
# count A/B rows per group, then reindex by the original index to restore the original size
b = (df1.groupby(['Date','Column 2', 'Column 3'])['Column 1']
        .transform('size')
        .reindex(df.index))
# select the C rows and rename Value to a for the merge
df2 = (df.loc[df['Column 1'] == 'C', ['Date','Column 2','Column 3','Value']]
         .rename(columns={'Value':'a'}))
# left merge onto the original and select only the new column a
# (this relies on df having the default 0..n-1 index)
a = pd.merge(df, df2, how='left')['a']
# divide and add to Value; where the division returns NaN, keep the original value
df['Value'] = a.div(b).add(df['Value'], fill_value=0)
print (df)
Date Column 1 Column 2 Column 3 Value
0 1/1/2017 A Jonas Station1 8.0
1 1/1/2017 A Greg Station1 8.5
2 1/1/2017 A Anton Station1 1.0
3 1/1/2017 B Jonas Station1 4.0
4 1/1/2017 B Greg Station1 7.5
5 1/1/2017 B Mick Station2 8.0
6 1/1/2017 B Anton Station3 7.0
7 1/1/2017 C Jonas Station4 2.0
8 1/1/2017 C Greg Station1 7.0
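If a group can contain more than one C row (the question says to sum all C values), the merge above would duplicate rows for that group. A minimal sketch of one way around this, assuming the sample data from the question and using `groupby(...).transform('sum')` instead of a merge; the names `keys`, `is_c`, `is_ab`, `c_sum`, `ab_count`, `share` and `result` are illustrative only:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Date': ['1/1/2017'] * 9,
    'Column 1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'Column 2': ['Jonas', 'Greg', 'Anton', 'Jonas', 'Greg',
                 'Mick', 'Anton', 'Jonas', 'Greg'],
    'Column 3': ['Station1', 'Station1', 'Station1', 'Station1', 'Station1',
                 'Station2', 'Station3', 'Station4', 'Station1'],
    'Value': [8, 5, 1, 4, 4, 8, 7, 2, 7],
})

# Grouping keys and row masks (illustrative names, not from the answer)
keys = [df['Date'], df['Column 2'], df['Column 3']]
is_c = df['Column 1'].eq('C')
is_ab = df['Column 1'].isin(['A', 'B'])

# Per-group sum of the C values, broadcast back to every row of the group
c_sum = df['Value'].where(is_c, 0).groupby(keys).transform('sum')
# Per-group count of A/B rows, broadcast the same way
ab_count = is_ab.astype(int).groupby(keys).transform('sum')

# Each A/B row gets an equal share of the group's C total; other rows get 0
share = (c_sum / ab_count).where(is_ab, 0)
result = df.assign(Value=df['Value'] + share)
print(result)
```

On the sample data this should give the same Value column as the merge-based code, because every group there has at most one C row.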
An alternative is a custom function, which should be slower:
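def f(x):
    # cast Value to float (astype returns a copy) so a fractional share can be added
    x = x.astype({'Value': float})
    # Value of the C row in this group
    a = x.loc[x['Column 1'] == 'C', 'Value']
    # mask of all A, B rows in this group
    m = x['Column 1'].isin(['A','B'])
    c = a / m.sum()
    # if-else, because a group with no C row gives an empty Series
    c = 0 if c.empty else c.item()
    x.loc[m, 'Value'] += c
    return x

# group_keys=False keeps the original row labels instead of prepending the group keys
df = df.groupby(['Date','Column 2', 'Column 3'], group_keys=False).apply(f)
print (df)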
Date Column 1 Column 2 Column 3 Value
0 1/1/2017 A Jonas Station1 8.0
1 1/1/2017 A Greg Station1 8.5
2 1/1/2017 A Anton Station1 1.0
3 1/1/2017 B Jonas Station1 4.0
4 1/1/2017 B Greg Station1 7.5
5 1/1/2017 B Mick Station2 8.0
6 1/1/2017 B Anton Station3 7.0
7 1/1/2017 C Jonas Station4 2.0
8 1/1/2017 C Greg Station1 7.0
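To check the speed claim on your own machine, a rough timing harness along these lines may help. It is only a sketch under stated assumptions: `make_data` builds hypothetical synthetic data (integer stand-ins for dates, names and stations), `merge_based` is a variant of the merge approach that first sums the C values per group, and `per_group_loop` is a plain Python loop standing in for the custom-function approach, not the exact `apply(f)` call above.

```python
import time

import numpy as np
import pandas as pd

KEYS = ['Date', 'Column 2', 'Column 3']


def make_data(n_rows=5000, seed=0):
    """Hypothetical synthetic data shaped like the question's table."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'Date': rng.integers(0, 20, n_rows),       # stand-in for dates
        'Column 1': rng.choice(['A', 'B', 'C'], n_rows),
        'Column 2': rng.integers(0, 10, n_rows),   # stand-in for names
        'Column 3': rng.integers(0, 5, n_rows),    # stand-in for stations
        'Value': rng.integers(1, 10, n_rows).astype(float),
    })


def merge_based(df):
    """Merge approach, with C values summed per group before the merge."""
    ab = df[df['Column 1'].isin(['A', 'B'])]
    b = ab.groupby(KEYS)['Column 1'].transform('size').reindex(df.index)
    c_tot = (df[df['Column 1'] == 'C']
             .groupby(KEYS, as_index=False)['Value'].sum()
             .rename(columns={'Value': 'a'}))
    a = df.merge(c_tot, how='left', on=KEYS)['a']
    return a.div(b).add(df['Value'], fill_value=0)


def per_group_loop(df):
    """Python-level loop over groups (stand-in for a custom function)."""
    out = df['Value'].copy()
    for _, g in df.groupby(KEYS):
        is_ab = g['Column 1'].isin(['A', 'B']).to_numpy()
        n_ab = is_ab.sum()
        if n_ab:
            c_total = g.loc[g['Column 1'].eq('C'), 'Value'].sum()
            out.loc[g.index[is_ab]] += c_total / n_ab
    return out


df = make_data()

t0 = time.perf_counter()
fast = merge_based(df)
t1 = time.perf_counter()
slow = per_group_loop(df)
t2 = time.perf_counter()

print(f'merge-based:    {t1 - t0:.4f} s')
print(f'per-group loop: {t2 - t1:.4f} s')
print('same result:', np.allclose(fast.to_numpy(), slow.to_numpy()))
```

The absolute timings depend on the number of groups and on the pandas version, so treat them as indicative only.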