I have the following DataFrame:
import pandas as pd
import numpy as np
df_Station = pd.DataFrame({'ID': [1024, 1024, 1024, 1024, 1024, 1024, 1000, 2000],
                           'Code_Instrumentation': ['BA182', 'MED1', 'MED1', 'MED1',
                                                    '500-01', '500-01', '500-04', '500-01']})
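I want to count how many times each combination of 'ID' and 'Code_Instrumentation' occurs, and add that count to the DataFrame as a new column (New_Column).
I tried the following code, but it is not correct: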
# Create an array of all Code_Instrumentation values
list_Code_Instrumentation = np.array(df_Station['Code_Instrumentation'])
df_Station['New_Column'] = 0
cont = 0
for i in range(0, len(list_Code_Instrumentation)):
    aux = list_Code_Instrumentation[i]
    if(df_Station['Code_Instrumentation'].iloc[i] == aux):
        for j in range(0,len(df_Station)-1):
            if(df_Station['Code_Instrumentation'].iloc[j] ==
               df_Station['Code_Instrumentation'].iloc[j+1]):
                cont += 1
            df_Station['New_Column'].loc[j] = cont
            cont = 0
This code produces the following (incorrect) output:
ID Code_Instrumentation New_Column
1024 BA182 0
1024 MED1 1
1024 MED1 1
1024 MED1 0
1024 500-01 1
1024 500-01 0
1000 500-04 0
2000 500-01 0
The desired output is:
ID Code_Instrumentation New_Column
1024 BA182 1 #ID=1024 with Code_Instrumentation=BA182 appeared once
1024 MED1 3 #ID=1024 with Code_Instrumentation=MED1 appeared three times
1024 MED1 3
1024 MED1 3
1024 500-01 2 #ID=1024 with Code_Instrumentation=500-01 appeared twice
1024 500-01 2
1000 500-04 1
2000 500-01 1
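The attempt above only compares each row with the next one (and ignores 'ID'), so it never counts repeats that are not adjacent, and it resets the counter on every step. If you want to stay close to a loop-based approach, a minimal sketch, assuming plain Python is acceptable, is to count every (ID, Code_Instrumentation) pair over the whole frame first and then look each row's pair up. This swaps in collections.Counter; it is not the asker's algorithm:

```python
from collections import Counter

import pandas as pd

df_Station = pd.DataFrame({
    'ID': [1024, 1024, 1024, 1024, 1024, 1024, 1000, 2000],
    'Code_Instrumentation': ['BA182', 'MED1', 'MED1', 'MED1',
                             '500-01', '500-01', '500-04', '500-01'],
})

# Pair each row's ID with its code so both columns form the key
pairs = list(zip(df_Station['ID'], df_Station['Code_Instrumentation']))

# First pass: count every (ID, code) pair across the whole frame, not just neighbours
counts = Counter(pairs)

# Second pass: write each row's pair count into the new column
df_Station['New_Column'] = [counts[p] for p in pairs]
print(df_Station)
```

The vectorised groupby answers are usually preferable on larger frames; this version just makes the counting logic explicit.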
Answer 0 (score: 1)
Combine groupby with transform to get the size of each group:
df_Station['New_Column'] = df_Station.groupby(["ID","Code_Instrumentation"]).Code_Instrumentation.transform("size")
ID Code_Instrumentation New_Column
0 1024 BA182 1
1 1024 MED1 3
2 1024 MED1 3
3 1024 MED1 3
4 1024 500-01 2
5 1024 500-01 2
6 1000 500-04 1
7 2000 500-01 1
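The key point is the difference between aggregating and transforming. A minimal sketch on the same data: `size()` on its own returns one row per (ID, Code_Instrumentation) group, while `transform('size')` broadcasts each group's size back onto every row of that group, so the result lines up with the original index and can be assigned directly as a column:

```python
import pandas as pd

df_Station = pd.DataFrame({
    'ID': [1024, 1024, 1024, 1024, 1024, 1024, 1000, 2000],
    'Code_Instrumentation': ['BA182', 'MED1', 'MED1', 'MED1',
                             '500-01', '500-01', '500-04', '500-01'],
})

grouped = df_Station.groupby(['ID', 'Code_Instrumentation'])

# Aggregation: one value per (ID, Code_Instrumentation) group, MultiIndex result
per_group = grouped.size()

# Transformation: each group's size repeated on every row of that group,
# aligned with df_Station's original index
df_Station['New_Column'] = grouped['Code_Instrumentation'].transform('size')

print(per_group)
print(df_Station)
```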
Answer 1 (score: 0)
Use groupby and transform:
df_Station['New_Column'] = df_Station.groupby(['ID','Code_Instrumentation'])['ID'].transform('count')
ID Code_Instrumentation New_Column
0 1024 BA182 1
1 1024 MED1 3
2 1024 MED1 3
3 1024 MED1 3
4 1024 500-01 2
5 1024 500-01 2
6 1000 500-04 1
7 2000 500-01 1
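This answer uses 'count' where Answer 0 uses 'size'. On this data they agree because the counted column, ID, has no missing values. They differ when the counted column contains NaN: 'size' counts rows, while 'count' counts only non-null values. A minimal sketch with a hypothetical 'Reading' column (not part of the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one (ID, Code_Instrumentation) group with a missing Reading
df = pd.DataFrame({
    'ID': [1024, 1024, 1024],
    'Code_Instrumentation': ['MED1', 'MED1', 'MED1'],
    'Reading': [1.5, np.nan, 2.0],
})

grouped = df.groupby(['ID', 'Code_Instrumentation'])

# 'size' counts every row in the group, NaN or not
rows = grouped['Reading'].transform('size')

# 'count' counts only the non-null values of the selected column
non_null = grouped['Reading'].transform('count')

print(rows.tolist())
print(non_null.tolist())
```

So if the goal is "how many rows share this ID and code", 'size' states that intent most directly.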