How to create a new DataFrame column that counts rows sharing the same values in other columns

Time: 2020-06-03 20:46:21

Tags: pandas dataframe

I have the following DataFrame:

      import pandas as pd
      import numpy as np    

      df_Station = pd.DataFrame({'ID': [1024, 1024, 1024, 1024, 1024, 1024,1000, 2000],
                                 'Code_Instrumentation': ['BA182', 'MED1', 'MED1', 'MED1',
                                                          '500-01', '500-01', '500-04', '500-01']})

I want to count how many times each combination of 'ID' and 'Code_Instrumentation' occurs, and add that count to the DataFrame as a new column (New_Column).

I tried the following code, but it does not give the right result:

     # Create an array of all Code_Instrumentation values
     list_Code_Instrumentation = np.array(df_Station['Code_Instrumentation'])


      df_Station['New_Column'] = 0
      cont = 0

      for i in range(0, len(list_Code_Instrumentation)):    
          aux = list_Code_Instrumentation[i]    

          if(df_Station['Code_Instrumentation'].iloc[i] == aux):


               for j in range(0,len(df_Station)-1):

                  if(df_Station['Code_Instrumentation'].iloc[j] ==
                     df_Station['Code_Instrumentation'].iloc[j+1]):

                       cont += 1

                  df_Station['New_Column'].loc[j] = cont
                  cont = 0

This code produces the following (incorrect) output:

      ID    Code_Instrumentation    New_Column
     1024          BA182             0
     1024          MED1              1
     1024          MED1              1
     1024          MED1              0
     1024          500-01            1
     1024          500-01            0
     1000          500-04            0
     2000          500-01            0

The desired output would be:

      ID    Code_Instrumentation    New_Column
     1024          BA182                1 #ID=1024 with Code_Instrumentation=BA182 appeared once
     1024          MED1                 3 #ID=1024 with Code_Instrumentation=MED1 appeared three times
     1024          MED1                 3
     1024          MED1                 3
     1024          500-01               2 #ID=1024 with Code_Instrumentation=500-01 appeared twice
     1024          500-01               2
     1000          500-04               1
     2000          500-01               1

2 answers:

Answer 0 (score: 1)

Combine groupby with transform to get the size of each group:

df_Station['New_Column'] = df_Station.groupby(["ID","Code_Instrumentation"]).Code_Instrumentation.transform("size")

     ID Code_Instrumentation  New_Column
0  1024                BA182           1
1  1024                 MED1           3
2  1024                 MED1           3
3  1024                 MED1           3
4  1024               500-01           2
5  1024               500-01           2
6  1000               500-04           1
7  2000               500-01           1
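
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the data from the question. The intermediate name group_sizes is only illustrative, and selecting ['ID'] before transform is an arbitrary choice, since "size" counts rows regardless of which column is selected.

```python
import pandas as pd

# Data copied from the question.
df_Station = pd.DataFrame({
    'ID': [1024, 1024, 1024, 1024, 1024, 1024, 1000, 2000],
    'Code_Instrumentation': ['BA182', 'MED1', 'MED1', 'MED1',
                             '500-01', '500-01', '500-04', '500-01'],
})

# Group by both columns, then broadcast each group's row count
# back onto every row of that group (transform keeps the original index).
group_sizes = (
    df_Station
    .groupby(['ID', 'Code_Instrumentation'])['ID']
    .transform('size')
)
df_Station['New_Column'] = group_sizes

print(df_Station)
```

Because transform returns a Series aligned with the original index, the assignment needs no merge or loop.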

Answer 1 (score: 0)

Groupby and transform:

df_Station['New_Column'] = df_Station.groupby(['ID','Code_Instrumentation'])['ID'].transform('count')


     ID Code_Instrumentation  New_Column
0  1024                BA182           1
1  1024                 MED1           3
2  1024                 MED1           3
3  1024                 MED1           3
4  1024               500-01           2
5  1024               500-01           2
6  1000               500-04           1
7  2000               500-01           1
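
The two answers differ only in the aggregation name, and on the question's data they give the same result because the counted column has no missing values. As a rough illustration of where they diverge, the sketch below uses made-up data (the columns key and val are hypothetical, not from the question): "size" counts every row in a group, while "count" skips rows where the selected column is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' has one missing value in 'val'.
df = pd.DataFrame({
    'key': ['a', 'a', 'b'],
    'val': [1.0, np.nan, 3.0],
})

# 'size' counts all rows per group, including rows where 'val' is NaN.
df['n_size'] = df.groupby('key')['val'].transform('size')

# 'count' counts only the non-NaN entries of 'val' per group.
df['n_count'] = df.groupby('key')['val'].transform('count')

print(df)
```

If the goal is the number of rows per group, "size" is probably the safer choice; "count" fits when missing values in the selected column should be excluded.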