How to unify (collapse) multiple columns into a single field with unique values assigned

Time: 2019-07-27 15:59:43

Tags: python pandas

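I want to distinguish each device (there are four types) attached to a specific elevator (identified by its height) in a specific building.

  1. Compute their test results, i.e. the number of failures (NG) out of the total number of tests on any given date. [Done]

  2. Since the devices have no unique ID, I want to identify them and assign a unique ID to each one. [??]

The original data frame looks like this: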

BldgID  BldgHt    Device   Date      Result

  1074    34.0    790C     2018/11/20   OK
  1072    31.0    780      2018/11/19   NG
  1072    36.0    780      2018/11/19   OK
  1074     7.0    790C     2018/11/19   OK
  1074    10.0    780      2018/11/19   OK
  1076    17.0    780      2018/11/20   NG
  1079    12.0    780      2018/11/20   NG
  1070    27.0    780      2018/11/18   OK
  1073    16.0    780      2018/11/19   OK
  1074    31.0    790C     2018/11/20   OK
# Find the number of NG results per group
df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].apply(lambda x : (x=='NG').sum()).reset_index(name='NG')

# Find the total number of tests per group (ALL = OK + NG)
df2 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].count().reset_index(name='ALL')

# Put the 'NG' and 'ALL' columns side by side
mel_df2 = df1.merge(df2, on=['BldgID','BldgHt','Device','Date'])
print(mel_df2.head())

    BldgID  BldgHt    Device   Date        NG  ALL
0  1074    34.0       790C     2018/11/20   0    2
1  1072    31.0       780      2018/11/19   1    3
2  1072    36.0       780      2018/11/19   0    3
3  1074     7.0       790C     2018/11/19   0    1
4  1074    10.0       780      2018/11/19   0    1

Then filter out the rows where NG == 0, so that only the groups with at least one failure remain.

mel_df2 = mel_df2[mel_df2.NG != 0]
print(mel_df2.head(6))

    BldgID   BldgHt  Device   Date        NG  ALL
1   1072    31.0     780      2018/11/19   1    3
5   1076    17.0     780      2018/11/20   2    3
24  1068    16.0     780      2018/11/18   1    4
35  1077    39.0     780      2018/11/20   2    4
67  1074    36.0     780      2018/11/19   2    8
68  1074    39.0     780      2018/11/19   1    6

Now I want to assign a new unique ID to each device by combining the leading
identifying columns into one. The result should look like this:

New_ID   Date        NG  ALL 
001      2018/11/19  1   3
002      2018/11/18  2   4
003      2018/10/20  2   6 

Any hints would be greatly appreciated.

1 Answer:

Answer 0: (score: 2)

Use:

import numpy as np
import pandas as pd

# compute both aggregates (NG count and total count) in a single groupby
df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].agg([('NG', lambda x :(x=='NG').sum()), ('ALL','count')]).round(2).reset_index()

# keep only the rows with at least one NG
mel_df2 = df1[df1.NG != 0]

# keep only the first row for each Date
mel_df2 = mel_df2.drop_duplicates('Date')

# build New_ID as a running counter zero-padded to 3 digits and insert it as the first column
s = pd.Series(np.arange(1, len(mel_df2) + 1), index=mel_df2.index).astype(str).str.zfill(3)
mel_df2.insert(0, 'New_ID', s)

Output for the data in the question:

print (mel_df2)
  New_ID  BldgID  BldgHt Device        Date  NG  ALL
1    001    1072    31.0    780  2018/11/19   1    1
8    002    1076    17.0    780  2018/11/20   1    1
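
Note that drop_duplicates('Date') keeps one row per date, so the IDs above number dates rather than devices. If what is wanted is one ID per device, a possible variant is to number each distinct combination of the identifying columns with groupby(...).ngroup(). The following is only a minimal sketch, assuming that BldgID, BldgHt and Device together identify a device; the names key_cols, is_ng, counts and failures are illustrative and do not come from the question.

```python
import pandas as pd

# Sample rows copied from the raw data frame shown in the question.
mel_df = pd.DataFrame({
    'BldgID': [1074, 1072, 1072, 1074, 1074, 1076, 1079, 1070, 1073, 1074],
    'BldgHt': [34.0, 31.0, 36.0, 7.0, 10.0, 17.0, 12.0, 27.0, 16.0, 31.0],
    'Device': ['790C', '780', '780', '790C', '780',
               '780', '780', '780', '780', '790C'],
    'Date': ['2018/11/20', '2018/11/19', '2018/11/19', '2018/11/19',
             '2018/11/19', '2018/11/20', '2018/11/20', '2018/11/18',
             '2018/11/19', '2018/11/20'],
    'Result': ['OK', 'NG', 'OK', 'OK', 'OK', 'NG', 'NG', 'OK', 'OK', 'OK'],
})

# Assumption: these three columns together identify one physical device.
key_cols = ['BldgID', 'BldgHt', 'Device']

# ngroup() numbers each distinct key combination 0, 1, 2, ...
# (in sorted key order by default); shift to start at 1 and zero-pad.
device_no = mel_df.groupby(key_cols).ngroup() + 1
mel_df['New_ID'] = device_no.astype(str).str.zfill(3)

# NG and ALL counts per device and date via named aggregation
# (available in pandas 0.25 and later).
counts = (
    mel_df.assign(is_ng=mel_df['Result'].eq('NG'))
          .groupby(['New_ID', 'Date'])
          .agg(NG=('is_ng', 'sum'), ALL=('Result', 'count'))
          .reset_index()
)

# Keep only the device/date pairs with at least one failure.
failures = counts[counts['NG'] != 0]
print(failures)
```

With this approach the same device keeps the same New_ID on every date, because the ID is derived from the key columns before any filtering takes place.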