Question

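I want to distinguish each device (there are four types) attached to a particular elevator (represented by its height) in a particular building.

Calculate their test results, i.e. the number of failures (NG) out of the total number of tests on any given date. [Done]
Since the devices have no unique ID, I want to identify them and assign a unique ID to each device. [??]

The original data frame looks like this: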

BldgID  BldgHt  Device  Date        Result
  1074    34.0  790C    2018/11/20  OK
  1072    31.0  780     2018/11/19  NG
  1072    36.0  780     2018/11/19  OK
  1074     7.0  790C    2018/11/19  OK
  1074    10.0  780     2018/11/19  OK
  1076    17.0  780     2018/11/20  NG
  1079    12.0  780     2018/11/20  NG
  1070    27.0  780     2018/11/18  OK
  1073    16.0  780     2018/11/19  OK
  1074    31.0  790C    2018/11/20  OK

# Find the number of NG
mel_df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].apply(lambda x : (x=='NG').sum()).round(2).reset_index()

mel_df1['NG'] = mel_df1['Result']

# Find the total number (ALL = OK + NG)
mel_df2 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].count().round(2).reset_index()

mel_df2['ALL'] = mel_df2['Result']

# print 'NG' and 'ALL' columns side by side. 

    BldgID  BldgHt    Device   Date        NG  ALL
0  1074    34.0       790C     2018/11/20   0    2
1  1072    31.0       780      2018/11/19   1    3
2  1072    36.0       780      2018/11/19   0    3
3  1074     7.0       790C     2018/11/19   0    1
4  1074    10.0       780      2018/11/19   0    1
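
The step that places 'NG' and 'ALL' side by side is not shown above. One way it might be done, assuming the two grouped results are aligned on the grouping keys with `pd.concat`, is sketched here. The small `mel_df` and the names `keys`, `ng`, `total` and `counts` are made up for illustration and are not from the original post.

```python
import pandas as pd

# Tiny illustrative frame (not the real data): two tests for one device, one each for two others.
mel_df = pd.DataFrame({
    'BldgID': [1072, 1072, 1072, 1074],
    'BldgHt': [31.0, 31.0, 36.0, 7.0],
    'Device': ['780', '780', '780', '790C'],
    'Date': ['2018/11/19'] * 4,
    'Result': ['NG', 'OK', 'OK', 'OK'],
})

keys = ['BldgID', 'BldgHt', 'Device', 'Date']  # hypothetical name for the grouping columns
grouped = mel_df.groupby(keys)['Result']

# Number of NG results per group, and total number of tests per group.
ng = grouped.apply(lambda x: (x == 'NG').sum()).rename('NG')
total = grouped.count().rename('ALL')

# Both Series share the same MultiIndex, so concat lines them up column by column.
counts = pd.concat([ng, total], axis=1).reset_index()
print(counts)
```

Because both Series come from the same `groupby`, their indexes match exactly, so no explicit merge on the key columns is needed.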

Then filter out the rows where NG == 0, so that only the rows with at least one failure remain.

mel_df2 = mel_df2[mel_df2.NG != 0]
print(mel_df2.head(6))

    BldgID   BldgHt  Device   Date        NG  ALL
1   1072    31.0     780      2018/11/19   1    3
5   1076    17.0     780      2018/11/20   2    3
24  1068    16.0     780      2018/11/18   1    4
35  1077    39.0     780      2018/11/20   2    4
67  1074    36.0     780      2018/11/19   2    8
68  1074    39.0     780      2018/11/19   1    6

Now I want to assign a new unique ID to each row by combining the first
columns (BldgID, BldgHt, Device). The result should look like this:

New_ID   Date        NG  ALL 
001      2018/11/19  1   3
002      2018/11/18  2   4
003      2018/10/20  2   6

Any hints would be greatly appreciated.

Answer 1

Use:

import numpy as np
import pandas as pd

# compute both aggregates (NG count and total count) in a single groupby
df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
    ['Result'].agg([('NG', lambda x :(x=='NG').sum()), ('ALL','count')]).round(2).reset_index()

# keep only the rows with at least one NG
mel_df2 = df1[df1.NG != 0]

# keep only the first row for each Date
mel_df2 = mel_df2.drop_duplicates('Date')

# create New_ID as a zero-padded 3-digit string and insert it as the first column
s = pd.Series(np.arange(1, len(mel_df2) + 1), index=mel_df2.index).astype(str).str.zfill(3)
mel_df2.insert(0, 'New_ID', s)

Output for the data in the question:

print (mel_df2)
  New_ID  BldgID  BldgHt Device        Date  NG  ALL
1    001    1072    31.0    780  2018/11/19   1    1
8    002    1076    17.0    780  2018/11/20   1    1
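
The question also asks for one ID per device, where a device is identified by the combination of BldgID, BldgHt and Device. As an alternative sketch, not part of the answer above, `groupby(...).ngroup()` can number each distinct key combination, so a device keeps the same ID on every date. The sample frame is copied from the question; the names `key_cols`, `summary`, `ids` and `failed` are assumptions for illustration.

```python
import pandas as pd

# Sample rows copied from the question's data frame.
mel_df = pd.DataFrame({
    'BldgID': [1074, 1072, 1072, 1074, 1074, 1076, 1079, 1070, 1073, 1074],
    'BldgHt': [34.0, 31.0, 36.0, 7.0, 10.0, 17.0, 12.0, 27.0, 16.0, 31.0],
    'Device': ['790C', '780', '780', '790C', '780', '780', '780', '780', '780', '790C'],
    'Date': ['2018/11/20', '2018/11/19', '2018/11/19', '2018/11/19', '2018/11/19',
             '2018/11/20', '2018/11/20', '2018/11/18', '2018/11/19', '2018/11/20'],
    'Result': ['OK', 'NG', 'OK', 'OK', 'OK', 'NG', 'NG', 'OK', 'OK', 'OK'],
})

key_cols = ['BldgID', 'BldgHt', 'Device']  # hypothetical name: columns that identify a device

# Per device and date: number of NG results and total number of tests.
summary = (
    mel_df.assign(NG=mel_df['Result'].eq('NG').astype(int))
          .groupby(key_cols + ['Date'])
          .agg(NG=('NG', 'sum'), ALL=('Result', 'count'))
          .reset_index()
)

# ngroup() numbers each distinct key combination 0, 1, 2, ... in sorted key order;
# shift to start at 1 and zero-pad to three digits.
ids = summary.groupby(key_cols).ngroup() + 1
summary.insert(0, 'New_ID', ids.astype(str).str.zfill(3))

# Keep only the rows with at least one failure.
failed = summary[summary['NG'] != 0]
print(failed[['New_ID', 'Date', 'NG', 'ALL']])
```

Assigning the IDs before filtering means the numbering covers every device, including those that never failed; assigning them after the filter would instead number only the failing devices.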

How to unify (collapse) multiple columns into a single field with unique values assigned
