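I want to distinguish each device (there are four types) attached to a specific elevator (identified by its height) in a specific building.
Count their test results, i.e. the number of failures (NG) out of the total number of tests on any given date. [Done]
Since the devices have no unique ID, I want to identify them and assign a unique ID to each device. [??]
The original data frame looks like this: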
BldgID BldgHt Device Date Result
1074 34.0 790C 2018/11/20 OK
1072 31.0 780 2018/11/19 NG
1072 36.0 780 2018/11/19 OK
1074 7.0 790C 2018/11/19 OK
1074 10.0 780 2018/11/19 OK
1076 17.0 780 2018/11/20 NG
1079 12.0 780 2018/11/20 NG
1070 27.0 780 2018/11/18 OK
1073 16.0 780 2018/11/19 OK
1074 31.0 790C 2018/11/20 OK
# Find the number of NG results per (BldgID, BldgHt, Device, Date) group
mel_df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
['Result'].apply(lambda x : (x=='NG').sum()).round(2).reset_index(name='NG')
# Find the total number of tests per group (ALL = OK + NG)
mel_df2 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
['Result'].count().round(2).reset_index(name='ALL')
# Both frames list the groups in the same order, so NG can be copied across row by row
mel_df2.insert(4, 'NG', mel_df1['NG'])
# Print the 'NG' and 'ALL' columns side by side
print(mel_df2.head())
BldgID BldgHt Device Date NG ALL
0 1074 34.0 790C 2018/11/20 0 2
1 1072 31.0 780 2018/11/19 1 3
2 1072 36.0 780 2018/11/19 0 3
3 1074 7.0 790C 2018/11/19 0 1
4 1074 10.0 780 2018/11/19 0 1
Then I filter out the rows where NG == 0, so that only the groups with at least one failure remain.
mel_df2 = mel_df2[mel_df2.NG != 0]
print(mel_df2.head(6))
BldgID BldgHt Device Date NG ALL
1 1072 31.0 780 2018/11/19 1 3
5 1076 17.0 780 2018/11/20 2 3
24 1068 16.0 780 2018/11/18 1 4
35 1077 39.0 780 2018/11/20 2 4
67 1074 36.0 780 2018/11/19 2 8
68 1074 39.0 780 2018/11/19 1 6
Now I want to assign a new unique ID to each device by combining the first
columns into a single value. The result should look like this:
New_ID Date NG ALL
001 2018/11/19 1 3
002 2018/11/18 2 4
003 2018/10/20 2 6
Any hints would be greatly appreciated.
Answer 0 (score: 2)
Use:
import numpy as np
import pandas as pd

# compute both aggregations (NG count and total count) in a single groupby
df1 = mel_df.groupby(['BldgID','BldgHt','Device','Date'])\
['Result'].agg([('NG', lambda x :(x=='NG').sum()), ('ALL','count')]).round(2).reset_index()
# keep only the rows with at least one failure (NG != 0)
mel_df2 = df1[df1.NG != 0]
# keep only the first row for each Date
mel_df2 = mel_df2.drop_duplicates('Date')
# build New_ID as a 1-based counter zero-padded to 3 digits, and insert it as the first column
s = pd.Series(np.arange(1, len(mel_df2) + 1), index=mel_df2.index).astype(str).str.zfill(3)
mel_df2.insert(0, 'New_ID', s)
Output for the data in the question:
print (mel_df2)
New_ID BldgID BldgHt Device Date NG ALL
1 001 1072 31.0 780 2018/11/19 1 1
8 002 1076 17.0 780 2018/11/20 1 1
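Note that `drop_duplicates('Date')` keeps one row per date, so the IDs above number dates rather than devices. If the goal is one ID per physical device, a possible alternative is to number each distinct combination of columns with `groupby(...).ngroup()`. The following is only a sketch: it assumes that `BldgID`, `BldgHt` and `Device` together identify a device (the question does not state this explicitly), and the helper names `device_cols`, `is_ng`, `summary` and `failed` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question
mel_df = pd.DataFrame({
    'BldgID': [1074, 1072, 1072, 1074, 1074, 1076, 1079, 1070, 1073, 1074],
    'BldgHt': [34.0, 31.0, 36.0, 7.0, 10.0, 17.0, 12.0, 27.0, 16.0, 31.0],
    'Device': ['790C', '780', '780', '790C', '780',
               '780', '780', '780', '780', '790C'],
    'Date': ['2018/11/20', '2018/11/19', '2018/11/19', '2018/11/19',
             '2018/11/19', '2018/11/20', '2018/11/20', '2018/11/18',
             '2018/11/19', '2018/11/20'],
    'Result': ['OK', 'NG', 'OK', 'OK', 'OK', 'NG', 'NG', 'OK', 'OK', 'OK'],
})

# Assumption: these three columns together identify one physical device
device_cols = ['BldgID', 'BldgHt', 'Device']

# Number each distinct device 1, 2, 3, ... (in sorted key order)
# and zero-pad the number to three digits
device_no = mel_df.groupby(device_cols).ngroup() + 1
mel_df['New_ID'] = device_no.astype(str).str.zfill(3)

# Per device and date: failed tests (NG) and total tests (ALL)
mel_df['is_ng'] = mel_df['Result'].eq('NG')
summary = (mel_df.groupby(['New_ID', 'Date'])
                 .agg(NG=('is_ng', 'sum'), ALL=('Result', 'count'))
                 .reset_index())

# Keep only the device/date pairs with at least one failure
failed = summary[summary['NG'] != 0]
print(failed)
```

On the ten sample rows this leaves three failing devices, with IDs 002, 009 and 010. Because the ID is derived from the device columns before filtering, the same device keeps the same ID on every date, which the counter-based approach does not guarantee.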