I have the following pandas DataFrame:
code   tank  prod_receipt  tank_prod
12345  1     MS            MS
23452  2     MS            No Data
23333  2     HS            HS
14567  3     MS            No Data
12343  2     MS            HS
I want to generate a flag that checks whether prod_receipt equals tank_prod.
My desired DataFrame is:
code   tank  prod_receipt  tank_prod  Flag
12345  1     MS            MS         Equal
23452  2     MS            No Data    No Data
23333  2     HS            HS         Equal
14567  3     MS            No Data    No Data
12343  2     MS            HS         Not Equal
How can I do this in pandas?
Answer 0 (score: 4)
Avoid loops here because they are slow; a better approach is numpy.select:
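import numpy as np

# m1: the tank has no product data; m2: the receipt matches the tank product
m1 = df['tank_prod'] == 'No Data'
m2 = df['prod_receipt'] == df['tank_prod']
# Conditions are evaluated in order; rows matching neither get 'Not Equal'
df['new'] = np.select([m1, m2], ['No Data', 'Equal'], default='Not Equal')
print(df)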
    code  tank prod_receipt tank_prod        new
0  12345     1           MS        MS      Equal
1  23452     2           MS   No Data    No Data
2  23333     2           HS        HS      Equal
3  14567     3           MS   No Data    No Data
4  12343     2           MS        HS  Not Equal
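For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same numpy.select idea. The DataFrame is rebuilt by hand from the question's sample (with HS as the tank product in the last row, as in the expected output), and the column name Flag and the variable names conditions and choices are just illustrative choices.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question (last row: tank_prod is HS).
df = pd.DataFrame({
    "code": [12345, 23452, 23333, 14567, 12343],
    "tank": [1, 2, 2, 3, 2],
    "prod_receipt": ["MS", "MS", "HS", "MS", "MS"],
    "tank_prod": ["MS", "No Data", "HS", "No Data", "HS"],
})

# np.select checks the conditions in order; the first True one wins.
conditions = [
    df["tank_prod"] == "No Data",            # no tank data available
    df["prod_receipt"] == df["tank_prod"],   # receipt matches tank
]
choices = ["No Data", "Equal"]

# Rows that satisfy neither condition fall back to the default.
df["Flag"] = np.select(conditions, choices, default="Not Equal")
print(df)
```

Because the "No Data" condition is listed first, it takes priority over the equality check, which is what the desired output asks for.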
If you need only one condition, use numpy.where:
m2 = df['prod_receipt'] == df['tank_prod']
df['new'] = np.where(m2, 'Equal', 'Not Equal')
print(df)
    code  tank prod_receipt tank_prod        new
0  12345     1           MS        MS      Equal
1  23452     2           MS   No Data  Not Equal
2  23333     2           HS        HS      Equal
3  14567     3           MS   No Data  Not Equal
4  12343     2           MS        HS  Not Equal
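As a side note, numpy.where can also be nested to cover the three-way case, although numpy.select tends to read better once there are more than two outcomes. The sketch below is only an illustration; the column names two_way and three_way are made up for the comparison.

```python
import numpy as np
import pandas as pd

# Only the two columns needed for the comparison, taken from the question.
df = pd.DataFrame({
    "prod_receipt": ["MS", "MS", "HS", "MS", "MS"],
    "tank_prod": ["MS", "No Data", "HS", "No Data", "HS"],
})

m1 = df["tank_prod"] == "No Data"
m2 = df["prod_receipt"] == df["tank_prod"]

# Single condition: every row that is not equal becomes 'Not Equal'.
df["two_way"] = np.where(m2, "Equal", "Not Equal")

# Nested np.where: the outer test is applied first, so 'No Data' takes priority.
df["three_way"] = np.where(m1, "No Data", np.where(m2, "Equal", "Not Equal"))
print(df)
```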
Performance:
It depends on the number of rows and on how many values match:
# 5k rows (the 5 sample rows repeated 1000 times)
df = pd.concat([df] * 1000, ignore_index=True)
In [90]: %%timeit
...: m1 = df['tank_prod'] == 'No Data'
...: m2 = df['prod_receipt'] == df['tank_prod']
...: df['new'] = np.select([m1, m2], ['No Data', 'Equal'],'Not Equal')
...:
2.89 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#loop solution
In [91]: %%timeit
...: df["Flag"] = df.apply(lambda x: "Equal" if x["prod_receipt"] == x["tank_prod"] else ("Not Equal" if x["prod_receipt"] != x["tank_prod"] and x["tank_prod"] != "No Data" else "No Data"), axis =1)
...:
278 ms ± 7.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
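The timings above come from IPython's %%timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The helper names with_select and with_apply are hypothetical, the repeat counts are arbitrary, and the absolute numbers will differ by machine and library version, so treat this as a way to reproduce the comparison rather than the figures.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "code": [12345, 23452, 23333, 14567, 12343],
    "tank": [1, 2, 2, 3, 2],
    "prod_receipt": ["MS", "MS", "HS", "MS", "MS"],
    "tank_prod": ["MS", "No Data", "HS", "No Data", "HS"],
})
# Repeat the 5 sample rows 1000 times -> 5,000 rows.
df = pd.concat([base] * 1000, ignore_index=True)


def with_select(frame):
    """Vectorized version: boolean masks plus np.select."""
    m1 = frame["tank_prod"] == "No Data"
    m2 = frame["prod_receipt"] == frame["tank_prod"]
    return np.select([m1, m2], ["No Data", "Equal"], "Not Equal")


def with_apply(frame):
    """Row-wise version: a Python function is called once per row."""
    return frame.apply(
        lambda x: "No Data" if x["tank_prod"] == "No Data"
        else ("Equal" if x["prod_receipt"] == x["tank_prod"] else "Not Equal"),
        axis=1,
    )


# Average seconds per call over a few repetitions.
t_select = timeit.timeit(lambda: with_select(df), number=10) / 10
t_apply = timeit.timeit(lambda: with_apply(df), number=3) / 3
print(f"np.select: {t_select * 1e3:.2f} ms per call")
print(f"apply:     {t_apply * 1e3:.2f} ms per call")
```

Both helpers produce the same labels; only the way the rows are processed differs.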
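Answer 1 (score: 0)
Simply do the following (this yields only Equal / Not Equal, so the No Data rows come out as Not Equal):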
flag_dict={
True: 'Equal',
False: 'Not Equal'
}
df['flag'] = df['prod_receipt']== df['tank_prod']
df['flag'] = df['flag'].apply(lambda row: flag_dict[row] )
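A possible variation on this dictionary approach, offered as a sketch rather than as part of the answer: Series.map accepts the dictionary directly, so the lambda is not needed, and Series.mask can then overwrite the rows that have no tank data. The intermediate name is_equal is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "prod_receipt": ["MS", "MS", "HS", "MS", "MS"],
    "tank_prod": ["MS", "No Data", "HS", "No Data", "HS"],
})

flag_dict = {True: "Equal", False: "Not Equal"}

# Boolean Series: True where the receipt matches the tank product.
is_equal = df["prod_receipt"] == df["tank_prod"]

# map() looks each boolean up in the dictionary.
df["flag"] = is_equal.map(flag_dict)

# Extra step (an assumption about what is wanted): label rows without tank data.
df["flag"] = df["flag"].mask(df["tank_prod"] == "No Data", "No Data")
print(df)
```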
Answer 2 (score: 0)
Use .apply():
df["Flag"] = df.apply(lambda x: "Equal" if x["prod_receipt"] == x["tank_prod"] else ("Not Equal" if x["prod_receipt"] != x["tank_prod"] and x["tank_prod"] != "No Data" else "No Data"), axis =1)
Output:
    code  tank prod_receipt tank_prod       Flag
0  12345     1           MS        MS      Equal
1  23452     2           MS   No Data    No Data
2  23333     2           HS        HS      Equal
3  14567     3           MS   No Data    No Data
4  12343     2           MS        HS  Not Equal
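If the row-wise approach is preferred for readability, the nested conditional can be unpacked into a named function. This is only a sketch: flag_row is a hypothetical helper name, and the logic is intended to give the same labels as the one-line lambda on this sample data. It is still a per-row Python call, so it will not be faster than the vectorized answers.

```python
import pandas as pd

df = pd.DataFrame({
    "code": [12345, 23452, 23333, 14567, 12343],
    "tank": [1, 2, 2, 3, 2],
    "prod_receipt": ["MS", "MS", "HS", "MS", "MS"],
    "tank_prod": ["MS", "No Data", "HS", "No Data", "HS"],
})


def flag_row(row):
    """Return the flag for a single row (hypothetical helper)."""
    if row["tank_prod"] == "No Data":
        return "No Data"
    if row["prod_receipt"] == row["tank_prod"]:
        return "Equal"
    return "Not Equal"


# axis=1 passes each row to flag_row as a Series.
df["Flag"] = df.apply(flag_row, axis=1)
print(df)
```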