I have a dataframe like the one shown below, and I need to generate a new column named "Comment" that shows "Fail" for the specified values.
Input:
Tel  MC          WT
AAA  Rubber      9999
BBB  Tree        0
CCC  Rub         12
AAA  Other       20
BBB  Same        999
DDD  Other-Same  70
Code I tried:
df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'
Error:
AttributeError: 'str' object has no attribute 'loc'
Answer 0 (score: 3)
Use Series.isin to test membership; values that do not match are left as NaN:
df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12     NaN
3  AAA       Other    20     NaN
4  BBB        Same   999    Fail
5  DDD  Other-Same    70     NaN
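To run the isin approach end to end, a minimal sketch could look like the following. The DataFrame construction is an assumption reconstructed from the sample table in the question, and the name fail_values is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

# Hypothetical name for the values that should be flagged.
fail_values = [0, 999, 9999, 99999]

# Boolean mask: True where WT is one of the flagged values.
mask = df["WT"].isin(fail_values)

# Assigning through .loc creates the column; rows outside the mask stay NaN.
df.loc[mask, "Comment"] = "Fail"
print(df)
```

Building the mask as a separate variable makes it easy to inspect which rows matched before anything is written.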
If you need to assign Fail and an empty string otherwise, use numpy.where:
df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12
3  AAA       Other    20
4  BBB        Same   999    Fail
5  DDD  Other-Same    70
Answer 1 (score: 3)
You don't need to chain multiple conditions:
df.loc[df['WT'].isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# assign the result back instead of calling fillna(inplace=True) on df.Comment
df['Comment'] = df['Comment'].fillna(' ')
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12
3  AAA       Other    20
4  BBB        Same   999    Fail
5  DDD  Other-Same    70
Or a numpy-based one:
import numpy as np
# np.in1d is deprecated since NumPy 2.0 in favor of np.isin
df['Comment'] = np.where(np.in1d(df.WT.values, [0, 999, 9999, 99999]), 'Fail', '')
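On recent NumPy versions the same idea can be written with np.isin, which NumPy recommends over np.in1d. The following is a sketch under the assumption that only the WT column matters for the check; fail_values is an illustrative name.

```python
import numpy as np
import pandas as pd

# Only the WT column from the question's sample data is needed here.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

fail_values = [0, 999, 9999, 99999]  # hypothetical name

# np.isin operates on the underlying NumPy array and returns a boolean array.
mask = np.isin(df["WT"].to_numpy(), fail_values)

# np.where picks 'Fail' where the mask is True and '' elsewhere.
df["Comment"] = np.where(mask, "Fail", "")
print(df)
```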
Answer 2 (score: 2)
Using a list comprehension:
df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12
3  AAA       Other    20
4  BBB        Same   999    Fail
5  DDD  Other-Same    70
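A small variation on the list comprehension, offered as a sketch rather than a measured improvement: keeping the flagged values in a set means each membership test is a hash lookup instead of a scan through a list. The name fail_values is illustrative, and this variant is not part of the timings in this answer.

```python
import pandas as pd

# Only the WT column from the question's sample data is needed here.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# A set instead of a list; hypothetical name.
fail_values = {0, 999, 9999, 99999}

# Same comprehension as above, with the set used for the membership test.
df["Comment"] = ["Fail" if x in fail_values else "" for x in df["WT"]]
print(df)
```

With only four values the difference is likely small; the set matters more when the list of flagged values grows.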
Timings
dfbig = pd.concat([df]*1000000, ignore_index=True)
print(dfbig.shape)
(6000000, 3)
list comprehension
%%timeit
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]
1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
loc + isin + fillna
%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)
431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.where
%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')
531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
apply
%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')
1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.where + np.in1d
%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')
538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
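The %%timeit cells above need IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard timeit module. The frame size, the function names and the repeat counts below are assumptions chosen so the script finishes quickly; the absolute numbers will differ from the ones reported above and depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the timings above, so the script runs quickly.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})
dfbig = pd.concat([df] * 10_000, ignore_index=True)
fail_values = [0, 999, 9999, 99999]


def with_list_comprehension():
    return ["Fail" if x in fail_values else "" for x in dfbig["WT"]]


def with_np_where():
    return np.where(dfbig["WT"].isin(fail_values), "Fail", "")


def with_apply():
    return dfbig["WT"].apply(lambda x: "Fail" if x in fail_values else "")


timings = {}
for func in (with_list_comprehension, with_np_where, with_apply):
    # Best of 3 repeats with 5 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=5, repeat=3))
    timings[func.__name__] = best / 5
    print(f"{func.__name__}: {timings[func.__name__] * 1e3:.2f} ms per call")
```

Each function returns its result instead of writing into dfbig, so the runs do not affect one another.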
Answer 3 (score: 1)
Use apply on the target column.
df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')
Output:
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12
3  AAA       Other    20
4  BBB        Same   999    Fail
5  DDD  Other-Same    70
Answer 4 (score: -1)
Following your coding style, the simplest (and easiest to understand) approach is to use numpy.where(), which is faster than df.apply():
df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")
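np.where() evaluates the condition element-wise over the entries of the given array or dataframe column. For more information, see the documentation of numpy.where.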
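The parentheses around each comparison matter. In Python, | binds more tightly than ==, so a condition written without them is not parsed as four separate comparisons. The sketch below illustrates this on the WT column only; it is an illustration of operator precedence, not a diagnosis of the AttributeError in the question, which suggests that df was a string rather than a DataFrame at that point.

```python
import pandas as pd

# Only the WT column from the question's sample data is needed here.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# Without parentheses, `|` is evaluated before `==`, so this is parsed as the
# chained comparison  df["WT"] == (0 | df["WT"]) == 999  and raises an error.
raised = False
try:
    df["WT"] == 0 | df["WT"] == 999
except (ValueError, TypeError) as exc:
    raised = True
    print(type(exc).__name__, "raised without parentheses")

# With each comparison parenthesized, the boolean masks combine element-wise.
mask = (df["WT"] == 0) | (df["WT"] == 999)
print(mask.tolist())  # [False, True, False, False, True, False]
```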
Hope this helps.