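How to generate a new column with values based on a condition in another column in pandas

Time: 2019-09-16 11:31:36

Tags: python python-3.x pandas dataframe

I have a DataFrame like the one shown below, and I need to generate a new column named "Comment" that shows "Fail" for the specified values.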

Input:

        Tel    MC             WT

        AAA    Rubber         9999
        BBB    Tree           0
        CCC    Rub            12
        AAA    Other          20
        BBB    Same           999
        DDD    Other-Same     70

Attempted code:

          df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'

Error:

         AttributeError: 'str' object has no attribute 'loc'

5 answers:

Answer 0 (score: 3)

Use Series.isin to test membership; rows that do not match are left as NaN:

df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12     NaN
3  AAA       Other    20     NaN
4  BBB        Same   999    Fail
5  DDD  Other-Same    70     NaN

If you need to assign Fail and an empty string otherwise, use numpy.where:

df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        
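
For anyone who wants to run the two variants above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample data in the question; the names `fail_values`, `mask`, `df_nan` and `df_blank` are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

fail_values = [0, 999, 9999, 99999]  # illustrative name for the flagged values
mask = df["WT"].isin(fail_values)    # boolean Series, True where WT is flagged

# Variant 1: assign only the matching rows; the rest stay NaN.
df_nan = df.copy()
df_nan.loc[mask, "Comment"] = "Fail"

# Variant 2: fill every row, using an empty string for non-matches.
df_blank = df.copy()
df_blank["Comment"] = np.where(mask, "Fail", "")

print(df_nan)
print(df_blank)
```

Building the mask once and reusing it keeps the two variants directly comparable.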

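Answer 1 (score: 3)

You do not need to chain multiple conditions; isin tests all the values at once:

df.loc[df['WT'].isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# Assign the result back rather than calling fillna(inplace=True) through attribute access
df['Comment'] = df['Comment'].fillna(' ')

   Tel          MC    WT Comment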
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        

Or a NumPy-based one:

import numpy as np

df['Comment'] = np.where(np.in1d(df['WT'].values, [0, 999, 9999, 99999]), 'Fail', '')
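
NumPy's documentation recommends `np.isin` over `np.in1d` for new code, so the NumPy variant can also be written as in the sketch below. It works on the WT values alone, and the names `wt`, `is_fail` and `comment` are illustrative.

```python
import numpy as np

# WT values from the question's sample data.
wt = np.array([9999, 0, 12, 20, 999, 70])

# np.isin tests each element of wt for membership in the second argument.
is_fail = np.isin(wt, [0, 999, 9999, 99999])

# Pick "Fail" where the test is True and an empty string elsewhere.
comment = np.where(is_fail, "Fail", "")
print(comment.tolist())
```

The resulting array can be assigned to a DataFrame column in the same way as the `np.in1d` result.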

Answer 2 (score: 2)

Using a list comprehension:

df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]

   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        

Timings

dfbig = pd.concat([df]*1000000, ignore_index=True)

print(dfbig.shape)
(6000000, 3)
  1. list comprehension
%%timeit 
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]

1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  2. loc + isin + fillna
%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)

431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  3. np.where
%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')

531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  4. apply
%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  5. np.where + np.in1d
%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')

538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
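
The `%%timeit` cells above only work in IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard library's `timeit` module. The frame size and repeat count below are arbitrary choices for this sketch, far smaller than the 6,000,000 rows used above, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

fail_values = [0, 999, 9999, 99999]

# Smaller frame than in the timings above so the sketch runs quickly.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})
dfbig = pd.concat([df] * 10_000, ignore_index=True)


def with_list_comprehension():
    return ["Fail" if x in fail_values else "" for x in dfbig["WT"]]


def with_np_where():
    return np.where(dfbig["WT"].isin(fail_values), "Fail", "")


# Both approaches should agree before their speed is compared.
assert with_np_where().tolist() == with_list_comprehension()

runs = 5  # arbitrary repeat count for this sketch
for func in (with_list_comprehension, with_np_where):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs:.4f} s per call")
```

Checking that the candidates produce identical results first avoids comparing the speed of functions that do different things.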

Answer 3 (score: 1)

Use Series.apply on the target column:

df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

Output:

   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        

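Answer 4 (score: -1)

Depending on your coding style, the simplest (and easiest to understand) approach is numpy.where(), which is faster than df.apply():

df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")

np.where() evaluates the condition element by element over the given array or DataFrame column and picks the corresponding value for each entry. For more information, see the documentation of numpy.where.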
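
The parentheses around each comparison in the line above are required, because Python's `|` binds more tightly than `==`. Without them, as in the question's attempt, the expression becomes a chained comparison and fails with a ValueError on the sample data. The sketch below shows the difference using the WT values from the question. The AttributeError quoted in the question is a separate problem: it means `df` was bound to a string rather than a DataFrame at that point, which this sketch does not reproduce.

```python
import pandas as pd

wt = pd.Series([9999, 0, 12, 20, 999, 70], name="WT")

# Parenthesized: each comparison yields a boolean Series, then | combines them.
good = (wt == 0) | (wt == 999) | (wt == 9999) | (wt == 99999)
print(good.tolist())

# Unparenthesized: parsed as the chained comparison wt == (0 | wt) == 999,
# which needs the truth value of a whole Series and therefore fails.
try:
    bad = wt == 0 | wt == 999
    raised = False
except ValueError:
    raised = True
print("unparenthesized version raised ValueError:", raised)
```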

Hope this helps.