Question

I have a DataFrame like the one shown below, and I need to generate a new column named "Comment" that shows "Fail" for the specified values.

Input:

        Tel    MC             WT
        AAA    Rubber         9999
        BBB    Tree           0
        CCC    Rub            12
        AAA    Other          20
        BBB    Same           999
        DDD    Other-Same     70

Code tried:

          df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'

Error:

         AttributeError: 'str' object has no attribute 'loc'

Answer 1

Use Series.isin to test membership; rows that do not match are left as NaN:

df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12     NaN
3  AAA       Other    20     NaN
4  BBB        Same   999    Fail
5  DDD  Other-Same    70     NaN

If you need to assign Fail and an empty string everywhere else, use numpy.where:

import numpy as np

df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70
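
For anyone who wants to run the two variants above end to end, here is a minimal sketch. The DataFrame construction is reconstructed from the table in the question, and the names `fail_values`, `with_nan` and `with_blank` are made up for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

fail_values = [0, 999, 9999, 99999]  # hypothetical name for the values to flag
mask = df["WT"].isin(fail_values)    # boolean Series, True where WT is in the list

# Variant 1: assign only the matching rows; the other rows stay NaN.
with_nan = df.copy()
with_nan.loc[mask, "Comment"] = "Fail"

# Variant 2: fill every row, using '' where the mask is False.
with_blank = df.copy()
with_blank["Comment"] = np.where(mask, "Fail", "")

print(with_nan)
print(with_blank)
```

Building the mask once and reusing it keeps the two variants easy to compare; which one fits depends on whether downstream code expects missing values or empty strings.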

Answer 2

You do not need to chain multiple conditions; isin tests them all at once:

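# Same values as in the question: 0, 999, 9999, 99999
df.loc[df['WT'].isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# Assign the result back; inplace=True on df.Comment can act on a temporary object
df['Comment'] = df['Comment'].fillna(' ')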


  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Or a NumPy-based one:

import numpy as np

# np.isin is the current spelling of np.in1d, which NumPy 2.0 deprecated
df['Comment'] = np.where(np.isin(df['WT'].to_numpy(), [0, 999, 9999, 99999]), 'Fail', '')

Answer 3

Use a list comprehension:

df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]

   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Timings

dfbig = pd.concat([df]*1000000, ignore_index=True)

print(dfbig.shape)
(6000000, 3)

list comprehension

%%timeit 
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]

1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

loc + isin + fillna

%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)

431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.where

%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')

531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

apply

%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.where + np.in1d

%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')

538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
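
The `%%timeit` cells above only work in IPython or Jupyter. As a rough sketch of how the same comparison could be run from a plain script, the standard-library `timeit` module can be used instead. The function names and the reduced frame size below are assumptions chosen to keep the run short, so the absolute numbers will not match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})
# 10,000 repeats (60,000 rows) keeps this quick; the timings above used 1,000,000.
dfbig = pd.concat([df] * 10_000, ignore_index=True)
fail_values = [0, 999, 9999, 99999]


def with_list_comprehension():
    return ["Fail" if x in fail_values else "" for x in dfbig["WT"]]


def with_np_where():
    return np.where(dfbig["WT"].isin(fail_values), "Fail", "")


def with_apply():
    return dfbig["WT"].apply(lambda x: "Fail" if x in fail_values else "")


candidates = {
    "list comprehension": with_list_comprehension,
    "np.where + isin": with_np_where,
    "apply": with_apply,
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls per repeat, reported as seconds per call.
    results[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name}: {results[name] * 1e3:.2f} ms per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; `%%timeit` reports a mean and standard deviation instead, so the two are not directly comparable.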

Answer 4

Use apply on the target column.

df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

Output:

  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Answer 5

Depending on your coding style, the simplest (and easiest to read) approach is numpy.where, which is also faster than df.apply():

df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")

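np.where() goes through the entries of the given array or DataFrame column, picking "Fail" where the condition is True and "" otherwise. For more information, see the documentation of numpy.where.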
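
The parentheses around each comparison in the condition above are not optional, because `|` binds more tightly than `==` in Python. The sketch below illustrates what happens without them, using a cut-down frame with only the WT column (an assumption for brevity). Note that this is separate from the AttributeError quoted in the question, which suggests that `df` was bound to a string at that point.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# Without parentheses, `|` is applied first, so the expression parses as the
# chained comparison  df["WT"] == (0 | df["WT"]) == 999.  Chaining needs a
# single True/False from the first comparison, which a Series cannot give.
try:
    df["WT"] == 0 | df["WT"] == 999
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

print(type(unparenthesized_error).__name__)

# With parentheses, each comparison produces a boolean Series first, and
# `|` then combines those Series element-wise.
mask = (df["WT"] == 0) | (df["WT"] == 999)
comment = np.where(mask, "Fail", "")
print(comment)
```

Once the list of values grows beyond two or three, `isin` is usually easier to read than a long chain of parenthesized comparisons.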

Hope this helps.

How to generate a new column with values based on a condition in another column in pandas

5 answers: