Excluding columns from pandas where()

Time: 2016-05-19 08:26:59

Tags: python python-2.7 pandas

I have the following pandas df:

import pandas as pd
import numpy as np    

pd_df = pd.DataFrame({'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
              'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
              'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg']})

I want to apply where() only to the two columns Qu1 and Qu2 and keep the remaining columns unchanged (see the original stackoverflow question), so I created pd1:

pd1 = pd_df.where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
                              "other")[['Qu1', 'Qu2']]

Then I add the remaining column of pd_df, pd_df['Qu3'], to pd1:

pd1['Qu3'] = pd_df['Qu3']
pd_df = []

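My question is: what I originally wanted was to apply where() to part of the df and keep the remaining columns as they are, so could the code above be dangerous for a large dataset? Could I corrupt the original data this way? If so, what is the best way to do it?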

Many thanks!

1 Answer:

Answer 0: (score: 1)

You can explicitly take a copy of the original df and then overwrite the selected columns of that copy:

In [40]:
pd1 = pd_df.copy()
pd1[['Qu1', 'Qu2']] = pd1[['Qu1', 'Qu2']].where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
                              "other")
pd1

Out[40]:
      Qu1     Qu2      Qu3
0   other   other    apple
1  potato  banana   potato
2  cheese   apple  sausage
3  banana   apple   cheese
4  cheese   apple   cheese
5  banana   other   potato
6  cheese  banana   cheese
7  potato  banana   potato
8   other  banana      egg

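The difference here is that where() is applied only to the selected columns, rather than to the whole df followed by selecting the columns of interest.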
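On the worry about corrupting the original data: as far as I know, where() returns a new object unless inplace=True is passed, so the approach above should leave pd_df alone. The following is a minimal sketch for checking that yourself; the names cols, snapshot and original_untouched are only illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

pd_df = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})

# Keep an independent snapshot so we can compare afterwards (illustrative name).
snapshot = pd_df.copy()

cols = ['Qu1', 'Qu2']  # the columns where() should touch

# Count how often each value occurs within its own column.
counts = pd_df[cols].apply(lambda x: x.map(x.value_counts()))

# Work on an explicit copy; where() itself returns a new DataFrame.
pd1 = pd_df.copy()
pd1[cols] = pd_df[cols].where(counts >= 2, "other")

# DataFrame.equals treats NaNs in the same position as equal.
original_untouched = pd_df.equals(snapshot)
print(original_untouched)
```

If original_untouched is True, the original frame was not modified by building pd1.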

Update

If you want to overwrite those cols in the original df, then just select those:

In [48]:
pd_df[['Qu1', 'Qu2']] = pd_df[['Qu1', 'Qu2']].where(pd_df.apply(lambda x: x.map(x.value_counts()))>=2,
                              "other")
pd_df

Out[48]:
      Qu1     Qu2      Qu3
0   other   other    apple
1  potato  banana   potato
2  cheese   apple  sausage
3  banana   apple   cheese
4  cheese   apple   cheese
5  banana   other   potato
6  cheese  banana   cheese
7  potato  banana   potato
8   other  banana      egg
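For a large dataset it may also be worth computing the value counts only for the columns being replaced, instead of calling apply on the whole df. Below is a rough sketch under that assumption; replace_rare and its parameters min_count and fill are hypothetical names introduced here for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd


def replace_rare(df, cols, min_count=2, fill="other"):
    """Hypothetical helper: overwrite rare values in `cols` of `df`.

    Values occurring fewer than `min_count` times in their column
    (including NaN, whose mapped count is NaN) are replaced by `fill`.
    Columns outside `cols` are never read or written.
    """
    subset = df[cols]
    # Per-column frequency of each value, computed on the subset only.
    counts = subset.apply(lambda s: s.map(s.value_counts()))
    df[cols] = subset.where(counts >= min_count, fill)
    return df


pd_df = pd.DataFrame({
    'Qu1': ['apple', 'potato', 'cheese', 'banana', 'cheese', 'banana', 'cheese', 'potato', 'egg'],
    'Qu2': ['sausage', 'banana', 'apple', 'apple', 'apple', np.nan, 'banana', 'banana', 'banana'],
    'Qu3': ['apple', 'potato', 'sausage', 'cheese', 'cheese', 'potato', 'cheese', 'potato', 'egg'],
})
qu3_before = pd_df['Qu3'].tolist()

result = replace_rare(pd_df, ['Qu1', 'Qu2'])
print(result)
```

This overwrites Qu1 and Qu2 in the frame that is passed in, so pass df.copy() instead if the original must be preserved.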