Pandas: how to replace values that occur fewer than x times

Time: 2018-01-18 17:15:46

Tags: python pandas sklearn-pandas

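I have a pandas column containing many strings that occur fewer than 5 times. I don't want to drop these values; instead, I want to replace them with a placeholder string called "pruned". What is the best way to do this?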

import pandas as pd

df = pd.DataFrame(['a', 'a', 'b', 'c'], columns=["x"])
# Get the value counts and replace the rare values with "pruned".
# Pseudocode for what I want (this line does not run as written):
# df[df[count < 2]] = "pruned"

2 answers:

Answer 0 (score: 1)

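I suspect there is a more efficient way to do this, but a simple approach is to build a dictionary of counts and then prune the values whose count falls below a threshold. Consider the example df: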

df= pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])

    foo
0   12
1   11
2   4
3   15
4   6
5   12
6   4
7   7

# make a dict with counts
count_dict = {d:(df['foo']==d).sum() for d in df.foo.unique()}
# assign that dict to a column
df['bar'] = [count_dict[d] for d in df.foo]
# loc in the 'pruned' tag
df.loc[df.bar < 2, 'foo']='pruned'

This returns the desired result:

      foo  bar
0      12    2
1  pruned    1
2       4    2
3  pruned    1
4  pruned    1
5      12    2
6       4    2
7  pruned    1

(Of course, you can change the 2 to 5 and drop the bar column if needed.)
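As a possible shortcut for the counting step, the sketch below maps each value to its frequency with `Series.map` and `value_counts()`, so neither the dict comprehension nor the helper `bar` column is needed. The `threshold` name is an assumption made for illustration, and `Series.mask` is used here instead of `.loc` assignment so that the integer column is upcast to object when the string is written.

```python
import pandas as pd

df = pd.DataFrame([12, 11, 4, 15, 6, 12, 4, 7], columns=['foo'])

threshold = 2  # assumed cutoff; the question asks for 5

# Map every row to the number of times its value occurs in the column.
counts = df['foo'].map(df['foo'].value_counts())

# Replace values whose count is below the threshold; other rows are kept.
df['foo'] = df['foo'].mask(counts < threshold, 'pruned')

print(df['foo'].tolist())
# [12, 'pruned', 4, 'pruned', 'pruned', 12, 4, 'pruned']
```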

Update

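As requested, here is an in-place version: a one-liner that does this without adding an extra column or building the dict directly (thanks to @TrigonaMinima for the value_counts() hint):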

df= pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])
print(df)
df.foo = df.foo.apply(lambda row: 'pruned' if (df.foo.value_counts() < 2)[row] else row)
print(df)

Again, this returns the desired result:

   foo
0   12
1   11
2    4
3   15
4    6
5   12
6    4
7    7
      foo
0      12
1  pruned
2       4
3  pruned
4  pruned
5      12
6       4
7  pruned
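One caveat with the one-liner is that the lambda calls `value_counts()` again for every row, which is likely to be slow on large columns. A minimal sketch of the same idea that counts only once is shown below; `rare_values` is a name introduced here for illustration, not something from the answer.

```python
import pandas as pd

df = pd.DataFrame([12, 11, 4, 15, 6, 12, 4, 7], columns=['foo'])

# Count once, then collect the values that occur fewer than 2 times.
counts = df['foo'].value_counts()
rare_values = counts[counts < 2].index

# Keep rows whose value is not rare; replace the rest with the placeholder.
df['foo'] = df['foo'].where(~df['foo'].isin(rare_values), 'pruned')

print(df['foo'].tolist())
# [12, 'pruned', 4, 'pruned', 'pruned', 12, 4, 'pruned']
```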

Answer 1 (score: 0)

Based on the answer above, this is the solution I ended up using.

import pandas as pd
df= pd.DataFrame([12,11,4,15,6,12,4,7],columns=['foo'])
# make a dict with counts
count_dict = dict(df.foo.value_counts())
# assign that dict to a column
df['temp_count'] = [count_dict[d] for d in df.foo]
# loc in the 'pruned' tag
df.loc[df.temp_count < 2, 'foo']='pruned'
df = df.drop(["temp_count"], axis=1)
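For reuse across columns, the same logic could be wrapped in a small function. The sketch below is one way to do that, applied to the string column from the question; `prune_rare`, `min_count`, and `placeholder` are hypothetical names, and the per-row counts come from `groupby(...).transform("size")` so no temporary column has to be created and dropped.

```python
import pandas as pd


def prune_rare(series, min_count=5, placeholder="pruned"):
    """Hypothetical helper: replace values seen fewer than min_count times."""
    # Size of the group each row belongs to, aligned with the original index.
    counts = series.groupby(series).transform("size")
    # Keep rows that are frequent enough; everything else gets the placeholder.
    return series.where(counts >= min_count, placeholder)


df = pd.DataFrame(['a', 'a', 'b', 'c'], columns=["x"])
df["x"] = prune_rare(df["x"], min_count=2)

print(df["x"].tolist())
# ['a', 'a', 'pruned', 'pruned']
```

Note that missing values have no group under the default `groupby` settings, so in this sketch they would also be replaced by the placeholder.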