Question

I want to check a DataFrame for odd categorical entries using fewer lines of Python

I tried the following code to display the odd entries.

for i in range(data.shape[1]):
  if data[data.columns[i]].dtype == "object":
    print(data[data.columns[i]].value_counts())

Is there a way to check categorical data using fewer lines?

Answer 1

If you want to print all the unique entries in a column, I suggest the pandas unique() method:

>>> import pandas as pd
>>> a = pd.DataFrame({'sex':['m','f','m','m','m', 'f', 'booooy']})
>>> a.loc[:,'sex'].unique()
Out[1]: array(['m', 'f', 'booooy'], dtype=object)
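
To cover every categorical column at once, as the question asks, one option is to combine unique() with select_dtypes. This is only a rough sketch: the sample DataFrame and its column names (sex, city, age) are made up for illustration, and it assumes that "categorical" simply means "not numeric".

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
data = pd.DataFrame({
    "sex": ["m", "f", "m", "booooy"],
    "city": ["Oslo", "Oslo", "Olso", "Bergen"],
    "age": [31, 25, 40, 22],
})

# Keep the non-numeric columns and collect the unique values of each one.
# Iterating over a DataFrame yields its column names.
uniques = {
    col: data[col].unique().tolist()
    for col in data.select_dtypes(exclude="number")
}

for col, values in uniques.items():
    print(col, values)
```

Swapping unique() for value_counts() gives the counts per entry, which makes rare typos easier to spot.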

To change the booooy entry to m, you can use re.sub from the standard library:

>>> import re
>>> a.loc[:,'sex'].apply(lambda x: re.sub(r'booooy','m', x))
Out[2]: 
0    m
1    f
2    m
3    m
4    m
5    f
6    m
Name: sex, dtype: object

If you have many re.sub calls, you don't need a separate apply for each one; you can put them all in a single function instead:

>>> def filter_text(x):
...    x = re.sub(r'booooy','m',x)
...    x = re.sub(r'girl','f',x)
...    # . . . . . .
...    return x
>>> a.loc[:,'sex'].apply(filter_text)
Out[3]: 
0    m
1    f
2    m
3    m
4    m
5    f
6    m
Name: sex, dtype: object
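
When the odd entries are whole values rather than substrings, a plain dict passed to Series.replace may be shorter than a chain of re.sub calls. This is a sketch under that assumption; the fixes mapping below is hypothetical. Unlike re.sub, it only replaces cells that match a key exactly.

```python
import pandas as pd

a = pd.DataFrame({"sex": ["m", "f", "m", "m", "m", "f", "booooy"]})

# Hypothetical mapping from odd entries to the canonical labels.
fixes = {"booooy": "m", "girl": "f"}

# Series.replace with a dict swaps cells equal to a key for its value;
# cells that match no key are left unchanged.
cleaned = a["sex"].replace(fixes)
print(cleaned.tolist())
```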

Hope this helps!
