Question

I would like to know how to remove certain values from a dataset, specifically numbers and a list of strings. For example:

    Test      Num
0   bam       132
1   -         65
2   creation  47
3   MAN       32
4   41        831
... ... ...
460 Luchino   21
461 42 4126   7
462 finger    43
463 washing   1

I would like to end up with something like this:

    Test      Num
0   bam       132
2   creation  47
... ... ...
460 Luchino   21
462 finger    43
463 washing   1

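Here I manually removed MAN (which should come from a list of strings, like stop words), - and the numbers.

I tried using isdigit, but it does not work as expected, so I assume there is a mistake in my code: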

df['Text'].where(~df['Text'].str.isdigit())

And here is my attempt with the stop words:

my_stop=['MAN','-']
df['Text'].apply(lambda lst: [x for x in lst if x in my_stop])

Answer 1

If you want to filter rows, you can use .loc:

df = df.loc[~df.Text.str.isdigit() & ~df.Text.isin(['MAN']), :]

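.where(cond, other) returns a DataFrame or Series with the same shape as self: it keeps the original values where cond is True and replaces them with other where cond is False. Since other defaults to NaN, the rows are masked rather than dropped. Read more in the docs.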
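
To make the difference concrete, here is a minimal sketch comparing the two approaches. The sample frame is a made-up subset shaped like the data in the question, and the names is_number, masked and filtered are only illustrative.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({"Text": ["bam", "-", "41", "MAN"], "Num": [132, 65, 831, 32]})

# True for rows whose text consists only of digit characters.
is_number = df["Text"].str.isdigit()

# .where keeps the original shape: rows failing the condition become NaN.
masked = df["Text"].where(~is_number)

# Boolean indexing with .loc drops the failing rows instead.
filtered = df.loc[~is_number]

print(masked)
print(filtered)
```

With .where the numeric row is still present but holds NaN, which is likely why the attempt in the question looked like it was not working.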

Answer 2

Hi, you should try the following code:

df[df['Text'] != 'MAN']
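
The line above removes only MAN. A possible way to cover the whole stop list and the numeric entries in one pass is sketched below. It rests on a few assumptions: the column is named Text as in the question's code, the sample rows are a made-up subset with a fresh 0..8 index, and an entry such as "42 4126" should count as a number. Note that str.isdigit returns False for that entry because of the space, so the sketch uses str.fullmatch with a regular expression instead (available in pandas 1.1 and later).

```python
import pandas as pd

# Hypothetical sample data based on the rows shown in the question.
df = pd.DataFrame({
    "Text": ["bam", "-", "creation", "MAN", "41",
             "Luchino", "42 4126", "finger", "washing"],
    "Num": [132, 65, 47, 32, 831, 21, 7, 43, 1],
})
my_stop = ["MAN", "-"]

# True where the whole cell is one of the stop words.
is_stop = df["Text"].isin(my_stop)

# True where the cell contains only digits and whitespace, e.g. "41" or "42 4126".
# na=False treats missing values as non-matching.
is_numeric = df["Text"].str.fullmatch(r"[\d\s]+", na=False)

# Keep rows that are neither stop words nor numeric.
cleaned = df.loc[~is_stop & ~is_numeric]
print(cleaned)
```

Keeping the two masks separate makes it easy to inspect which rule removed a given row before combining them.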

Removing numbers and user stop words from a pandas DataFrame