I have this dataset:
        1-grams  2-grams      3-grams
0       game     first game   last you part
1       mama     last you     10 10 10
2       cool     naughty cat  us part ii
3       story    mama story   loved first game
4       save     10 10        first last you
...     ...      ...          ...
926260  NaN      NaN          game scenery improved
926261  NaN      NaN          game scenery really
926262  NaN      NaN          game scenes alone
926263  NaN      NaN          game scenes cinematic
926264  NaN      NaN          했는가 라는 생각이
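In each individual column (the 2-grams and 3-grams in particular), I would like to replace with NaN some rows that carry no meaning, for example "10 10" or "us part ii".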
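However, to do that I would have to create three separate datasets, one per column, replace the rows I am not interested in, and then, finally, concatenate them again.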
Instead, I would like to know how to replace with NaN the rows whose 2-gram or 3-gram string contains "10", "us part ii", or "했는가 라는 생각이".
I would like something like this:
        1-grams  2-grams      3-grams
0       game     first game   last you part
1       mama     last you     NaN
2       cool     naughty cat  NaN
3       story    mama story   loved first game
4       save     NaN          first last you
...     ...      ...          ...
926260  NaN      NaN          game scenery improved
926261  NaN      NaN          game scenery really
926262  NaN      NaN          game scenes alone
926263  NaN      NaN          game scenes cinematic
926264  NaN      NaN          NaN
Here the rows containing my custom stopwords have been replaced with NaN.
Answer 0 (score: 1)
You can use applymap() (deprecated since pandas 2.1 in favor of DataFrame.map(), which takes the same function):
import numpy as np
df = df.applymap(lambda x: np.nan if '10' in str(x) or 'us part ii' in str(x) else x)
Edit: for a generalized list of substrings:
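# str(x) guards against cells that are already NaN (floats), where `substring in x` would raise a TypeError
df = df.applymap(lambda x: np.nan if any(substring in str(x) for substring in excluded_list) else x)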
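If only the 2-grams and 3-grams columns should be touched, as the question asks, a column-wise variant may be easier to control. The following is a minimal sketch, not a definitive implementation: the sample frame is made up to mirror the question, and the contents of excluded_list are an assumption based on the stopwords the question mentions.

```python
import re

import pandas as pd

# Made-up sample mirroring the first rows of the question's dataset.
df = pd.DataFrame({
    "1-grams": ["game", "mama", "cool", "story", "save"],
    "2-grams": ["first game", "last you", "naughty cat", "mama story", "10 10"],
    "3-grams": ["last you part", "10 10 10", "us part ii",
                "loved first game", "first last you"],
})

# Assumed stopword list; the variable name follows the answer's snippet.
excluded_list = ["10", "us part ii", "했는가 라는 생각이"]

# Escape each entry so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(s) for s in excluded_list)

for col in ["2-grams", "3-grams"]:
    # na=False treats cells that are already NaN as "no match", so they stay NaN.
    hit = df[col].str.contains(pattern, regex=True, na=False)
    # mask() replaces the matching cells with NaN and keeps the rest.
    df[col] = df[col].mask(hit)
```

Note that this is substring matching, so an entry such as "10" would also blank a cell like "2010 game"; if that is too broad, the pattern would need word boundaries or longer entries.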
Answer 1 (score: 0)
Use isin:
df = df.mask(df.isin(['10 10','us part ii']))
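One thing worth checking with this approach is that isin compares whole cell values rather than substrings. The small sketch below, on a made-up two-row frame, suggests that a cell such as "10 10 10" would survive unless it is listed explicitly:

```python
import pandas as pd

# Made-up sample: one exact match per column, plus one near miss ("10 10 10").
df = pd.DataFrame({
    "2-grams": ["last you", "10 10"],
    "3-grams": ["10 10 10", "us part ii"],
})

# isin tests for equality with the listed values, so only exact matches are masked.
masked = df.mask(df.isin(["10 10", "us part ii"]))
```

Here "10 10" and "us part ii" become NaN, while "10 10 10" is left in place because it is not equal to any listed value. That makes isin a good fit when the unwanted n-grams are known exactly, and a poor fit for "contains" style filtering.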