Question

I have this dataset:

    1-grams 2-grams             3-grams
0   game    first game         last you part
1   mama    last you           10 10 10
2   cool    naughty cat        us part ii
3   story   mama story         loved first game
4   save    10 10              first last you
... ... ... ...
926260  NaN NaN                game scenery improved
926261  NaN NaN                game scenery really
926262  NaN NaN                game scenes alone
926263  NaN NaN                game scenes cinematic
926264  NaN NaN                했는가 라는 생각이

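In each individual column (especially 2-grams and 3-grams), I want to replace some meaningless rows with NaN, e.g. 10 10 or us part ii. As things stand, I have to create three separate datasets, one per column, replace the rows I am not interested in, and then, finally, concatenate them again. What I would like to know is how to replace with NaN the entries in the 2-grams or 3-grams columns whose strings contain 10, us part ii, or 했는가 라는 생각이.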

I would like something like this:

    1-grams 2-grams             3-grams
0   game    first game         last you part
1   mama    last you           NaN
2   cool    naughty cat        NaN
3   story   mama story         loved first game
4   save    NaN                first last you
... ... ... ...
926260  NaN NaN                game scenery improved
926261  NaN NaN                game scenery really
926262  NaN NaN                game scenes alone
926263  NaN NaN                game scenes cinematic
926264  NaN NaN                NaN

That is, the rows matching my custom stopwords are replaced with NaN.

Answer 1

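You can use applymap() (deprecated since pandas 2.1 in favor of DataFrame.map(), which takes the same function):

import numpy as np
df = df.applymap(lambda x: np.nan if '10' in str(x) or 'us part ii' in str(x) else x)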

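Edit: for a general list of substrings (str(x) keeps existing NaN cells, which are floats, from raising a TypeError):

excluded_list = ['10', 'us part ii']
df = df.applymap(lambda x: np.nan if any(substring in str(x) for substring in excluded_list) else x)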
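If only the 2-grams and 3-grams columns should be touched, one possible variant is to build a boolean mask per column with `Series.str.contains` instead of mapping over every cell. This is a sketch rather than the answer's own method: the small DataFrame stands in for the table in the question, and `excluded_list` is a hypothetical list of substrings.

```python
import re

import pandas as pd

# Small stand-in for the n-gram table from the question.
df = pd.DataFrame({
    "1-grams": ["game", "mama", "cool", "story", "save"],
    "2-grams": ["first game", "last you", "naughty cat", "mama story", "10 10"],
    "3-grams": ["last you part", "10 10 10", "us part ii",
                "loved first game", "first last you"],
})

# Hypothetical list of substrings whose cells should become NaN.
excluded_list = ["10", "us part ii"]

# One regex that matches any of the substrings literally.
pattern = "|".join(re.escape(s) for s in excluded_list)

for col in ["2-grams", "3-grams"]:
    # na=False treats cells that are already NaN as "no match".
    hits = df[col].str.contains(pattern, regex=True, na=False)
    # mask() replaces the cells where hits is True with NaN.
    df[col] = df[col].mask(hits)
```

Because the match is on substrings, a pattern such as `10` would also blank out cells like `2010 game`; anchoring the regex or listing full phrases avoids that if it is not wanted.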

Answer 2

Use isin to check:

df = df.mask(df.isin(['10 10','us part ii']))
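Note that `isin` compares whole cell values, so only cells that are exactly equal to one of the listed strings are replaced. A minimal sketch on made-up data (the two-row frame and the `unwanted` list are illustrative, not from the question's full dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "2-grams": ["first game", "10 10"],
    "3-grams": ["10 10 10", "us part ii"],
})

# Exact strings to blank out; hypothetical list for illustration.
unwanted = ["10 10", "us part ii"]

# isin() gives a boolean frame of exact matches; mask() turns those cells into NaN.
cleaned = df.mask(df.isin(unwanted))
```

Here `10 10 10` survives because it is not exactly equal to `10 10`; it would have to be added to the list, or handled with a substring match, to be replaced as well.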
