Question

I have a DataFrame for sentiment analysis with the columns PhraseID, Phrase, and Rating.

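I want to filter the DataFrame so that only the rows whose Phrase consists of a single word are kept. The Phrase column, of course, contains strings.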

Answer 1

I think this is reasonably clean (though the pandas experts here may well come up with a one-liner):

import pandas as pd
df = pd.DataFrame({"PhraseID" : [1, 3, 4], "Phrase": ["hey what", "up", "no"]})

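# Count the whitespace-separated words in a phrase
def f(x):
    return len(x.split())

# Store the word count for each row, then keep only the single-word rows
df["n_words"] = df.Phrase.apply(f)
df[df.n_words == 1]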

This gives:

    Phrase  PhraseID n_words
1   up       3         1
2   no       4         1

If you prefer, you can use an anonymous function (a lambda) instead:

df["n_words"] = df.Phrase.apply(lambda x : len(x.split()) )
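A minimal sketch of a vectorized alternative, assuming the same sample data as above: pandas' `.str` accessor can split every phrase and count the tokens without a Python-level `apply`. The variable names `n_words` and `single_word` are just illustrative.

```python
import pandas as pd

# Same sample data as in this answer
df = pd.DataFrame({"PhraseID": [1, 3, 4], "Phrase": ["hey what", "up", "no"]})

# .str.split() with no argument splits on runs of whitespace, like str.split();
# .str.len() then returns the length of each resulting list, i.e. the word count.
n_words = df["Phrase"].str.split().str.len()

# Boolean mask keeps only the rows with exactly one word
single_word = df[n_words == 1]
print(single_word)
```

One practical difference: if Phrase contains missing values, `x.split()` inside `apply` raises an `AttributeError`, whereas `.str.split().str.len()` yields NaN for those rows, and `NaN == 1` is False, so they are simply dropped.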

Answer 2

I would try this:

mask = df['Phrase'].str.match(r'\A[\w-]+\Z')
df[mask]

Or, all in one line:

df[df['Phrase'].str.match(r'\A[\w-]+\Z')]
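To see what the pattern `\A[\w-]+\Z` actually accepts, here is a small sketch run against some hypothetical phrases (not from the question). `\A` and `\Z` anchor the match to the whole string, and `[\w-]+` allows only letters, digits, underscores and hyphens, so hyphenated words count as one word while anything with a space, apostrophe or other punctuation is rejected. That is stricter than the `split()`-based answers, which would count `don't` or `ok!` as single words.

```python
import pandas as pd

# Hypothetical phrases chosen to illustrate edge cases of the regex
phrases = pd.Series(["up", "well-known", "hey what", "don't", "ok!", " up"])

# True only if the entire string consists of word characters and hyphens
mask = phrases.str.match(r"\A[\w-]+\Z")
kept = phrases[mask].tolist()
print(kept)  # ['up', 'well-known']
```

On pandas 1.1 or later, `phrases.str.fullmatch(r"[\w-]+")` expresses the same whole-string match without the explicit anchors.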

Answer 3

A one-liner that returns a DataFrame containing only the rows whose phrase is a single word:

import pandas as pd
df[df.Phrase.apply(lambda x: len(x.split())== 1)]

This assumes your phrases can be tokenized with split().
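A short sketch of what "tokenized with split()" means in practice, using hypothetical phrases: `str.split()` with no argument splits on any run of whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace, so padded phrases still count as one word and an empty phrase counts as zero. Splitting on a single space, `split(" ")`, would instead produce empty tokens and inflate the count.

```python
import pandas as pd

# Hypothetical phrases with padding, a tab, and an empty string
phrases = pd.Series(["  up  ", "hey\twhat", "", "no"])

# No-argument split(): whitespace runs act as one separator, edges are ignored
counts = phrases.apply(lambda x: len(x.split()))
print(counts.tolist())  # [1, 2, 0, 1]

# Splitting on a single space keeps the empty pieces between/around spaces
naive = "  up  ".split(" ")
print(naive)  # ['', '', 'up', '', '']
```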

Keep only single-word sentences in a Pandas DataFrame
