Question

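I have a DataFrame with two columns, id and text.

For example, I want to retrieve the rows whose text length is greater than 2.

Here, text length means the number of words in the text, not the number of characters.

I tried the following: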

import pandas as pd

df = pd.DataFrame([{'id': 1, 'text': 'Connected to hgfxg debugger'},
                   {'id': 2, 'text': 'fdss debugger - process 6384 is connecting'},
                   {'id': 3, 'text': 'we are'},
                   ])
df = df[df['text'].str.len() > 2]  # str.len() counts characters, not words
print(df)  # <-- prints all of the sentences above

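But this retrieves the sentences that have more than 2 characters (in our example, all of the sentences above).

How can I achieve what I want in a single line of code? Is it possible?

I can do it in more than one line, for example: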

df['text_len'] = df['text'].map(lambda x: len(str(x).split()))
df = df[df['text_len'] > 2]
print(df) #<-- will print the first two sentences

Answer 1

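Think about it another way: a text with more than two words must contain at least two ' ' characters, so you can simply count the spaces and keep the rows that have two or more. This assumes the words are separated by single spaces.

df[df['text'].str.count(' ') >= 2]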
Out[230]: 
   id                                        text
0   1                 Connected to hgfxg debugger
1   2  fdss debugger - process 6384 is connecting
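
One caveat with counting spaces: it matches the word count only when words are separated by exactly one space and the text has no leading or trailing whitespace. Here is a minimal sketch of the difference, using made-up sample strings that are not from the question:

```python
import pandas as pd

# Hypothetical strings with irregular spacing (not from the question)
s = pd.Series(['a  b', ' a b', 'a b c'])

spaces = s.str.count(' ')        # number of space characters in each string
words = s.str.split().str.len()  # number of whitespace-separated tokens

print(spaces.tolist())  # [2, 2, 2]
print(words.tolist())   # [2, 2, 3]
```

All three strings contain two spaces, but only the last one has three words, so a split-based count is the safer choice if the spacing in your data may be irregular.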

Answer 2

You can also use the following (note the raw string, so that \s is passed to the regex engine unchanged):

df[df.text.str.split(r'\s+').str.len().gt(2)]
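
For reference, a self-contained sketch of the same idea on the question's data. It relies on `str.split()` with no pattern, which splits on any run of whitespace; the variable name `result` is just for illustration:

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'text': 'Connected to hgfxg debugger'},
    {'id': 2, 'text': 'fdss debugger - process 6384 is connecting'},
    {'id': 3, 'text': 'we are'},
])

# Split each text into words, count them, and keep rows with more than 2 words
result = df[df['text'].str.split().str.len() > 2]

print(result['id'].tolist())  # [1, 2]
```

This keeps the filter on one line and does not add a helper column such as text_len to the DataFrame.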
