Find rows in a dataframe that contain bigram/trigram words

Time: 2019-11-11 00:25:47

Tags: python pandas dataframe nlp

This example is about finding bigrams.

Given:

import pandas as pd
data = [['tom', 10], ['jobs', 15], ['phone', 14],['pop', 16], ['they_said', 11], ['this_example', 22],['lights', 14]] 

test = pd.DataFrame(data, columns = ['Words', 'Freqeuncy']) 

test

I want to write a query that returns only the words joined by "_", so that the resulting df looks like this:

data2 = [['they_said', 11], ['this_example', 22]]

test2 = pd.DataFrame(data2, columns = ['Words', 'Freqeuncy']) 

test2

I would also like to know why something like this does not work: data[data['Words'] == (len > 3)]

2 answers:

Answer 0 (score: 0)

To run a function such as len on each row, you need to use apply:

df[df.apply(lambda x: len(x['Words']), axis=1)> 3]
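As for why the original attempt fails: `len > 3` compares the built-in function itself with an integer, which raises a TypeError in Python 3, and `data` is the plain list rather than the DataFrame. A minimal sketch of a vectorized alternative to the apply call, assuming the `test` DataFrame from the question (the name `longer_than_3` is only illustrative):

```python
import pandas as pd

data = [['tom', 10], ['jobs', 15], ['phone', 14], ['pop', 16],
        ['they_said', 11], ['this_example', 22], ['lights', 14]]
test = pd.DataFrame(data, columns=['Words', 'Freqeuncy'])

# .str.len() returns the length of each string in the column,
# so no row-wise apply is needed.
longer_than_3 = test[test['Words'].str.len() > 3]
print(longer_than_3)
```

Note that a length filter also keeps single words such as 'jobs' and 'phone', so on its own it does not isolate the underscore-joined words the question asks for.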

Answer 1 (score: 0)

The pandas way to do this is:

import pandas as pd
data = [['tom', 10], ['jobs', 15], ['phone', 14],['pop', 16], ['they_said', 11], ['this_example', 22],['lights', 14]] 

test = pd.DataFrame(data, columns = ['Words', 'Freqeuncy']) 

test = test[test.Words.str.contains('_')] 

test
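If bigrams and trigrams need to be told apart, one possible approach is to count the underscores in each word. This is only a sketch: it assumes every n-gram is joined by single underscores, and the 'one_two_three' row is made-up sample data that does not appear in the question.

```python
import pandas as pd

# Sample data; 'one_two_three' is a hypothetical trigram added for illustration.
data = [['tom', 10], ['they_said', 11], ['this_example', 22], ['one_two_three', 5]]
test = pd.DataFrame(data, columns=['Words', 'Freqeuncy'])

# Number of underscores per word: 1 for a bigram, 2 for a trigram.
underscores = test['Words'].str.count('_')

bigrams = test[underscores == 1]
trigrams = test[underscores == 2]

print(bigrams)
print(trigrams)
```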

For the opposite, that is, the rows whose word does not contain an underscore, negate the mask with ~:

test = test[~test.Words.str.contains('_')]
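If the Words column can contain missing values, the mask returned by str.contains may include NaN, and that cannot be used for boolean indexing. A hedged sketch, assuming a hypothetical row with a missing word (not part of the question's data), that passes na=False and treats the pattern as a literal string with regex=False:

```python
import pandas as pd

# The row with None is a hypothetical addition to show missing-value handling.
test = pd.DataFrame({'Words': ['tom', 'they_said', None, 'this_example'],
                     'Freqeuncy': [10, 11, 3, 22]})

# na=False treats missing words as "does not contain '_'";
# regex=False matches the underscore as a plain substring.
has_underscore = test['Words'].str.contains('_', regex=False, na=False)

with_underscore = test[has_underscore]
without_underscore = test[~has_underscore]

print(with_underscore)
print(without_underscore)
```

With this choice the row with the missing word ends up in without_underscore; drop it first with dropna if that is not what you want.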