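I am working on an assignment about CSV files, and I have been asked to write a filter that checks for keywords. I have already created a list of dictionaries, and now I need to check each dictionary for the keywords. If a keyword is found, I am supposed to append that dictionary to another list, called the filtered list.
For context, the assignment is about Donald Trump's Facebook posts. Below is a sample of the data: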
[{'link_name': 'Timeline Photos',
'num_angrys': '7',
'num_comments': '543',
'num_hahas': '17',
'num_likes': '6178',
'num_loves': '572',
'num_reactions': '6813',
'num_sads': '0',
'num_shares': '359',
'num_wows': '39',
'status_id': '153080620724_10157915294545725',
'status_link': 'https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10157915294545725/?type=3',
'status_message': 'Beautiful evening in Wisconsin- THANK YOU for your incredible support tonight! Everyone get out on November 8th - and VOTE! LETS MAKE AMERICA GREAT AGAIN! -DJT',
'status_published': '10/17/2016 20:56:51',
'status_type': 'photo'},
{'link_name': '',
'num_angrys': '5211',
'num_comments': '3644',
'num_hahas': '75',
'num_likes': '26649',
'num_loves': '487',
'num_reactions': '33768',
'num_sads': '191',
'num_shares': '17653',
'num_wows': '1155',
'status_id': '153080620724_10157914483265725',
'status_link': 'https://www.facebook.com/DonaldTrump/videos/10157914483265725/',
'status_message': "The State Department's quid pro quo scheme proves how CORRUPT our system is. Attempting to protect Crooked Hillary, NOT our American service members or national security information, is absolutely DISGRACEFUL. The American people deserve so much better. On November 8th, we will END this RIGGED system once and for all!",
'status_published': '10/17/2016 18:00:41',
'status_type': 'video'}]
This is the code I currently have:
    from nltk.tokenize import sent_tokenize, word_tokenize

    def get_update_with_keywords(status_updates, keywords, case_sensitive = "false"):
        # your code here
        with open(input_file, 'r') as infile:
            filtered_status_updates = []
            for row in status_updates:
                tokens = word_tokenize(row["status_message"])
                if tokens == keywords:
                    filtered_status_updates.append(row)
            return filtered_status_updates

    keywords = ["clinton", "obama"]
    get_update_with_keywords(status_updates, keywords)
But I keep getting the following output:
[]
I think this is because I am trying to append the whole dictionary to the list?!
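One way to test that guess is to look at the comparison on its own, outside the function. The following is only a minimal sketch: the message is made up, and `str.split()` is an assumed stand-in for `word_tokenize` so that it runs without NLTK data. It suggests that `==` between two lists asks whether they are identical item for item, which is a different question from whether a keyword occurs among the tokens.

```python
# Hypothetical message; str.split() stands in for word_tokenize (assumption).
tokens = "Clinton and Obama were mentioned".lower().split()
keywords = ["clinton", "obama"]

# List equality: True only if both lists hold the same items in the same order.
print(tokens == keywords)                  # False

# Membership: True if at least one keyword appears among the tokens.
print(any(k in tokens for k in keywords))  # True
```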
Answer 0 (score: 0)
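Use the function below instead to check whether the keywords are contained in the list of tokens. So your
    if tokens == keywords:
would change to
    if sublist(keywords, tokens):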
    def sublist(ls1, ls2):
        '''
        >>> sublist([], [1,2,3])
        True
        >>> sublist([1,2,3,4], [2,5,3])
        True
        >>> sublist([1,2,3,4], [0,3,2])
        False
        >>> sublist([1,2,3,4], [1,2,5,6,7,8,5,76,4,3])
        False
        '''
        def get_all_in(one, another):
            for element in one:
                if element in another:
                    yield element

        for x1, x2 in zip(get_all_in(ls1, ls2), get_all_in(ls2, ls1)):
            if x1 != x2:
                return False

        return True
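One caveat: `sublist` compares the order of the elements the two lists share, so it also returns True when they share nothing at all (both generators are then empty), which would keep every post. If the goal is simply "keep the post when at least one keyword appears", a plain membership test may be closer to what the assignment wants. The following is a minimal sketch under some assumptions: a regular expression stands in for `word_tokenize` so that it runs without NLTK data, `case_sensitive` is treated as a real boolean rather than the string "false", and the two sample rows are made up for illustration.

```python
import re

def get_update_with_keywords(status_updates, keywords, case_sensitive=False):
    """Return the status updates whose message contains at least one keyword."""
    if not case_sensitive:
        keywords = [k.lower() for k in keywords]
    wanted = set(keywords)

    filtered_status_updates = []
    for row in status_updates:
        message = row["status_message"]
        if not case_sensitive:
            message = message.lower()
        # Stand-in tokenizer (assumption): every run of word characters is a token.
        tokens = set(re.findall(r"\w+", message))
        # Keep the row if the keywords and the tokens have any word in common.
        if wanted & tokens:
            filtered_status_updates.append(row)
    return filtered_status_updates

# Hypothetical sample rows, reduced to the one field the filter reads.
sample = [
    {"status_message": "Crooked Hillary Clinton is at it again"},
    {"status_message": "Beautiful evening in Wisconsin"},
]
result = get_update_with_keywords(sample, ["clinton", "obama"])
print(len(result))
```

With the default case-insensitive matching, only the first sample row is kept. Passing `case_sensitive=True` would keep neither, because the message contains "Clinton" and not "clinton".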