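I only started learning Python this week and have the following problem. I have a JSON file (Aberdeen2015.json) with 60 lines, each holding one newspaper article. Each line contains a list with the article's date, title and body (see the image below; the title is not visible because it sits at the end of the line).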
What I want to do: if the body of an article contains certain keywords, print a list of the dates of those articles. So far I have tried the following:
with open("Aberdeen2015.json") as f:
    for i in line():
        if (' tax ' in body[i]
                or ' Tax ' in body[i]
                or ' policy ' in body[i]
                or ' Policy ' in body[i]
                or ' regulation ' in body[i]
                or ' Regulation ' in body[i]
                or ' spending ' in body[i]
                or ' Spending ' in body[i]
                or ' budget ' in body[i]
                or ' Budget ' in body[i]
                or ' central bank ' in body[i]
                or ' Central Bank ' in body[i]
                or ' Central bank ' in body[i]):
            print("date")
I know the code probably has many mistakes; any help is very welcome.
Answer 0 (score: 2)
How about this:
# import json module for parsing
import json

# define a list of keywords
keywords = ('tax', 'policy', 'regulation', 'spending', 'budget', 'central bank')

with open('test.json') as json_file:
    # read json file line by line
    for line in json_file:
        # create python dict from json object
        json_dict = json.loads(line)
        # check if "body" (lowercased) contains any of the keywords
        if any(keyword in json_dict["body"].lower() for keyword in keywords):
            print(json_dict["date"])
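One caveat on Answer 0: "keyword in text" is a plain substring test, so 'tax' also matches inside 'syntax' or 'taxi', and the dates are printed one at a time rather than collected into the list the question asks for. The sketch below is a hypothetical variant, assuming the same one-JSON-object-per-line layout; the function name dates_with_keywords and the sample data are illustrative only. It uses regex word boundaries (\b) with re.IGNORECASE and returns the matching dates as a list.

```python
import io
import json
import re

KEYWORDS = ("tax", "policy", "regulation", "spending", "budget", "central bank")


def dates_with_keywords(lines, keywords=KEYWORDS):
    """Return the dates of articles whose body contains any keyword as a whole word."""
    # One pattern per keyword; \b stops 'tax' from matching inside 'syntax'.
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]
    dates = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        article = json.loads(line)
        if any(p.search(article["body"]) for p in patterns):
            dates.append(article["date"])
    return dates


# Hypothetical sample in JSON Lines format (one article per line).
sample = io.StringIO(
    '{"date": "DEC 27, 2015", "body": "The Central Bank raised rates"}\n'
    '{"date": "AUG 15 2015", "body": "A lesson on Python syntax"}\n'
)
print(dates_with_keywords(sample))  # ['DEC 27, 2015']
```

On the real file, pass the open file object instead: with open("Aberdeen2015.json") as f: print(dates_with_keywords(f)).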
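Answer 1 (score: 1)
An efficient way to do this is to use set intersection.
We use the standard Python json module to parse the data, which gives us a list of dicts, one dict per line. We then take the body field of each row, convert it to lowercase and split it into individual words, and check whether that set of words has a non-empty intersection with the set of keywords. If it does, we print the row's date.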
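import json

keywords = ('tax', 'policy', 'regulation', 'spending', 'budget', 'central bank')
keywords = set(keywords)

fname = "Aberdeen2015.json"
with open(fname) as f:
    # json.load expects the whole file to be one JSON document (a list of dicts)
    data = json.load(f)

for row in data:
    s = row['body']
    # lowercase, split on whitespace, and intersect the words with the keyword set
    if keywords.intersection(s.lower().split()):
        print(row['date'])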
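Answer 1's approach has two gaps worth knowing about: str.split() keeps punctuation attached, so 'tax,' is not equal to 'tax', and a two-word keyword such as 'central bank' can never equal a single split word, so it never matches. The hypothetical helper matching_keywords below keeps the set-intersection idea for single words and adds a separate phrase check; it assumes English text, since the [a-z]+ tokenizer drops digits and accented letters.

```python
import re

KEYWORDS = {"tax", "policy", "regulation", "spending", "budget", "central bank"}

# Split the keyword set once: single words use set intersection,
# multi-word phrases need a separate check.
SINGLE = {k for k in KEYWORDS if " " not in k}
PHRASES = [k for k in KEYWORDS if " " in k]


def matching_keywords(body):
    """Return the set of keywords found in body (case-insensitive, whole words)."""
    # Keep only runs of letters, so 'Tax,' and 'tax.' both become 'tax'.
    tokens = re.findall(r"[a-z]+", body.lower())
    found = SINGLE.intersection(tokens)
    # Re-join the tokens with single spaces and pad both ends,
    # so a phrase can only match on word boundaries.
    padded = " " + " ".join(tokens) + " "
    found.update(p for p in PHRASES if " " + p + " " in padded)
    return found


print(sorted(matching_keywords("The Central  Bank cut rates; tax, too.")))  # ['central bank', 'tax']
```

In Answer 1's loop, if matching_keywords(row['body']): would replace the keywords.intersection(...) test.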
Answer 2 (score: 1)
I assume your JSON file looks like this:
[
    {"date": "DEC 27, 2015", "body": "the policy has been defined"},
    {"date": "AUG 15 2015", "body": "the tax and policy are done"},
    {"date": "JAN 23 2002", "body": "nothing to get from this one"}
]
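The layout shown above is a single JSON array, whereas the question describes one article per line (JSON Lines). json.load and json.loads handle the former but raise json.JSONDecodeError ("Extra data") on the latter. The sketch below is a hypothetical helper, parse_articles, that accepts either layout; it is an assumption about how the file might be structured, not something the thread confirms.

```python
import json


def parse_articles(text):
    """Parse text holding either one JSON array or one JSON object per line."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # More than one top-level value: treat the text as JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single JSON document: an array like the one above, or one lone object.
    return data if isinstance(data, list) else [data]
```

Usage: with open("Aberdeen2015.json", encoding="utf-8") as f: articles = parse_articles(f.read()), after which articles is a list of dicts either way.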
This code works; let me know if anything is unclear:
import json, re

words = ["policy", "tax"]

def lookingfor(words):
    with open("file.json", "rb") as f:
        data = json.load(f)
    for line in data:
        for word in words:
            match = re.findall(word, line['body'])
            if match:
                print("word matched: %s ==> date: %s" % (word, line['date']))

lookingfor(words)
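Answer 2 prints one line per matched word, so an article mentioning both 'policy' and 'tax' is reported twice, and re.findall(word, ...) is case-sensitive and also matches inside longer words. A minimal sketch of one alternative, assuming the same list-of-dicts data: compile a single alternation pattern with word boundaries and re.IGNORECASE, collect the matched words per article, and report each date once. The name report_matches is hypothetical.

```python
import re


def report_matches(articles, words):
    """Yield (date, sorted matched words) once per article that mentions any word."""
    # One alternation pattern; \b(?:...)\b keeps whole-word matches only.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    for article in articles:
        # The group is non-capturing, so findall returns the full matched text.
        found = {m.lower() for m in pattern.findall(article["body"])}
        if found:
            yield article["date"], sorted(found)


# The sample data from the JSON shown above, as Python dicts.
articles = [
    {"date": "DEC 27, 2015", "body": "the policy has been defined"},
    {"date": "AUG 15 2015", "body": "the tax and policy are done"},
    {"date": "JAN 23 2002", "body": "nothing to get from this one"},
]
for date, found in report_matches(articles, ["policy", "tax"]):
    print("date: %s ==> words matched: %s" % (date, ", ".join(found)))
```

With this sample, the second article is printed once with "policy, tax" instead of appearing on two separate lines, and the third article is skipped.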