Reading a JSON file in Python and detecting specific keywords in a field

Date: 2016-02-09 12:05:30

Tags: python json parsing

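I started learning Python just this week and have the following problem. I have a JSON file (Aberdeen2015.json) with 60 lines, each containing one newspaper article. Each line holds a list with the article's date, title, and body (see the image below; the title is not visible because it sits at the end of the line).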

[Image: screenshot of the JSON file]

I want to do the following: if an article's body contains certain keywords, print a list of the dates of those articles. So far I have tried this:

with open("Aberdeen2015.json") as f:
    for i in line():
        if (' tax ' in body[i]
        or ' Tax ' in body[i]
        or ' policy ' in body[i]
        or ' Policy ' in body[i]
        or ' regulation ' in body[i]
        or ' Regulation ' in body[i]
        or ' spending ' in body[i]
        or ' Spending ' in body[i]
        or ' budget ' in body[i]
        or ' Budget ' in body[i]
        or ' central bank ' in body[i]
        or ' Central Bank ' in body[i]
        or ' Central bank ' in body[i]):

        print("date")

I know the code probably has many mistakes; any help is very welcome.

3 Answers:

Answer 0: (score: 2)

How about this:

# import json module for parsing
import json

# define a list of keywords
keywords = ('tax', 'policy', 'regulation', 'spending', 'budget', 'central bank')

with open('test.json') as json_file:

    # read json file line by line (iterating the file object avoids loading all lines at once)
    for line in json_file:

        # create python dict from json object
        json_dict = json.loads(line)

        # check if "body" (lowercased) contains any of the keywords
        if any(keyword in json_dict["body"].lower() for keyword in keywords):
            print(json_dict["date"])
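
A possible variation on the code above, sketched under a few assumptions: the file holds one JSON object per line with "body" and "date" keys (as in the question), the dates should be collected into a list rather than printed one by one, and whole-word matching is wanted so that "tax" does not match "taxi". The function name `matching_dates` and the sample lines are made up for illustration.

```python
import json
import re

KEYWORDS = ('tax', 'policy', 'regulation', 'spending', 'budget', 'central bank')

# One case-insensitive pattern; \b keeps "tax" from matching inside "taxi".
PATTERN = re.compile(
    r'\b(?:' + '|'.join(re.escape(k) for k in KEYWORDS) + r')\b',
    re.IGNORECASE,
)


def matching_dates(lines):
    """Return the dates of all articles whose body mentions a keyword."""
    dates = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        article = json.loads(line)
        if PATTERN.search(article["body"]):
            dates.append(article["date"])
    return dates


# Hypothetical sample data standing in for the lines of the real file.
sample = [
    '{"date": "JAN 01, 2015", "body": "The Central Bank raised rates."}',
    '{"date": "JAN 02, 2015", "body": "A taxi strike hit the city."}',
    '{"date": "JAN 03, 2015", "body": "New tax, new budget."}',
]
print(matching_dates(sample))
```

With a real file, the open file object can be passed in directly, since iterating it yields one line at a time.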

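Answer 1: (score: 1)

An efficient way to do this is to use set intersection.

We use the standard Python json module to parse the data, which gives us a list of dicts, one dict per line. We then take the body field of each row, convert it to lowercase, and split it into individual words. Next we check whether that set of words has a non-empty intersection with the set of keywords. If it does, we print the row's date.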

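import json

# note: split() below yields single words, so the two-word keyword
# 'central bank' can never match with this approach
keywords = {'tax', 'policy', 'regulation', 'spending', 'budget', 'central bank'}

fname = "Aberdeen2015.json"
with open(fname) as f:
    # assumes the whole file is one JSON array of objects
    data = json.load(f)

for row in data:
    s = row['body']
    # a non-empty intersection means at least one keyword is a word of the body
    if keywords.intersection(s.lower().split()):
        print(row['date'])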
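
A sketch of one way to also catch multi-word keywords and words followed by punctuation ("tax,"), both of which plain whitespace splitting misses. It assumes a simple regex tokeniser is acceptable and checks phrases separately from single words; the helper name `has_keyword` is hypothetical.

```python
import re

KEYWORDS = ('tax', 'policy', 'regulation', 'spending', 'budget', 'central bank')

# Split the keywords into single words (set lookup) and multi-word phrases.
SINGLE_WORDS = {k for k in KEYWORDS if ' ' not in k}
PHRASES = [k for k in KEYWORDS if ' ' in k]


def has_keyword(body):
    """Return True if body contains a keyword as a whole word or phrase."""
    # Keep only runs of letters/apostrophes, which drops punctuation.
    tokens = re.findall(r"[a-z']+", body.lower())
    if SINGLE_WORDS.intersection(tokens):
        return True
    # Re-join tokens with single spaces and pad both ends, so a phrase
    # is only found when it lines up with whole tokens.
    normalised = ' ' + ' '.join(tokens) + ' '
    return any(' ' + phrase + ' ' in normalised for phrase in PHRASES)


print(has_keyword("New tax, same problems"))
print(has_keyword("The Central  Bank met today"))
print(has_keyword("A taxi strike"))
```

The set intersection still handles the single-word keywords; only the phrases fall back to a substring test on the normalised text.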

Answer 2: (score: 1)

I assume your JSON file looks like this:

[
    {"date": "DEC 27, 2015", "body":"the policy has been defined"},
    {"date": "AUG 15 2015", "body":"the tax and policy are done"},
    {"date": "JAN 23 2002", "body": "nothing to get from this one"}
]

This code works; let me know if anything is unclear.

import json
import re

words = ["policy", "tax"]

def lookingfor(words):
    # text mode is enough here; json.load parses the whole file (a JSON array)
    with open("file.json") as f:
        data = json.load(f)
    for line in data:
        for word in words:
            # re.escape treats the word literally; re.search stops at the first hit
            if re.search(re.escape(word), line['body']):
                print("word matched: %s ==> date: %s" % (word, line['date']))


lookingfor(words)
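
If one output line per article is preferred over one line per matched word, the matches can be collected first. A rough sketch, assuming the array-of-objects layout shown above; `find_matches` and the whole-word, case-insensitive matching are illustrative choices, not part of the original answer.

```python
import json
import re


def find_matches(articles, words):
    """Return (date, matched_words) pairs for articles with at least one match."""
    results = []
    for article in articles:
        found = [
            w for w in words
            if re.search(r'\b' + re.escape(w) + r'\b', article['body'], re.IGNORECASE)
        ]
        if found:
            results.append((article['date'], found))
    return results


# Sample data copied from the assumed file layout above.
raw = '''[
    {"date": "DEC 27, 2015", "body": "the policy has been defined"},
    {"date": "AUG 15 2015", "body": "the tax and policy are done"},
    {"date": "JAN 23 2002", "body": "nothing to get from this one"}
]'''

for date, found in find_matches(json.loads(raw), ["policy", "tax"]):
    print("%s ==> %s" % (date, ", ".join(found)))
```

A list of pairs is used rather than a dict keyed by date, so two articles published on the same date do not overwrite each other.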