Parsing multilevel JSON into a string with a condition

Time: 2017-04-14 17:04:51

Tags: python json

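I have this nested JSON item that I just want to flatten into a comma-separated string (i.e. parkinson:5,billy mays:4) so that it can be stored in a database for future analysis if needed. I wrote the function below, but I'm wondering whether there is a more elegant way using a list comprehension (or something else). I found this post, but I'm not sure how to adapt it to my needs (Python - parse JSON values by multilevel keys).

The data looks like this: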

{'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

Here is my code:

def parse_entities(data):
    results = ''
    for category in data.keys():
    # for c_id, category in enumerate(data.keys()):
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")

    return results

5 answers:

Answer 0 (score: 1)

This may be one way to do it, although using a "proper library" may make more sense, depending on your actual use case.

data = {
 'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

import itertools

# equivalent: dicts = itertools.chain.from_iterable(data.values())
dicts = itertools.chain(*data.values())
pairs = [":".join([d['name'], d['sentiment']])
         for d in dicts if d['sentiment'] != 'none']
result = ",".join(pairs)

print(result)

# parkinson:5,billy mays:4

# short, but less readable version
result = ",".join([":".join([d['name'], d['sentiment']])
                   for d in itertools.chain(*data.values())
                   if d['sentiment'] != 'none'])
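
The comment above mentions `itertools.chain.from_iterable` as an equivalent of `itertools.chain(*...)`. A minimal sketch of that equivalence follows; the trimmed `data` dict and the names `unpacked` and `lazy` are assumptions made up for this illustration.

```python
import itertools

# Small sample in the same shape as the question's data (assumed for illustration)
data = {
    'persons': [{'name': 'parkinson', 'sentiment': '5'},
                {'name': 'billy mays', 'sentiment': '4'}],
    'locations': [],
}

# chain(*iterables) unpacks all the lists up front as separate arguments;
# chain.from_iterable takes the outer iterable itself and walks it lazily.
unpacked = list(itertools.chain(*data.values()))
lazy = list(itertools.chain.from_iterable(data.values()))

print(unpacked == lazy)
```

Both forms yield the same flat sequence of dicts here; `from_iterable` mainly matters when the outer iterable is large or itself a generator.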

Answer 1 (score: 1)

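First of all, the most important thing for making your code shorter and nicer to look at is to use your own variables. Note that entity_data = data[category] and entity = entity_data[e_id]. So you can write entity['name'] instead of data[category][e_id]['name'].

Second, if you want something like

for category in data.keys():
    entity_data = data[category]

you can make it shorter and easier to read by changing it to

for category, entity_data in data.items():

But you don't even need that here: you can use the data.values() iterator to get just the values. Combining these improvements, your code looks like this:

def parse_entities(data):
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results

(I also changed results = results + ... to results += ..., and if not entity['sentiment'] == 'none' to if entity['sentiment'] != 'none', because it is shorter and does not reduce readability.)

Once the code is in this form, it becomes easier to make it shorter and more elegant still by using a list comprehension.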
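
The list-comprehension form could look roughly like the sketch below. It assumes the data has the shape shown in the question (a dict of lists of name/sentiment dicts); the sample `data` here is trimmed for brevity.

```python
def parse_entities(data):
    # Flatten every category's list, keep entries with a real sentiment,
    # and join the "name:sentiment" pairs with commas.
    return ",".join([entity['name'] + ":" + entity['sentiment']
                     for entity_data in data.values()
                     for entity in entity_data
                     if entity['sentiment'] != 'none'])


# Trimmed sample data, assumed to match the question's structure
data = {
    'persons': [{'name': 'parkinson', 'sentiment': '5'},
                {'name': 'knott david', 'sentiment': 'none'},
                {'name': 'billy mays', 'sentiment': '4'}],
    'organizations': [{'name': 'marketbeat.com', 'sentiment': 'none'}],
    'locations': [],
}

print(parse_entities(data))
```

Unlike the loop version, which appends "," after every entry, ",".join leaves no trailing comma on the result.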

Answer 2 (score: 1)

Maybe something like this would work?

def parse_entities(data):
    results = []
    for category in data.keys():
        results += list(map(lambda x: '{0}:{1}'.format(x['name'], x['sentiment']),
                            filter(lambda i: i['sentiment'] != 'none', data[category])))
    return ','.join(results)

if __name__ == '__main__':
    print(parse_entities(data))

The output looks like this:

parkinson:5,billy mays:4

Answer 3 (score: 0)

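This is a problem in which we need to perform three separate tasks:

  1. Filter out the rows that do not qualify
  2. Flatten the dictionary of lists into one simple list
  3. Transform each dictionary object into a simple tuple, ready for formatting

Here is the code: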

    def parse_entities(data):
        new_data = [
            (row['name'], row['sentiment'])        # 3. Transform
            for rows in data.values()              # 2. Flatten
                for row in rows                    # 2. Flatten
                    if row['sentiment'] != 'none'  # 1. Filter
        ]
    
        # e.g., new_data = [('parkinson', '5'), ('billy mays', '4')]
    
        return ','.join('{}:{}'.format(*row) for row in new_data)
    
    #
    # test code
    #
    data = {
        'locations': [],
        'organizations': [
            {'name': 'piper jaffray companies', 'sentiment': 'none'},
            {'name': 'marketbeat.com', 'sentiment': 'none'},
            {'name': 'zacks investment research', 'sentiment': 'none'}
        ],
        'persons': [
            {'name': 'parkinson', 'sentiment': '5'},
            {'name': 'knott david', 'sentiment': 'none'},
            {'name': 'billy mays', 'sentiment': '4'}
        ],
    }
    print(parse_entities(data))
    

    Output:

    parkinson:5,billy mays:4
    

Answer 4 (score: 0)

Here is a generator expression that does it:

data = {'persons': [
            {'name': 'parkinson', 'sentiment': '5'},
            {'name': 'knott david', 'sentiment': 'none'},
            {'name': 'billy mays', 'sentiment': '4'}],
        'organizations': [
            {'name': 'piper jaffray companies', 'sentiment': 'none'},
            {'name': 'marketbeat.com', 'sentiment': '99'},
            {'name': 'zacks investment research', 'sentiment': 'none'}],
        'locations': []
}

# Compare by value with !=; an identity test (is not) against a string
# literal is unreliable.
results = ','.join(entity['name'] + ':' + entity['sentiment']
                   for category, entity_data in data.items()
                   for entity in entity_data
                   if entity['sentiment'] != 'none')

print(results)  # -> parkinson:5,billy mays:4,marketbeat.com:99

Note: I changed the sample data slightly to make sure it handles data in multiple categories the same way your code does.