Question

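I have this nested JSON object that I just want to flatten into a comma-separated string (i.e. parkinson:5,billy mays:4) so that I can store it in a database for future analysis if needed. I wrote the function below, but I was wondering whether there is a more elegant way to do it using a list comprehension (or something else). I found this post, but I'm not sure how to adapt it to my needs (Python - parse JSON values by multilevel keys).

The data looks like this: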

{'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

Here is my code:

def parse_entities(data):
    results = ''
    for category in data.keys():
    # for c_id, category in enumerate(data.keys()):
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")

    return results

Answer 1

This might be one way to do it, although using a "proper library" (depending on your actual use case) would make more sense.

data = {
 'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

import itertools

# equivalent: itertools.chain.from_iterable(data.values())
dicts = itertools.chain(*data.values())
pairs = [":".join([d['name'], d['sentiment']])
         for d in dicts if d['sentiment'] != 'none']
result = ",".join(pairs)

print(result)

# parkinson:5,billy mays:4

# short, but less readable version
result = ",".join([":".join([d['name'], d['sentiment']])
                   for d in itertools.chain(*data.values())
                   if d['sentiment'] != 'none'])
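
The comment in the code above mentions itertools.chain.from_iterable as an equivalent. Here is a minimal sketch of that variant, wrapped in a hypothetical helper called flatten_entities (the name and the small sample dict are assumptions for illustration, not part of the answer). It walks data.values() directly instead of unpacking it into separate arguments:

```python
import itertools


def flatten_entities(data):
    # chain.from_iterable walks each category list in turn,
    # without unpacking data.values() into separate arguments.
    dicts = itertools.chain.from_iterable(data.values())
    return ",".join(
        "{}:{}".format(d['name'], d['sentiment'])
        for d in dicts
        if d['sentiment'] != 'none'
    )


# Made-up sample in the same shape as the question's data.
sample = {
    'persons': [{'name': 'parkinson', 'sentiment': '5'},
                {'name': 'billy mays', 'sentiment': '4'}],
    'locations': [],
}
print(flatten_entities(sample))
```

Passing a generator expression straight to join() also skips building the intermediate list, which probably only matters for large inputs.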

Answer 2

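First of all, the most important thing for making your code shorter and nicer to look at is to use your own variables. Note that entity_data = data[category] and entity = entity_data[e_id]. So you can write entity['name'] instead of data[category][e_id]['name'].

Second, if you want something like

for category in data.keys():
    entity_data = data[category]

you can make it shorter and easier to read by changing it to

for category, entity_data in data.items():

But you don't even need that; you can use the data.values() iterator to get the values. With these improvements combined, your code looks like this:

def parse_entities(data):
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results

(I also changed results = results + ... to results += ..., and if not entity['sentiment'] == 'none' to if entity['sentiment'] != 'none', because it is shorter and doesn't hurt readability.)

Once you have that, it is easy to make it even shorter and more elegant with a list comprehension.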
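
A minimal sketch of what a list-comprehension version might look like, assuming the data layout from the question. It is an illustration rather than this answer's exact code, and the sample dict is shortened:

```python
def parse_entities(data):
    # Flatten the categories, keep entities with a real sentiment,
    # and format each one as "name:sentiment".
    pairs = [
        entity['name'] + ":" + entity['sentiment']
        for entity_data in data.values()
        for entity in entity_data
        if entity['sentiment'] != 'none'
    ]
    # join() puts commas only between items, so there is no trailing comma.
    return ",".join(pairs)


# Shortened sample in the same shape as the question's data.
data = {
    'persons': [{'name': 'parkinson', 'sentiment': '5'},
                {'name': 'knott david', 'sentiment': 'none'},
                {'name': 'billy mays', 'sentiment': '4'}],
    'organizations': [{'name': 'marketbeat.com', 'sentiment': 'none'}],
    'locations': [],
}
print(parse_entities(data))
```

Unlike the string-concatenation loop, which appends "," after every entry, this version does not leave a comma at the end of the result.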

Answer 3

Maybe something like this would work?

def parse_entities(data):
    results = []
    for category in data.keys():
        results += list(map(lambda x: '{0}:{1}'.format(x['name'], x['sentiment']),
                            filter(lambda i: i['sentiment'] != 'none', data[category])))
    return ','.join(results)

if __name__ == '__main__':
    print(parse_entities(data))

The output looks like this:

parkinson:5,billy mays:4

Answer 4

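This is a problem where we need to perform 3 separate tasks:

1. Filter out the rows of data that don't qualify
2. Flatten the dictionary of lists into one simple list
3. Transform each dictionary object into a simple tuple, ready for formatting

Here is the code: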

def parse_entities(data):
    new_data = [
        (row['name'], row['sentiment'])        # 3. Transform
        for rows in data.values()              # 2. Flatten
            for row in rows                    # 2. Flatten
                if row['sentiment'] != 'none'  # 1. Filter
    ]

    # e.g, new_data = [('parkinson', '5'), ('billy mays', '4')]

    return ','.join('{}:{}'.format(*row) for row in new_data)

#
# test code
#
data = {
    'locations': [],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'}
    ],
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'}
    ],
}
print(parse_entities(data))

Output:

parkinson:5,billy mays:4

Answer 5

Here's a generator expression that does it:

data = {'persons': [
            {'name': 'parkinson', 'sentiment': '5'},
            {'name': 'knott david', 'sentiment': 'none'},
            {'name': 'billy mays', 'sentiment': '4'}],
        'organizations': [
            {'name': 'piper jaffray companies', 'sentiment': 'none'},
            {'name': 'marketbeat.com', 'sentiment': '99'},
            {'name': 'zacks investment research', 'sentiment': 'none'}],
        'locations': []
}

# Compare by value with !=; `is not` would test object identity instead.
results = ','.join(entity['name'] + ':' + entity['sentiment']
                   for category, entity_data in data.items()
                   for entity in entity_data
                   if entity['sentiment'] != 'none')

print(results)  # -> parkinson:5,billy mays:4,marketbeat.com:99

Note: I changed the sample data slightly to make sure it handles data in more than one category the same way your code does.
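
One detail worth illustrating: the filter should compare the sentiment string by value (== or !=) rather than by identity (is or is not). Here is a minimal sketch using the standard json module. The record is made-up sample input, and whether two equal strings happen to be the same object is an implementation detail that should not be relied on:

```python
import json

# Hypothetical record, parsed at runtime the way real JSON input would be.
entity = json.loads('{"name": "knott david", "sentiment": "none"}')
sentinel = 'none'

# Equality compares the characters, so this is reliably True.
same_value = entity['sentiment'] == sentinel

# Identity asks whether both names refer to one object; for strings
# created at runtime this is not guaranteed either way.
same_object = entity['sentiment'] is sentinel

print(same_value)
print(same_object)
```

Because the identity result can differ between interpreters and inputs, a filter written with it may silently keep or drop the wrong entries.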

Parse multilevel JSON into a string with a condition

5 answers: