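I have this nested JSON item, and I just want to flatten it into a comma-separated string (i.e. parkinson:5,billy mays:4) so that it can be stored in a database for future analysis if needed. I wrote the function below, but I'm wondering whether there is a more elegant way using a list comprehension (or something else). I found this post, but I'm not sure how to adapt it to my needs (Python - parse JSON values by multilevel keys).
The data looks like this: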
{'persons':
[{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}],
'organizations':
[{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': 'none'},
{'name': 'zacks investment research', 'sentiment': 'none'}],
'locations': []
}
Here is my code:
def parse_entities(data):
    results = ''
    for category in data.keys():
        # for c_id, category in enumerate(data.keys()):
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")
    return results
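As written, this loop appends a comma after every pair, so the returned string ends with a trailing comma. A minimal check of that behavior, assuming Python 3.7+ (where dicts keep insertion order) and the sample data above written as a valid Python dict:

```python
def parse_entities(data):
    results = ''
    for category in data.keys():
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")
    return results


data = {
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'},
    ],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'},
    ],
    'locations': [],
}

result = parse_entities(data)
print(result)              # parkinson:5,billy mays:4,
# Stripping the final comma recovers the intended format
print(result.rstrip(','))  # parkinson:5,billy mays:4
```

The ','.join approaches in the answers avoid the trailing comma altogether, because join only places separators between items.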
Answer 0 (score: 1)
This could be one way to do it. Depending on your actual use case, it might make even more sense to use a "proper library".
data = {
'persons':
[{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}],
'organizations':
[{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': 'none'},
{'name': 'zacks investment research', 'sentiment': 'none'}],
'locations': []
}
import itertools
# equivalent: dicts = itertools.chain.from_iterable(data.values())
dicts = itertools.chain(*data.values())
pairs = [":".join([d['name'], d['sentiment']])
for d in dicts if d['sentiment'] != 'none']
result = ",".join(pairs)
print(result)
# parkinson:5,billy mays:4
# short, but less readable version
result = ",".join([":".join([d['name'], d['sentiment']])
for d in itertools.chain(*data.values())
if d['sentiment'] != 'none'])
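Two details in this answer are worth making explicit: itertools.chain.from_iterable(data.values()) does the same job as chain(*data.values()) without unpacking the lists into separate arguments, and the chain object is a one-shot iterator, so dicts is empty once the first comprehension has consumed it (which is why the short version builds a fresh chain). A small sketch of both points, assuming Python 3.7+ dict ordering:

```python
import itertools

data = {
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'},
    ],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'},
    ],
    'locations': [],
}

# from_iterable takes the iterable of lists directly instead of unpacking it
dicts = itertools.chain.from_iterable(data.values())
# join accepts a generator expression, so no intermediate list is needed
result = ",".join(d['name'] + ":" + d['sentiment']
                  for d in dicts if d['sentiment'] != 'none')
print(result)       # parkinson:5,billy mays:4

# The chain iterator is now exhausted, so a second pass yields nothing
print(list(dicts))  # []
```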
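Answer 1 (score: 1)
First, the most important thing for making the code shorter and nicer is to use the variables you already have. Note that you set entity_data = data[category], and that entity is the same object as entity_data[e_id]. So you can write entity['name'] instead of data[category][e_id]['name'].
Second, if you want something like
for category in data.keys():
    entity_data = data[category]
you can make it shorter to read by changing it to
for category, entity_data in data.items():
But you don't even need that: you can use the data.values() iterator to get the values directly. Combining these improvements, your code looks like this:
def parse_entities(data):
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results
(I also changed results = results + ... to results += ..., and if not entity['sentiment'] == 'none' to if entity['sentiment'] != 'none', because it is shorter and does not hurt readability.)
Once you have it in this form, it is much easier to make it shorter and more elegant with a list comprehension.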
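The list-comprehension version itself did not survive in this copy of the answer. A minimal sketch of what that step could look like, building directly on the loop above (function name kept from the question; Python 3.7+ dict ordering assumed for the printed output):

```python
def parse_entities(data):
    # Build "name:sentiment" for every entity whose sentiment is not 'none',
    # walking each category list via data.values(), then join with commas.
    return ",".join([entity['name'] + ":" + entity['sentiment']
                     for entity_data in data.values()
                     for entity in entity_data
                     if entity['sentiment'] != 'none'])


data = {
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'},
    ],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'},
    ],
    'locations': [],
}

print(parse_entities(data))  # parkinson:5,billy mays:4
```

The nested for clauses read in the same order as the nested loops, and because join only inserts separators between items, this version also drops the trailing comma that the += loop leaves behind.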
Answer 2 (score: 1)
Maybe something like this would work?
def parse_entities(data):
    results = []
    for category in data.keys():
        # keep entities with a real sentiment, then format each as "name:sentiment"
        results += list(map(lambda x: '{0}:{1}'.format(x['name'], x['sentiment']),
                            filter(lambda i: i['sentiment'] != 'none', data[category])))
    return ','.join(results)

if __name__ == '__main__':
    # data is the dictionary from the question
    print(parse_entities(data))
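Since each entity is already a dict with name and sentiment keys, the formatting lambda can be replaced by the bound method '{name}:{sentiment}'.format_map, which fills each placeholder by key from the dict it is given. A sketch of that variant (a swapped-in technique, not part of the original answer; it also iterates data.values() instead of indexing by key):

```python
data = {
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'},
    ],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'},
    ],
    'locations': [],
}


def parse_entities(data):
    results = []
    for entities in data.values():
        # keep only entities that actually have a sentiment score
        scored = filter(lambda e: e['sentiment'] != 'none', entities)
        # format_map looks up {name} and {sentiment} in each entity dict;
        # list += accepts any iterable, so no list(...) wrapper is needed
        results += map('{name}:{sentiment}'.format_map, scored)
    return ','.join(results)


print(parse_entities(data))  # parkinson:5,billy mays:4
```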
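Answer 3 (score: 0)
This problem breaks down into 3 separate tasks:
1. Filter: skip entries whose sentiment is 'none'
2. Flatten: turn the per-category lists into a single sequence of entries
3. Transform: turn each remaining entry into a name:sentiment pair
Here is the code: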
def parse_entities(data):
    new_data = [
        (row['name'], row['sentiment'])  # 3. Transform
        for rows in data.values()        # 2. Flatten
        for row in rows                  # 2. Flatten
        if row['sentiment'] != 'none'    # 1. Filter
    ]
    # e.g., new_data = [('parkinson', '5'), ('billy mays', '4')]
    return ','.join('{}:{}'.format(*row) for row in new_data)

#
# test code
#
data = {
    'locations': [],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'}
    ],
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'}
    ],
}

print(parse_entities(data))
Output:
parkinson:5,billy mays:4
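The three tasks can also be written as separate lazy generator stages, so each step can be inspected or swapped out on its own. A sketch under the same assumptions as the test code above (the stage variable names are illustrative, not from the answer):

```python
def parse_entities(data):
    # 2. Flatten: one stream of entity dicts across all categories
    flat = (row for group in data.values() for row in group)
    # 1. Filter: drop entities without a sentiment score
    scored = (row for row in flat if row['sentiment'] != 'none')
    # 3. Transform: render each remaining entity as "name:sentiment"
    pairs = ('{}:{}'.format(row['name'], row['sentiment']) for row in scored)
    # Nothing runs until join pulls items through the pipeline
    return ','.join(pairs)


data = {
    'locations': [],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'},
    ],
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'},
    ],
}

print(parse_entities(data))  # parkinson:5,billy mays:4
```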
Answer 4 (score: 0)
Here is a generator expression that does it:
data = {'persons': [
            {'name': 'parkinson', 'sentiment': '5'},
            {'name': 'knott david', 'sentiment': 'none'},
            {'name': 'billy mays', 'sentiment': '4'}],
        'organizations': [
            {'name': 'piper jaffray companies', 'sentiment': 'none'},
            {'name': 'marketbeat.com', 'sentiment': '99'},
            {'name': 'zacks investment research', 'sentiment': 'none'}],
        'locations': []
        }

results = ','.join(entity['name'] + ':' + entity['sentiment']
                   for category, entity_data in data.items()
                   for entity in entity_data if entity['sentiment'] != 'none')

print(results)  # -> parkinson:5,billy mays:4,marketbeat.com:99
Note: I changed the sample data slightly to make sure it handles data in multiple categories the same way your code does.
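Comparing strings with is not tests object identity, not equality. It only appears to work on literal sample data because CPython reuses a single object for identical identifier-like string literals, and Python 3.8+ emits a SyntaxWarning for is not with a literal; != is the reliable test. A sketch assuming the entities arrive as JSON text (the raw string below is a hypothetical stand-in for the real source), parsed with json.loads and filtered with !=:

```python
import json

# Hypothetical JSON payload mirroring the modified sample data above
raw = '''
{"persons": [{"name": "parkinson", "sentiment": "5"},
             {"name": "knott david", "sentiment": "none"},
             {"name": "billy mays", "sentiment": "4"}],
 "organizations": [{"name": "piper jaffray companies", "sentiment": "none"},
                   {"name": "marketbeat.com", "sentiment": "99"},
                   {"name": "zacks investment research", "sentiment": "none"}],
 "locations": []}
'''

# JSON objects become dicts that keep the key order of the text
data = json.loads(raw)

# != compares string values, so it works however the strings were created
results = ','.join(entity['name'] + ':' + entity['sentiment']
                   for entity_data in data.values()
                   for entity in entity_data
                   if entity['sentiment'] != 'none')
print(results)  # parkinson:5,billy mays:4,marketbeat.com:99
```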