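I have numerous dicts converted from Twitter JSON data. Now I want to write them into a single .csv file. I searched the site, but the solutions seem to fit dicts that have only a few values or dicts that already exist. In my case the number of keys is somewhat higher, and I also have to go through an iteration to convert each JSON file into a dict. In other words, I would like to write each JSON file to the .csv file as soon as it has been converted inside that loop.
Here is my code so far: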
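import os

json_path = "C://Users//msalj//OneDrive//Desktop//pypr//Tweets"
for filename in os.listdir(json_path):
    # os.listdir() returns bare file names, so join them with the directory
    with open(os.path.join(json_path, filename), 'r') as infh:
        for data in json_parse(infh):
            ...  # each dict should be written to the .csv file here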
Here is an example of one of my converted JSON files:
{'actor': {'displayName': 'RIMarkable',
           'favoritesCount': 0,
           'followersCount': 0,
           'friendsCount': 0,
           'id': 'id:twitter.com:3847371',
           'image': 'Picture_13.png',
           'languages': ['en'],
           'link': 'ht........ble',
           'links': [{'href': 'htt.....m', 'rel': 'me'}],
           'listedCount': 0,
           'objectType': 'person',
           'postedTime': '2007-01-09T02:53:35.000Z',
           'preferredUsername': 'RIMarkable',
           'statusesCount': 0,
           'summary': 'The Official, Unofficial BlackBerry Weblog',
           'twitterTimeZone': 'Eastern Time (US & Canada)',
           'utcOffset': '0',
           'verified': False},
 'body': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht...qo',
 'generator': {'displayName': 'twitterfeed', 'link': 'htt......om'},
 'gnip': {'matching_rules': [{'tag': None, 'value': '"JP Morgan"'}]},
 'id': 'tag:search.twitter.com,2005:66178882',
 'link': 'ht...82',
 'object': {'id': 'object:search.twitter.com,2005:66178882',
            'link': 'ht.....82',
            'objectType': 'note',
            'postedTime': '2007-05-16T19:00:24.000Z',
            'summary': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht......qo'},
 'objectType': 'activity',
 'postedTime': '2007-05-16T19:00:24.000Z',
 'provider': {'displayName': 'Twitter',
              'link': 'ht......m',
              'objectType': 'service'},
 'retweetCount': 0,
 'twitter_entities': {'hashtags': [],
                      'urls': [{'expanded_url': None,
                                'indices': [105, 130],
                                'url': 'htt.......5qo'}],
                      'user_mentions': []},
 'verb': 'post'}
Can someone help me with the code? Thanks a lot!
Answer 0 (score: 0)
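The nesting depth varies, and if you want to keep everything, the problem gets more complicated.
One way to deal with this is to flatten the dictionary.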
def flatten_dict(input_dict):
    flat_dict = {}
    for k, v in input_dict.items():
        if isinstance(v, dict):
            # Recurse into the nested dict (the call needs v as its argument).
            # Prefixing with the parent key keeps e.g. the actor's 'id'
            # from overwriting the top-level 'id'.
            for k2, v2 in flatten_dict(v).items():
                flat_dict["{}-{}".format(k, k2)] = v2
        elif isinstance(v, (list, tuple)):
            # List items are stored as they are; dicts inside lists are not flattened.
            for index, i in enumerate(v):
                flat_dict["{}-{}".format(k, index)] = i
        elif isinstance(v, (str, int, float)):
            flat_dict[k] = v
        else:
            print("unknown type, add handling for: {}".format(type(v)))
    return flat_dict
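To see what flattening produces on a record shaped like the one in the question, here is a minimal, self-contained sketch. The helper name flatten, the "-" separator and the trimmed sample record are assumptions made for illustration; unlike flatten_dict, this variant also descends into lists that contain dicts.

```python
def flatten(value, prefix="", sep="-"):
    """Return a flat {path: scalar} dict for arbitrarily nested dicts and lists."""
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        items = enumerate(value)
    else:
        # Scalar (str, int, float, bool, None): store it under the path built so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = "{}{}{}".format(prefix, sep, key) if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


# Trimmed-down record modelled on the example in the question.
sample = {
    "id": "tag:search.twitter.com,2005:66178882",
    "actor": {"id": "id:twitter.com:3847371", "languages": ["en"]},
    "twitter_entities": {"hashtags": [], "urls": [{"indices": [105, 130]}]},
}
flat_sample = flatten(sample)
for key, val in flat_sample.items():
    print(key, "=", val)
```

Note that an empty list such as 'hashtags' yields no key at all in this sketch, so a column for it only appears if some record has a non-empty list there.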
Then I would use the first JSON instance to create the header row:
header_row = [k for k in flatten_dict(row1)]
and print the header row to the csv:
",".join(header_row)
and then print the data for each JSON row in the same order:
for row in rows:
    flat_row = flatten_dict(row)
    # str() is required because join() only accepts strings; missing keys become ""
    print_row = ",".join([str(flat_row[header]) if header in flat_row else "" for header in header_row])
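Joining the values with "," by hand breaks as soon as a field itself contains a comma, which tweet bodies often do. A possible alternative is csv.DictWriter from the standard library, which quotes such fields and can fill in missing columns. The sketch below is only one way to do it: the names flatten and write_csv and the two sample records are hypothetical, and, as above, the header is taken from the first record only, so keys that first appear in later records are dropped.

```python
import csv
import io


def flatten(value, prefix="", sep="-"):
    """Flatten nested dicts/lists into a single {path: scalar} dict."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = "{}{}{}".format(prefix, sep, key) if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


def write_csv(records, out):
    """Write an iterable of nested dicts to the open text file `out`, one row each."""
    writer = None
    for record in records:
        flat = flatten(record)
        if writer is None:
            # The header comes from the first record only.
            writer = csv.DictWriter(out, fieldnames=list(flat),
                                    restval="", extrasaction="ignore")
            writer.writeheader()
        writer.writerow(flat)


records = [
    {"id": 1, "body": "Hello, world", "actor": {"displayName": "RIMarkable"}},
    {"id": 2, "body": "no actor here", "retweetCount": 0},
]
buffer = io.StringIO()  # stands in for open("tweets.csv", "w", newline="")
write_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because write_csv accepts any iterable, the dicts coming out of the loop over the JSON files can be passed in as a generator, so each one is written as soon as it has been converted.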