Writing multiple nested dictionaries to a CSV file

Time: 2018-09-02 16:28:21

Tags: python json csv dictionary twitter

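I have converted a lot of Twitter JSON data into dictionaries. Now I want to write them to a single .csv file. I searched this site, but the solutions I found seem to fit dictionaries that hold only a few values, or dictionaries that already exist. In my case the number of keys is somewhat higher, and I also have to go through an iteration to convert each JSON file into a dictionary. In other words, I want to write each JSON file to the .csv file as soon as it is converted during the iteration.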

This is my code so far:

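import os

json_path = "C://Users//msalj//OneDrive//Desktop//pypr//Tweets"
for filename in os.listdir(json_path):
    # os.listdir returns bare file names, so join each one with the directory
    with open(os.path.join(json_path, filename), 'r') as infh:
        for data in json_parse(infh):  # json_parse is my own helper, not shown here
            pass  # each parsed record (data) should be written to the .csv here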

This is a sample of one of my converted JSON files:

{'actor': {'displayName': 'RIMarkable',
           'favoritesCount': 0,
           'followersCount': 0,
           'friendsCount': 0,
           'id': 'id:twitter.com:3847371',
           'image': 'Picture_13.png',
           'languages': ['en'],
           'link': 'ht........ble',
           'links': [{'href': 'htt.....m', 'rel': 'me'}],
           'listedCount': 0,
           'objectType': 'person',
           'postedTime': '2007-01-09T02:53:35.000Z',
           'preferredUsername': 'RIMarkable',
           'statusesCount': 0,
           'summary': 'The Official, Unofficial BlackBerry Weblog',
           'twitterTimeZone': 'Eastern Time (US & Canada)',
           'utcOffset': '0',
           'verified': False},
 'body': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht...qo',
 'generator': {'displayName': 'twitterfeed', 'link': 'htt......om'},
 'gnip': {'matching_rules': [{'tag': None, 'value': '"JP Morgan"'}]},
 'id': 'tag:search.twitter.com,2005:66178882',
 'link': 'ht...82',
 'object': {'id': 'object:search.twitter.com,2005:66178882',
            'link': 'ht.....82',
            'objectType': 'note',
            'postedTime': '2007-05-16T19:00:24.000Z',
            'summary': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht......qo'},
 'objectType': 'activity',
 'postedTime': '2007-05-16T19:00:24.000Z',
 'provider': {'displayName': 'Twitter',
              'link': 'ht......m',
              'objectType': 'service'},
 'retweetCount': 0,
 'twitter_entities': {'hashtags': [],
                      'urls': [{'expanded_url': None,
                                'indices': [105, 130],
                                'url': 'htt.......5qo'}],
                      'user_mentions': []},
 'verb': 'post'}

Can someone help me code this? Thanks a lot!

1 answer:

Answer 0: (score: 0)

The nesting depth varies from key to key, and the problem gets more complicated if you want to keep everything.

One way to solve this is to flatten the dictionary.

def flatten_dict(input_dict):
    """Merge nested dicts into one flat dict.

    Nested keys are not prefixed, so repeated names such as 'id' or 'link'
    overwrite earlier values.
    """
    flat_dict = {}
    for k, v in input_dict.items():
        if isinstance(v, dict):
            # Recurse into the nested dict and copy its flattened items
            for k2, v2 in flatten_dict(v).items():
                flat_dict[k2] = v2
        elif isinstance(v, (list, tuple)):
            # One entry per element: key-0, key-1, ... (elements are stored as-is)
            for index, i in enumerate(v):
                flat_dict["{}-{}".format(k, index)] = i
        elif isinstance(v, (str, int, float)):
            flat_dict[k] = v
        else:
            print("unknown type, add handling for: {}".format(type(v)))
    return flat_dict
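
In the sample record the names 'id', 'link', 'displayName' and 'objectType' appear at several nesting levels, so a flattener that keeps only the innermost key lets later values replace earlier ones. A minimal sketch of one way around that is to build each column name from the full path. The helper name flatten_prefixed and the "." separator are assumptions made for illustration, not part of the original answer. Empty lists and dicts produce no column at all in this sketch.

```python
def flatten_prefixed(value, prefix="", sep="."):
    """Flatten nested dicts/lists into one dict with path-style keys.

    Hypothetical helper; the "." separator is an arbitrary choice.
    """
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        # List positions become part of the path: key.0, key.1, ...
        items = enumerate(value)
    else:
        # Leaf value (str, int, float, bool, None): store it under its path
        flat[prefix] = value
        return flat
    for key, child in items:
        child_prefix = "{}{}{}".format(prefix, sep, key) if prefix else str(key)
        flat.update(flatten_prefixed(child, child_prefix, sep))
    return flat


# A cut-down record using values from the sample in the question
sample = {
    "id": "tag:search.twitter.com,2005:66178882",
    "actor": {"id": "id:twitter.com:3847371", "languages": ["en"]},
    "gnip": {"matching_rules": [{"tag": None, "value": '"JP Morgan"'}]},
}
flat_sample = flatten_prefixed(sample)
```

Here flat_sample keeps both "id" and "actor.id" as separate columns, and the dict inside the matching_rules list is expanded to "gnip.matching_rules.0.tag" and "gnip.matching_rules.0.value" instead of being stored whole.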

Then I would build the header row from the first JSON record:

header_row = [k for k in flatten_dict(row1)]

and print the header row to the CSV:

print(",".join(header_row))

and then print the data for each JSON row in the same column order:

for row in rows:
    flat_row = flatten_dict(row)
    # join() only accepts strings, so convert each value; missing keys become ""
    print_row = ",".join([str(flat_row.get(header, "")) for header in header_row])
    print(print_row)
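
Joining values with "," by hand breaks as soon as a value itself contains a comma or a quote, which tweet text such as the 'body' field often does. A hedged alternative is the standard-library csv.DictWriter, which quotes such values and fills in missing columns. The sketch below assumes the rows are already flattened. The names collect_header and write_rows are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io


def collect_header(flat_rows):
    """Return every key seen in any row, in first-seen order."""
    header = []
    seen = set()
    for flat_row in flat_rows:
        for key in flat_row:
            if key not in seen:
                seen.add(key)
                header.append(key)
    return header


def write_rows(out_file, header_row, flat_rows):
    """Write flattened dicts as CSV rows in header order."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=header_row,
        restval="",             # value used when a row lacks a column
        extrasaction="ignore",  # silently drop keys that are not in the header
    )
    writer.writeheader()
    for flat_row in flat_rows:
        writer.writerow(flat_row)


# Made-up, already-flattened rows
flat_rows = [
    {"id": "1", "body": "Hello, world"},
    {"id": "2", "verb": "post"},
]
header_row = collect_header(flat_rows)

buffer = io.StringIO()  # stands in for a real file in this sketch
write_rows(buffer, header_row, flat_rows)
csv_text = buffer.getvalue()
```

For a real file, open it with open(path, "w", newline="", encoding="utf-8") and pass that handle instead of the StringIO buffer. Collecting the header from all rows requires either holding the rows in memory or a second pass over the files. If each record has to be written as soon as it is parsed, the header can instead come from the first record only, in which case extrasaction="ignore" drops any keys that first appear in later records.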