Python: loading multiple JSON objects containing Chinese characters

Date: 2017-02-23 13:02:24

Tags: python json datatable

I have a file (test.json) that contains multiple JSON objects, one per line:

{"_id":{"$oid":"5886dff9129a960d825fd574"},"game_type":6,"desk_id":41387,"round_count":2,"begin_time":{"$date":"2017-01-24T04:58:50.475Z"},"end_time":{"$date":"2017-01-24T05:02:33.959Z"},"club_id":11006,"club_name":"梧州麻将新手圈","owner_nick_name":"牌乐门","create_time":{"$date":"2017-01-24T05:02:49.860Z"},"items":[{"uid":16252,"nickname":"林家斌","win_gold":-4},{"uid":100074706,"nickname":" 年青*战场","win_gold":-4},{"uid":100175661,"nickname":" 所谓","win_gold":12},{"uid":100038017,"nickname":" 暖心","win_gold":-4}],"reason":"玩家退出房间,游戏结算","ok":true}
{"_id":{"$oid":"5886e996129a960d825fdf05"},"game_type":6,"desk_id":38913,"round_count":1,"begin_time":{"$date":"2017-01-24T05:41:26.135Z"},"end_time":{"$date":"2017-01-24T05:43:04.019Z"},"club_id":11006,"club_name":"梧州麻将新手圈","owner_nick_name":"牌乐门","create_time":{"$date":"2017-01-24T05:43:50.020Z"},"items":[{"uid":12028,"nickname":"林2--","win_gold":-2},{"uid":100080735,"nickname":" 圣裔","win_gold":6},{"uid":100087488,"nickname":" 平静","win_gold":-2},{"uid":100017168,"nickname":" 陈颖","win_gold":-2}],"reason":"玩家退出房间,游戏结算","ok":true}
{"_id":{"$oid":"5886ea68129a960d825fe04a"},"game_type":6,"desk_id":40381,"round_count":1,"begin_time":{"$date":"2017-01-24T05:45:40.833Z"},"end_time":{"$date":"2017-01-24T05:47:01.694Z"},"club_id":11006,"club_name":"梧州麻将新手圈","owner_nick_name":"牌乐门","create_time":{"$date":"2017-01-24T05:47:20.723Z"},"items":[{"uid":11987,"nickname":"转转","win_gold":-2},{"uid":100185361,"nickname":" 妞妞儿","win_gold":6},{"uid":100070056,"nickname":" 草木虫","win_gold":-2},{"uid":100195039,"nickname":" 三姑娘","win_gold":-2}],"reason":"玩家退出房间,游戏结算","ok":true}

I tried the following:

pd.concat([json_normalize(json.loads(line)) for line in open('test.json')])

But I got the following error:

  

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
 in ()
----> 1 pd.concat([json_normalize(json.loads(line)) for line in open('test.json')])

C:\winpython-64-2.7.10.2\python-2.7.10.amd64\lib\json\__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    336             parse_int is None and parse_float is None and
    337             parse_constant is None and object_pairs_hook is None and not kw):
--> 338         return _default_decoder.decode(s)
    339     if cls is None:
    340         cls = JSONDecoder

C:\winpython-64-2.7.10.2\python-2.7.10.amd64\lib\json\decoder.pyc in decode(self, s, _w)
    364
    365         """
--> 366         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    367         end = _w(s, end).end()
    368         if end != len(s):

C:\winpython-64-2.7.10.2\python-2.7.10.amd64\lib\json\decoder.pyc in raw_decode(self, s, idx)
    380         """
    381         try:
--> 382             obj, end = self.scan_once(s, idx)
    383         except StopIteration:
    384             raise ValueError("No JSON object could be decoded")

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 2: invalid start byte

I also tried the following:

import codecs
temp = []
with codecs.open('test.json', 'r') as f:
    for line in f:
        line = line.replace('\n','')
        temp.append(line)
map(json.loads,temp)

I got the same error.
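Since both attempts stop at the same UnicodeDecodeError, one thing worth checking is whether test.json is really stored as UTF-8. The following is a minimal sketch, assuming the file might instead be in a legacy Chinese encoding. The helper name first_working_encoding and the list of candidate encodings are illustrative and do not come from the original post:

```python
import io


def first_working_encoding(path, candidates=("utf-8", "gb18030", "big5")):
    """Return the first candidate that decodes the whole file, or None."""
    # Read raw bytes so that no implicit decoding happens on open.
    with io.open(path, "rb") as f:
        raw = f.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None


# Hypothetical usage:
# print(first_working_encoding("test.json"))
```

A successful decode only shows that the bytes are valid in that encoding, not that the file was written in it, so the decoded text should still be checked by eye.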

But for a single JSON object like this:

json_normalize(json.loads('{"_id":{"$oid":"5886dff9129a960d825fd574"},"game_type":6,"desk_id":41387,"round_count":2,"begin_time":{"$date":"2017-01-24T04:58:50.475Z"},"end_time":{"$date":"2017-01-24T05:02:33.959Z"},"club_id":11006,"club_name":"梧州麻将新手圈","owner_nick_name":"牌乐门","create_time":{"$date":"2017-01-24T05:02:49.860Z"},"items":[{"uid":16252,"nickname":"林家斌","win_gold":-4},{"uid":100074706,"nickname":" 年青*战场","win_gold":-4},{"uid":100175661,"nickname":" 所谓","win_gold":12},{"uid":100038017,"nickname":" 暖心","win_gold":-4}],"reason":"玩家退出房间,游戏结算","ok":true}'))

this works, and I get the table I want (shown as an image in the original post).

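I want to combine the tables for all lines into one big table with the same layout as the one above. What is the correct way to do this?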
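One possible approach, sketched under the assumption that the file holds one JSON object per line and that its real encoding is known: decode each line explicitly before parsing, then pass the whole list to json_normalize so that every line becomes a row of the same table. The helper names load_json_lines and to_table are hypothetical, and "gbk" in the commented usage is only a placeholder guess, not something the post confirms:

```python
import io
import json


def load_json_lines(path, encoding):
    """Decode the file with an explicit encoding; parse one JSON object per line."""
    records = []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def to_table(records):
    """Flatten nested keys into one DataFrame with one row per record."""
    # pandas is imported here so the loader above works without it.
    try:
        from pandas import json_normalize          # pandas >= 1.0
    except ImportError:
        from pandas.io.json import json_normalize  # older pandas
    return json_normalize(records)


# "gbk" is an assumed encoding; replace it with the file's actual one.
# df = to_table(load_json_lines("test.json", encoding="gbk"))
```

Passing the full list to json_normalize once avoids building and concatenating one small DataFrame per line. The "items" list stays as a single column in this sketch.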

0 answers:

No answers.