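I have a file containing hundreds of lines of JSON. I wrote a small Python script that extracts some data, but it only works for a single record. I would now like to know how to iterate over all the records in the file when there are several. So far I have: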
import json
from pprint import pprint
# with open('1st_run_fixed.json') as f:
with open('fixed.json') as f:
    data = json.load(f)
print "--------------------------------------------"
# get number of characters
nchar = data["frames"]["frame"]["lps"]["lp"]["ncharacter"]
print "Got " + nchar + " characters"
for x in range(1, int(nchar) + 1):
    x = str(x)
    print data["frames"]["frame"]["lps"]["lp"]["characters"]["char" + x]["code_ascii"] + " " + data["frames"]["frame"]["lps"]["lp"]["characters"]["char" + x]["confidence"]
print "--------------------------------------------"
This works for the following data:
{"response":{"container":{"id":"41d6efcb-24d6-490d-8880-762255519b5f","timestamp":"2018-Jul-11 19:51:06.461665"},
"id":"00000002-0000-0000-0000-000000000015"},
"frames":{"frame":{"id":"5583","timestamp":"2016-Nov-30 13:05:27","lps":{"lp":{"licenseplate":"15451BBL","text":"15451BBL","wtext":"15451BBL","confidence":"20","bkcolor":"16777215","color":"16777215","type":"0","ntip":"11","cct_country_short":"","cct_state_short":"","tips":{"tip":{"poly":{"p":{"x":"1094","y":"643"},
"p":{"x":"1099","y":"643"},
"p":{"x":"1099","y":"667"},
"p":{"x":"1094","y":"667"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},
"tip":{"poly":{"p":{"x":"1103","y":"642"},
"p":{"x":"1113","y":"642"},
"p":{"x":"1112","y":"667"},
"p":{"x":"1102","y":"667"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},
"tip":{"poly":{"p":{"x":"1112","y":"640"},
"p":{"x":"1122","y":"640"},
"p":{"x":"1122","y":"666"},
"p":{"x":"1112","y":"666"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},
"tip":{"poly":{"p":{"x":"1123","y":"640"},
"p":{"x":"1132","y":"640"},
"p":{"x":"1131","y":"665"},
"p":{"x":"1123","y":"665"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},
"tip":{"poly":{"p":{"x":"1134","y":"640"},
"p":{"x":"1139","y":"640"},
"p":{"x":"1139","y":"664"},
"p":{"x":"1133","y":"664"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},
"tip":{"poly":{"p":{"x":"1154","y":"639"},
"p":{"x":"1163","y":"639"},
"p":{"x":"1163","y":"663"},
"p":{"x":"1153","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},
"tip":{"poly":{"p":{"x":"1164","y":"638"},
"p":{"x":"1173","y":"638"},
"p":{"x":"1173","y":"663"},
"p":{"x":"1163","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},
"tip":{"poly":{"p":{"x":"1191","y":"637"},
"p":{"x":"1206","y":"636"},
"p":{"x":"1205","y":"660"},
"p":{"x":"1190","y":"661"}},
"bkcolor":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"},
"tip":{"poly":{"p":{"x":"1103","y":"655"},
"p":{"x":"1111","y":"655"},
"p":{"x":"1111","y":"667"},
"p":{"x":"1103","y":"667"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},
"tip":{"poly":{"p":{"x":"1103","y":"655"},
"p":{"x":"1111","y":"655"},
"p":{"x":"1111","y":"667"},
"p":{"x":"1103","y":"667"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},
"tip":{"poly":{"p":{"x":"1176","y":"638"},
"p":{"x":"1185","y":"637"},
"p":{"x":"1184","y":"661"},
"p":{"x":"1175","y":"662"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"7"}},
"ncharacter":"8","characters":{"char1":{"poly":{"p":{"x":"1094","y":"643"},
"p":{"x":"1099","y":"643"},
"p":{"x":"1099","y":"667"},
"p":{"x":"1094","y":"667"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},
"char2":{"poly":{"p":{"x":"1103","y":"642"},
"p":{"x":"1113","y":"642"},
"p":{"x":"1112","y":"667"},
"p":{"x":"1102","y":"667"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},
"char3":{"poly":{"p":{"x":"1112","y":"640"},
"p":{"x":"1122","y":"640"},
"p":{"x":"1122","y":"666"},
"p":{"x":"1112","y":"666"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},
"char4":{"poly":{"p":{"x":"1123","y":"640"},
"p":{"x":"1132","y":"640"},
"p":{"x":"1131","y":"665"},
"p":{"x":"1123","y":"665"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},
"char5":{"poly":{"p":{"x":"1134","y":"640"},
"p":{"x":"1139","y":"640"},
"p":{"x":"1139","y":"664"},
"p":{"x":"1133","y":"664"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},
"char6":{"poly":{"p":{"x":"1154","y":"639"},
"p":{"x":"1163","y":"639"},
"p":{"x":"1163","y":"663"},
"p":{"x":"1153","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},
"char7":{"poly":{"p":{"x":"1164","y":"638"},
"p":{"x":"1173","y":"638"},
"p":{"x":"1173","y":"663"},
"p":{"x":"1163","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},
"char8":{"poly":{"p":{"x":"1191","y":"637"},
"p":{"x":"1206","y":"636"},
"p":{"x":"1205","y":"660"},
"p":{"x":"1190","y":"661"}},
"bkcolor":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"}},
"det_time_us":"1072592","poly":{"p":{"x":"1088","y":"642"},
"p":{"x":"1210","y":"634"},
"p":{"x":"1210","y":"661"},
"p":{"x":"1087","y":"669"}}}},
"det_time_us":"1720812"}}}
But I would also like it to work for the following data:
{"response":{"container":{"id":"80d996a1-c267-4fa4-b3f8-f61ff9fda198","timestamp":"2018-Jul-10 17:00:50.829709"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"398","timestamp":"2016-Nov-30 12:56:47.900000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"67","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"249"},
"p":{"x":"1559","y":"249"},
"p":{"x":"1559","y":"267"},
"p":{"x":"1553","y":"267"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},
"tip":{"poly":{"p":{"x":"1569","y":"248"},
"p":{"x":"1575","y":"248"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1585","y":"248"},
"p":{"x":"1591","y":"248"},
"p":{"x":"1591","y":"267"},
"p":{"x":"1585","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},
"tip":{"poly":{"p":{"x":"1602","y":"248"},
"p":{"x":"1607","y":"248"},
"p":{"x":"1607","y":"266"},
"p":{"x":"1602","y":"266"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},
"ncharacter":"6","characters":{"char1":{"poly":{"p":{"x":"1553","y":"249"},
"p":{"x":"1559","y":"249"},
"p":{"x":"1559","y":"267"},
"p":{"x":"1553","y":"267"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},
"char2":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},
"char3":{"poly":{"p":{"x":"1569","y":"248"},
"p":{"x":"1575","y":"248"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char4":{"poly":{"p":{"x":"1585","y":"248"},
"p":{"x":"1591","y":"248"},
"p":{"x":"1591","y":"267"},
"p":{"x":"1585","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},
"char5":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},
"char6":{"poly":{"p":{"x":"1602","y":"248"},
"p":{"x":"1607","y":"248"},
"p":{"x":"1607","y":"266"},
"p":{"x":"1602","y":"266"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},
"det_time_us":"776874","poly":{"p":{"x":"1543","y":"237"},
"p":{"x":"1618","y":"237"},
"p":{"x":"1618","y":"274"},
"p":{"x":"1543","y":"274"}}}},
"det_time_us":"1883017"}}}
{"response":{"container":{"id":"fa75e8f8-1b44-4f2f-a09b-6fe3b801ca1b","timestamp":"2018-Jul-10 17:00:55.863641"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"399","timestamp":"2016-Nov-30 12:56:48","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"tip":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"tip":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"ncharacter":"6","characters":{"char7":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"char8":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"char9":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char10":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"char11":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"char12":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"det_time_us":"600136","poly":{"p":{"x":"1543","y":"238"},
"p":{"x":"1618","y":"239"},
"p":{"x":"1619","y":"274"},
"p":{"x":"1543","y":"273"}}}},
"det_time_us":"1495308"}}}
{"response":{"container":{"id":"5c9c773c-a72a-488f-bc49-148dcd6cfa0a","timestamp":"2018-Jul-10 17:01:01.756522"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"400","timestamp":"2016-Nov-30 12:56:48.100000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"tip":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"tip":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"ncharacter":"6","characters":{"char13":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"char14":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"char15":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char16":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"char17":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"char18":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"det_time_us":"457492","poly":{"p":{"x":"1543","y":"238"},
"p":{"x":"1618","y":"239"},
"p":{"x":"1619","y":"274"},
"p":{"x":"1543","y":"273"}}}},
"det_time_us":"1311946"}}}
How can I do that?
My script currently returns:
Traceback (most recent call last):
  File "read.py", line 8, in <module>
    data = json.load(f)
  File "/usr/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 68 column 1 - line 202 column 1 (char 3182 - 9548)
shell returned 1
when I run it on the large file.
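Answer 0 (score: 1)
"I have a file containing hundreds of lines of JSON."
No, you don't.
Hundreds of JSON texts is not a valid JSON file. A valid JSON file is exactly one text. That is why json.load raises an error.
Hundreds of JSON texts, each of which fits on exactly one line, with newlines between them, is a valid file in other formats, such as JSONlines or NDJ. It is still not a valid JSON file, so you can't use json.load, but you can use a JSONlines or NDJ library, or simply parse it like this: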
import json
with open('fixed.json') as f:
    for line in f:
        data = json.loads(line)
        # do stuff
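Again, to write a JSONlines file, you can use a JSONlines library, or you can make sure that no JSON text contains an embedded newline (which holds as long as you don't pass a non-default ensure_ascii or indent argument) and simply write out json.dumps(data) + "\n" for each value.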
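As a rough illustration of the one-text-per-line approach described above, the following sketch writes and reads such a file. The helper names write_json_lines and read_json_lines are made up for this example and are not part of any library; the code is written for Python 3 and relies only on json.dumps and json.loads.

```python
import json


def write_json_lines(values, path):
    """Hypothetical helper: write one JSON text per line."""
    with open(path, "w") as f:
        for value in values:
            # With the default indent=None, json.dumps emits no raw newline;
            # a newline inside a string value is escaped as \n.
            f.write(json.dumps(value) + "\n")


def read_json_lines(path):
    """Hypothetical helper: parse one JSON text per line, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Reading the file back with read_json_lines should return values equal to the ones that were written, since each line is a complete JSON text on its own.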
But hundreds of JSON texts, each of which spans multiple lines, is not a valid file in any format.
This is actually explained in the json module docs:
Note: Unlike pickle and marshal, JSON is not a framed protocol, so trying to serialize multiple objects with repeated calls to dump() using the same fp will result in an invalid JSON file.
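Basically, "not a framed protocol" means that the format would be ambiguous. For example, if you call json.dump(2, f) and then json.dump(3, f), the file will contain 23, which is exactly what you would get from json.dump(23, f).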
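The ambiguity can be checked with a small experiment. This is only a sketch for Python 3: it uses an in-memory io.StringIO buffer in place of a real file, and the variable names are chosen just for this example.

```python
import io
import json

# Two separate dumps into the same file-like object.
buf = io.StringIO()
json.dump(2, buf)
json.dump(3, buf)
two_dumps = buf.getvalue()

# A single dump of the number 23.
buf = io.StringIO()
json.dump(23, buf)
one_dump = buf.getvalue()

# Both buffers hold the same characters, so a reader cannot tell
# "2 followed by 3" apart from "23".
same = (two_dumps == one_dump)
```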
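If you can fix the file to be something valid, like JSONlines, then that is the simple solution.
If you can't...
Well, before standardization there was a notion of a "JSON document", which basically means a JSON text that is either an Array or an Object. And a stream of JSON documents is not ambiguous.
Since this isn't a standard format, you probably won't find a parser for it, so you'll have to write one yourself.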
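One way to do that is with the raw_decode method from the json module (it is a method of json.JSONDecoder). It tries to decode a JSON text that may have additional content after it, and returns the decoded value together with the index where that additional content starts, which in your case is the start of the next JSON document.
With only hundreds of not-huge objects, it's probably simpler to read the whole file into memory and then parse it, so we don't have to worry about buffering.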
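A minimal sketch of the raw_decode approach might look like the following. The function name iter_json_documents is made up for this example, and the code assumes the whole file has already been read into one string.

```python
import json


def iter_json_documents(text):
    """Hypothetical helper: yield each top-level JSON value found in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so step over the
        # newlines and spaces that separate one document from the next.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the decoded value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

It could then be used along the lines of "for data in iter_json_documents(f.read()):" inside the with open('fixed.json') as f: block, with the per-record extraction in the loop body.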
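Keep in mind that this works only if the file is a stream of JSON documents, that is, if the top-level values are always Arrays or Objects. Also, if you are hand-editing these files: unlike JSONlines, where you can skip over one bad text and continue parsing the rest, here there is no way to recover from errors, because you have no idea where the next document starts.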