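I am trying to open a text file pulled from HDFS, extract certain values, and write them to a single-row CSV file. Below are the "contents" of the text file and the code I am using to extract and output the data: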
#file.txt
{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}
{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}
...
Initial code:
import csv
import json

wanted_data = []
with open('file.txt', 'r') as f:
    for line in f:
        json_data = json.loads(line)
        wanted_data.append(json_data['videoId'])
        for i in range(6):
            wanted_data.append(json_data['scores'][i]['bucket'])
            wanted_data.append(json_data['scores'][i]['value'])
with open('file.csv', 'w+') as f_out:
    write = csv.writer(f_out)
    write.writerow(wanted_data)
This results in a JSONDecodeError:
/usr/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 2 column 1 (char 1)
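What is the correct way to load this text file?
Answer 0 (score: 1)
It looks like you have blank lines between the JSON strings. A line that contains only a newline gives json.loads nothing to parse, which produces exactly this "Expecting value: line 2 column 1 (char 1)" message. Check that the line actually contains some text before processing it: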
import json

wanted_data = []
with open('file.txt', 'r') as f:
    for line in f:
        # Skip blank or whitespace-only lines; json.loads cannot parse them.
        if line.strip():
            json_data = json.loads(line)
            wanted_data.append(json_data['videoId'])
            for score in json_data['scores']:
                wanted_data.append(score['bucket'])
                wanted_data.append(score['value'])
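For completeness, here is a minimal end-to-end sketch that combines the blank-line check with the CSV output step from the question. The function name json_lines_to_csv_row and its two path parameters are made up for illustration, and the sketch assumes every non-blank line is a complete JSON object with "videoId" and "scores" keys, as in the sample file.

```python
import csv
import json


def json_lines_to_csv_row(in_path, out_path):
    """Collect videoId plus each score's bucket and value into one CSV row.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    wanted_data = []
    with open(in_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Blank line: json.loads would raise "Expecting value" here.
                continue
            json_data = json.loads(line)
            wanted_data.append(json_data["videoId"])
            # Iterate over whatever scores exist instead of assuming six.
            for score in json_data["scores"]:
                wanted_data.append(score["bucket"])
                wanted_data.append(score["value"])

    # The csv module documentation recommends opening the file with newline="".
    with open(out_path, "w", newline="") as f_out:
        csv.writer(f_out).writerow(wanted_data)
    return wanted_data
```

Calling json_lines_to_csv_row('file.txt', 'file.csv') would then write all extracted values as one row, matching the single-row layout the question asks for. If a line could also be malformed rather than just empty, wrapping json.loads in a try/except json.JSONDecodeError is one possible extension.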