Question

I'm trying to parse JSON from a huge file (1.9GB), so I split it into 10MB chunks (190 files).

To simplify the process, I load 80 files at a time and put their lines into a list.

I use this to iterate over the 80 files:

import os

for root, dirs, filenames in os.walk(path):
    for f in filenames:
        pass  # the file-reading code shown below goes here

This is the code that reads each file, using the corrected path for the file name:

dat = 'C:/Users/User/My Lab/Python/scripts/thesis/data_extractor/review/{file}'.format(file=f)
with open(dat) as data_file:
    for item in data_file:
        if len(item) > 1:
            dict_review.append(item)

After that, I iterate over the list and parse each row with json.loads:

data = None
for row in dict_review:
    # Python 2: the second positional argument is the encoding
    data = json.loads(row, 'utf-8')
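For reference, the read-then-parse steps above can be sketched in Python 3 as a single pass that parses each line as it is read. This is only a sketch: it assumes every non-blank line holds one complete JSON object (as the loop above implies), and the helper name iter_json_lines and the sample data are hypothetical.

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one parsed object per non-blank line of the file at `path`."""
    with open(path, encoding="utf-8") as data_file:
        for line in data_file:
            line = line.strip()
            if line:  # skip blank lines, like the len(item) > 1 check
                yield json.loads(line)


# Tiny stand-in for one of the 10MB chunks (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"votes": {"funny": 0, "useful": 2}}\n')
    tmp.write('\n')
    tmp.write('{"votes": {"funny": 1, "useful": 0}}\n')

records = list(iter_json_lines(tmp.name))
os.remove(tmp.name)
print(len(records))
```

Parsing line by line avoids holding every raw line in a list before decoding, which may matter with chunks of this size.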

And this is where the exception occurs:

Unexpected error:  <type 'exceptions.TypeError'>
Reason:  expected string or buffer

I tried converting the row to a string with str(row), but it still raised the same exception.

I'd like to know what I'm doing wrong. Thanks!

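Solved:

This was my mistake. The JSON was actually being parsed correctly; the problem occurred when I tried to strip out all the funny characters with a regular expression. I changed

re.sub(r'[^\w]', ' ', data['votes'])
to
re.sub(r'[^\w]', ' ', str(data['votes']))

I needed to convert the object to a string first.

Thanks!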

Answer 1

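This was my mistake. The JSON was actually being parsed correctly; the problem occurred when I tried to strip out all the funny characters with a regular expression. I changed

re.sub(r'[^\w]', ' ', data['votes'])
to
re.sub(r'[^\w]', ' ', str(data['votes']))

I needed to convert the object to a string first.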
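To illustrate the fix, here is a minimal Python 3 sketch. It assumes data['votes'] is a dict of counters; the post does not show its actual structure, so the sample row is hypothetical. Passing a non-string value to re.sub raises a TypeError, the same class of error as in the question (Python 2 words the message as "expected string or buffer"; Python 3 words it differently).

```python
import json
import re

# Hypothetical sample row; the real structure of data['votes'] is not shown in the post.
row = '{"votes": {"funny": 0, "useful": 2}}'
data = json.loads(row)  # parsing succeeds; data['votes'] is a dict

error_name = None
try:
    # re.sub expects a string to search in, but receives a dict here.
    re.sub(r'[^\w]', ' ', data['votes'])
except TypeError as exc:
    error_name = type(exc).__name__
    print("re.sub failed:", exc)

# Converting the value to a string first lets re.sub run.
cleaned = re.sub(r'[^\w]', ' ', str(data['votes']))
print(cleaned)
```

Note that str() on a dict yields its printed representation, so this cleans the keys and values together. If only one field is needed, it may be cleaner to pick that field out before applying the regular expression.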
