Question

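I'm new to Python and am trying to parse data from a file that contains millions of lines. I first tried the old-school approach of parsing it in Excel, but that failed. How can I parse the information efficiently and export it to an Excel file so that it is easier for other people to read?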

I tried using this code that someone else provided, but have had no luck so far:

import re
import pandas as pd

def clean_data(filename):
    with open(filename, "r") as inputfile:
        for row in inputfile:
            if re.match("\[", row) is None:
                yield row

with open(clean_file,  'w') as outputfile:
    for row in clean_data(filename):
        outputfile.write(row)

NameError: name 'clean_file' is not defined

Answer 1

It looks like clean_file is never defined, which is probably a problem introduced when the code was copied and pasted.

Do you mean to write to a file named "clean_file"? In that case you need to wrap the name in quotes: with open("clean_file", 'w')
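As a minimal sketch of that fix, the script from the question can be rearranged so that both paths are passed in explicitly instead of relying on undefined names. The helper name write_clean_file and the example file names in the comments are assumptions made for illustration; they are not part of the original code.

```python
import re


def clean_data(filename):
    """Yield every line of `filename` that does not start with '['."""
    with open(filename, "r") as inputfile:
        for row in inputfile:
            # Raw string so that the backslash reaches the regex engine as-is.
            if re.match(r"\[", row) is None:
                yield row


def write_clean_file(filename, clean_file):
    """Stream the kept lines into `clean_file` and return how many were written.

    Hypothetical helper: it only wraps the loop from the question so that
    both paths are defined before they are used.
    """
    count = 0
    with open(clean_file, "w") as outputfile:
        for row in clean_data(filename):
            outputfile.write(row)
            count += 1
    return count


# Example call (placeholder paths, replace with your own files):
# write_clean_file("raw_data.txt", "clean_data.txt")
```

Because clean_data is a generator, the file is processed one line at a time, so this should stay within memory even for inputs with millions of lines.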

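If you want to work with JSON, I suggest looking at the json package, which has many tools for loading and parsing JSON. Otherwise, if the JSON is flat, you can use the built-in pandas function read_json.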
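Here is a rough sketch of the json-package route, assuming the input is a JSON Lines file (one flat JSON object per line). The question does not confirm that format, so treat it as an assumption. The function name jsonl_to_csv and the column names in the example call are made up for illustration. It writes CSV rather than .xlsx, because Excel opens CSV directly and this keeps the example to the standard library.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path, fieldnames):
    """Stream a JSON Lines file into a CSV file, one record at a time.

    Assumes each non-blank line is a flat JSON object. Keys not listed in
    `fieldnames` are dropped; listed keys that are missing become empty cells.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            writer.writerow(json.loads(line))
            written += 1
    return written


# Example call (placeholder paths and column names):
# jsonl_to_csv("clean_data.jsonl", "report.csv", ["id", "name"])
```

Keep in mind that a single Excel worksheet holds at most 1,048,576 rows, so a file with millions of records would have to be filtered, aggregated, or split before other people can open it in Excel. With pandas, read_json(path, lines=True) reads the same kind of file into a DataFrame.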

How to parse a jsonlines file using pandas
