Question

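I am new to Python and am trying to run a script that edits CSV files. My problem is that I need to split the CSV file into smaller parts (the files are large and I run into memory errors) and then run another script to edit those parts. When I combine the two scripts and run a test, the script reads only the first small file and none of the others. For example, when I split the main CSV file, the pieces are named big-1.csv, big-2.csv, and so on. When the script then picks up the files to edit, only big-1.csv is edited and the rest are left untouched. The script is: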

import csv
from csv import DictWriter

divisor = 990
outfileno = 1
outfile = None

with open('MOCK_DATA.csv', 'r', newline='') as infile:
    infile_iter = csv.reader(infile, delimiter='\t')
    header = next(infile_iter)

    for index, row in enumerate(infile_iter):
        if index % divisor == 0:
            if outfile:
                outfile.close()

            outfilename = 'big-{}.csv'.format(outfileno)
            outfile = open(outfilename, 'w', newline='')
            outfileno += 1
            writer = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_NONE)
            writer.writerow(header)

        writer.writerow(row)

    # Don't forget to close the last file
    if outfile:
        outfile.close()

#export the data
# with correct quoting, and that you are stuck with what you have.


for i in range(1,2):    

        with open("big-" + str(i) + ".csv") as people_file:
            next(people_file)
            corrected_people = []
            for person_line in people_file:
                chomped_person_line = person_line.rstrip()
                person_tokens = chomped_person_line.split(",")

                # check that each field has the expected type
                try:
                    corrected_person = {
                    "id": person_tokens[0],
                    "first_name":person_tokens[1],
                    "last_name": "".join(person_tokens[2:-3]),
                    "email":person_tokens[-3],
                    "gender":person_tokens[-2],
                    "ip_address":person_tokens[-1]  

                    }

                    if not corrected_person["ip_address"].startswith(
                            "") and corrected_person["ip_address"] !="n/a":
                        raise ValueError

                    corrected_people.append(corrected_person)
                except (IndexError, ValueError):
                    # print the ignored lines, so manual correction can be performed later.
                    print("Could not parse line: " + chomped_person_line)

            with open("fix-" + str(i) + ".csv", "w") as corrected_people_file:
                writer = DictWriter(
                    corrected_people_file,
                    fieldnames=[
                        "id","first_name","last_name","email","gender","ip_address"
                ],delimiter=',')
                writer.writeheader()
                writer.writerows(corrected_people)

I think this may be related to how the smaller files are read in the for loop. The script runs without any errors. Please help.
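
One likely cause, offered as a sketch rather than a confirmed diagnosis: `range(1, 2)` excludes its stop value, so the editing loop in the script above only ever sees `i = 1`. The example below shows that behaviour and one way to loop over every chunk that exists on disk. The helper name `find_chunk_files` and the temporary demo files are assumptions made for illustration; they are not part of the original script.

```python
import os
import re
import tempfile


def find_chunk_files(directory=".", prefix="big-"):
    """Return paths named <prefix><number>.csv, sorted by their number.

    Hypothetical helper for illustration; not part of the original script.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.csv")
    numbered = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), os.path.join(directory, name)))
    # Sort numerically so that big-10.csv comes after big-2.csv.
    return [path for _, path in sorted(numbered)]


# range() excludes its stop value, so range(1, 2) yields a single number.
single_pass = list(range(1, 2))
print(single_pass)

# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 10):
        with open(os.path.join(tmp, "big-{}.csv".format(n)), "w") as f:
            f.write("header\n")
    found = [os.path.basename(p) for p in find_chunk_files(tmp)]
    print(found)
```

In the script above, `outfileno` ends up one higher than the number of the last chunk written, so replacing `range(1, 2)` with `range(1, outfileno)` should cover the same files without scanning the directory.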

Trying to read files named file1, file2, file3 using a for loop in Python

0 answers: