Trying to read files named file1, file2, file3 with a for loop in Python

Time: 2018-08-07 15:29:29

Tags: python csv

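I am new to Python and am trying to run a script that edits CSV files. The problem is that I need to split each CSV file into smaller parts (the files are large and cause memory errors) and then run another script to edit those parts. When I combined the two scripts and ran a test, the script read only the first small file and skipped the rest. For example, splitting the main CSV file produces files named big-1.csv, big-2.csv, and so on. When the script then picks up the files to edit, only big-1.csv is edited and the others are left untouched. The script is: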

import csv
from csv import DictWriter

divisor = 990
outfileno = 1
outfile = None

with open('MOCK_DATA.csv', 'r', newline='') as infile:
    infile_iter = csv.reader(infile, delimiter='\t')
    header = next(infile_iter)

    for index, row in enumerate(infile_iter):
        if index % divisor == 0:
            if outfile:
                outfile.close()

            outfilename = 'big-{}.csv'.format(outfileno)
            outfile = open(outfilename, 'w', newline='')
            outfileno += 1
            writer = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_NONE)
            writer.writerow(header)

        writer.writerow(row)

    # Don't forget to close the last file
    if outfile:
        outfile.close()

#export the data
# with correct quoting, and that you are stuck with what you have.


# NOTE: range(1, 2) yields only i = 1, so only big-1.csv is processed here.
for i in range(1, 2):

        with open("big-" + str(i) + ".csv") as people_file:
            next(people_file)
            corrected_people = []
            for person_line in people_file:
                chomped_person_line = person_line.rstrip()
                person_tokens = chomped_person_line.split(",")

                # check that each field has the expected type
                try:
                    corrected_person = {
                        "id": person_tokens[0],
                        "first_name": person_tokens[1],
                        "last_name": "".join(person_tokens[2:-3]),
                        "email": person_tokens[-3],
                        "gender": person_tokens[-2],
                        "ip_address": person_tokens[-1],
                    }

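                    # NOTE: startswith("") is True for every string, so as written
                    # this check can never raise ValueError.
                    if not corrected_person["ip_address"].startswith(
                            "") and corrected_person["ip_address"] != "n/a":
                        raise ValueError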

                    corrected_people.append(corrected_person)
                except (IndexError, ValueError):
                    # print the ignored lines, so manual correction can be performed later.
                    print("Could not parse line: " + chomped_person_line)

            # newline="" lets the csv module control line endings itself.
            with open("fix-" + str(i) + ".csv", "w", newline="") as corrected_people_file:
                writer = DictWriter(
                    corrected_people_file,
                    fieldnames=[
                        "id", "first_name", "last_name", "email", "gender", "ip_address"
                    ],
                    delimiter=',')
                writer.writeheader()
                writer.writerows(corrected_people)

I think this may have something to do with reading the smaller files in the for loop. The script runs without any errors. Please help.
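
One likely cause, going by the code as posted: `range(start, stop)` excludes `stop`, so `range(1, 2)` produces only `1` and the loop body runs once, for big-1.csv. The sketch below is one possible way around that: it lists the numbered chunk files that actually exist instead of hard-coding how many there are. The helper name `find_chunk_files` and its `directory`, `prefix`, and `suffix` parameters are hypothetical, introduced only for illustration.

```python
import re
from pathlib import Path


def find_chunk_files(directory=".", prefix="big-", suffix=".csv"):
    """Return (number, path) pairs for files such as big-1.csv, sorted by number.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    found = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    # Sorting on the integer keeps big-10.csv after big-2.csv.
    return sorted(found)


# range() stops before its second argument.
print(list(range(1, 2)))  # [1]
print(list(range(1, 4)))  # [1, 2, 3]

# Loop over every chunk that exists in the current directory.
for number, path in find_chunk_files():
    print(number, path.name)
```

If both steps stay in a single script, another option would be `for i in range(1, outfileno):`, since `outfileno` is incremented after each chunk is opened and therefore ends one past the last file number.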

0 answers:

No answers yet.