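I'm new to Python and am trying to run a script that edits CSV files. The problem is that I need to split the CSV file into smaller parts (the files are large and I run into memory errors) and then run another script to edit those parts. But when I combine the two scripts and run a test, the script reads only the first small file and not the rest. For example, when I split the main CSV file, the pieces are named big-1.csv, big-2.csv, and so on. Then, when the script picks up the files to edit, only big-1.csv is edited and the others are left untouched. The script is: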
import csv
from csv import DictWriter

divisor = 990
outfileno = 1
outfile = None

with open('MOCK_DATA.csv', 'r', newline='') as infile:
    infile_iter = csv.reader(infile, delimiter='\t')
    header = next(infile_iter)
    for index, row in enumerate(infile_iter):
        if index % divisor == 0:
            if outfile:
                outfile.close()
            outfilename = 'big-{}.csv'.format(outfileno)
            outfile = open(outfilename, 'w', newline='')
            outfileno += 1
            writer = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_NONE)
            writer.writerow(header)
        writer.writerow(row)

# Don't forget to close the last file
if outfile:
    outfile.close()

#export the data
# with correct quoting, and that you are stuck with what you have.
for i in range(1,2):
    with open("big-" + str(i) + ".csv") as people_file:
        next(people_file)
        corrected_people = []
        for person_line in people_file:
            chomped_person_line = person_line.rstrip()
            person_tokens = chomped_person_line.split(",")
            # check that each field has the expected type
            try:
                corrected_person = {
                    "id": person_tokens[0],
                    "first_name": person_tokens[1],
                    "last_name": "".join(person_tokens[2:-3]),
                    "email": person_tokens[-3],
                    "gender": person_tokens[-2],
                    "ip_address": person_tokens[-1]
                }
                if not corrected_person["ip_address"].startswith(
                        "") and corrected_person["ip_address"] != "n/a":
                    raise ValueError
                corrected_people.append(corrected_person)
            except (IndexError, ValueError):
                # print the ignored lines, so manual correction can be performed later.
                print("Could not parse line: " + chomped_person_line)
    with open("fix-" + str(i) + ".csv", "w") as corrected_people_file:
        writer = DictWriter(
            corrected_people_file,
            fieldnames=[
                "id", "first_name", "last_name", "email", "gender", "ip_address"
            ], delimiter=',')
        writer.writeheader()
        writer.writerows(corrected_people)

I think this may have something to do with how the smaller files are read in the for loop. The script runs without any errors. Please help.
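
One possible explanation, offered as a sketch and not a confirmed diagnosis: `range()` excludes its upper bound, so `range(1,2)` contains only `1`, and the loop body would run for big-1.csv alone. The splitter increments `outfileno` after opening each output file, so once the split finishes, `range(1, outfileno)` should cover every file that was written. The helper name `chunk_numbers` and the example value `4` below are hypothetical and used only for illustration.

```python
# range() stops before its upper bound, so range(1, 2) holds a single value.
print(list(range(1, 2)))


def chunk_numbers(next_outfileno):
    """Return the numbers of the split files already written.

    Assumes next_outfileno is the value of `outfileno` after the split,
    which is one past the number of the last file created.
    """
    return range(1, next_outfileno)


# Hypothetical example: big-1.csv to big-3.csv were written, so outfileno is 4.
for i in chunk_numbers(4):
    print("big-{}.csv -> fix-{}.csv".format(i, i))
```

In the combined script, this would amount to replacing `range(1,2)` with `range(1, outfileno)`, assuming both parts run in the same process so that `outfileno` is still in scope.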