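I have a CSV file in which a new subject starts after two blank lines. I want to split this file into separate files, one per subject. How can I do that?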
................
................
Biology I
BGS Shivamogga I PUC Exam Results
Student Exam # Questions Correct Answers Score %
ADARSHGOUDA M MUDIGOUDAR Biology I - Chapter 1 35 23 65.70%
ADARSHGOUDA M MUDIGOUDAR Biology I - Chapter 1 35 29 82.90%
ADARSHGOUDA M MUDIGOUDAR Biology I - Chapter 1 35 32 91.40%
.
.
.
.
................
................
Chemistry I
BGS Shivamogga I PUC Exam Results
Student Exam # Questions Correct Answers Score %
AISHWARYA P Chemistry I - Chapter 1 29 20 69.00%
MAHARUDRASWAMY M S Chemistry I - Chapter 1 29 14 48.30%
NIKHIL B Chemistry I - Chapter 1 29 20 69.00%
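I tried using dropna and skiprows to split the dataframe, but I don't want to hard-code the number of rows. I want to split based on the two blank lines that precede each subject.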
Answer 0 (score: 0)
I would do something along these lines:
with open('input.txt', 'r') as input_file:
    data_str = input_file.read()

# Two blank lines are three consecutive newlines; splitting on '\n\n'
# would also break the text at every single blank line.
data_array = data_str.split('\n\n\n')

for i, smaller_data in enumerate(data_array):
    with open(f'new_file_{i}.txt', 'w') as new_data_file:
        new_data_file.write(smaller_data)
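If the file might use Windows line endings, or contain more than two blank lines between subjects, a regular expression is a more tolerant variant of the same idea. This is only a minimal sketch: `split_blocks` is a hypothetical helper name, and treating whitespace-only lines as blank is an assumption about the data.

```python
import re

def split_blocks(text):
    """Split text into blocks separated by two or more blank lines.

    Hypothetical helper; whitespace-only lines are assumed to count as blank.
    """
    # One line ending, followed by at least two blank (or whitespace-only) lines.
    parts = re.split(r'\r?\n(?:[ \t]*\r?\n){2,}', text)
    # Drop empty chunks, e.g. when the text ends with blank lines.
    return [part for part in parts if part.strip()]
```

Each returned block can then be written to its own file exactly as in the loop above. A single blank line inside a block does not cause a split.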
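Answer 1 (score: 0)
I would just use the csv module, passing rows from a csv.reader() object to a csv.writer() object and keeping a count of consecutive blank rows. Each time more than one blank row has been seen, replace the writer object with one for a new file.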
You can detect blank rows with the any() function, because a blank row consists only of empty strings or has no values at all:
isblank = not any(row)
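To see why this test works, here is a small sketch that parses an in-memory sample (the sample text is made up for illustration). csv.reader() produces an empty list for a completely empty line and a list of empty strings for a line that contains only delimiters, and both are falsy under any():

```python
import csv
import io

# Made-up sample: a data row, an empty line, a delimiters-only line, a data row.
sample = "a,b\n\n,,\nc,d\n"

rows = list(csv.reader(io.StringIO(sample)))
blank_flags = [not any(row) for row in rows]
```

Here `rows` is `[['a', 'b'], [], ['', '', ''], ['c', 'd']]` and `blank_flags` is `[False, True, True, False]`. Note that a cell holding only spaces is a non-empty string, so such a row would not be treated as blank by this test.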
Assuming that numbered files in the same directory are sufficient, this should work:
import csv
from pathlib import Path


def gen_outputfiles(outputdir, basefilename):
    """Generate open files ready for CSV writing, in outputdir using basefilename

    Numbers are inserted between the basefilename stem and suffix; e.g.
    foobar.csv becomes foobar001.csv, foobar002.csv, etc.
    """
    outputbase = Path(basefilename)
    outputstem, outputsuffix = outputbase.stem, outputbase.suffix
    counter = 0
    while True:
        counter += 1
        # Build the full path first, then open it; the parentheses matter.
        outputpath = outputdir / f'{outputstem}{counter:03d}{outputsuffix}'
        yield outputpath.open(mode='w', newline='')


def split_csv_on_doubleblanks(inputfilename, basefilename=None, **kwargs):
    """Copy CSV rows from inputfilename to numbered files based on basefilename

    A new numbered target file is created after 2 or more blank rows have been
    read from the input CSV file.
    """
    inputpath = Path(inputfilename)
    outputfiles = gen_outputfiles(inputpath.parent, basefilename or inputpath.name)
    with inputpath.open(newline='') as inputfile:
        reader = csv.reader(inputfile, **kwargs)
        outputfile = next(outputfiles)
        writer = csv.writer(outputfile, **kwargs)
        blanks = 0
        try:
            for row in reader:
                isblank = not any(row)
                if not isblank and blanks > 1:
                    # skipped more than one blank row before finding a non-blank
                    # row. Open a new output file
                    outputfile.close()
                    outputfile = next(outputfiles)
                    writer = csv.writer(outputfile, **kwargs)
                blanks = blanks + 1 if isblank else 0
                writer.writerow(row)
        finally:
            if not outputfile.closed:
                outputfile.close()
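Note that I copy the blank rows across as well, so your files do end with multiple blank rows. This can be remedied by replacing the blanks counter with a list of blank rows, which is written to the writer object whenever you would reset the counter and the list holds just a single element. That way, single blank rows are preserved.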
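The list-based variant described above could look roughly like the following. It is only a sketch under some assumptions: the names `split_rows_on_doubleblanks` and `write_sections` are hypothetical, trailing blank rows are dropped rather than copied, and each section is collected in memory before it is written.

```python
import csv
from pathlib import Path

def split_rows_on_doubleblanks(rows):
    """Yield lists of rows; a run of 2+ blank rows starts a new list.

    A single blank row is kept in place; separator runs are dropped.
    """
    section, pending = [], []
    for row in rows:
        if not any(row):
            pending.append(row)       # hold blanks until the run length is known
            continue
        if len(pending) == 1:
            section.extend(pending)   # lone blank row: preserve it
        elif len(pending) > 1 and section:
            yield section             # 2+ blank rows: close the current section
            section = []
        pending = []
        section.append(row)
    if section:
        yield section

def write_sections(inputfilename, **kwargs):
    """Write each section to a numbered file next to the input file."""
    inputpath = Path(inputfilename)
    written = []
    with inputpath.open(newline='') as inputfile:
        reader = csv.reader(inputfile, **kwargs)
        for i, section in enumerate(split_rows_on_doubleblanks(reader), start=1):
            # e.g. results.csv -> results001.csv, results002.csv, ...
            outpath = inputpath.with_name(f'{inputpath.stem}{i:03d}{inputpath.suffix}')
            with outpath.open(mode='w', newline='') as outputfile:
                csv.writer(outputfile, **kwargs).writerows(section)
            written.append(outpath)
    return written
```

Keeping the grouping logic in a generator that takes any iterable of rows makes it easy to check without touching the file system, and no output file is created until a section actually has rows to write.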