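I have a CSV file with, say, 16000 rows. I need to split it into two separate files, but the files also need to overlap by about 360 rows, so rows 1-8360 go into one file and rows 8000-16000 into the other. Or 1-8000 and 7640-16000.
The CSV file looks like this: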
Value X Y Z
4.5234 -46.29753186 -440.4915915 -6291.285393
4.5261 -30.89639381 -441.8390165 -6291.285393
4.5289 -15.45761327 -442.6481287 -6291.285393
4.5318 0 -442.9179423 -6291.285393
I have used this code in Python 3 to split the file, but I can't get the overlap I want:
with open('myfile.csv', 'r') as f:
    csvfile = f.readlines()

linesPerFile = 8000
filename = 1

for i in range(0, len(csvfile), linesPerFile):
    with open(str(filename) + '.csv', 'w+') as f:
        if filename > 1:  # this is the second or later file, so we need to write
            f.write(csvfile[0])  # the header again
        f.writelines(csvfile[i:i+linesPerFile])
    filename += 1
I tried modifying it like this:
for i in range(0,len(csvfile),linesPerFile+360):
and
f.writelines(csvfile[360-i:i+linesPerFile])
but I couldn't get it to work.
Answer 0: (score: 1)
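import numpy as np
import pandas as pd

# df = pd.read_csv('source_file.csv')
# Random stand-in data; the pd.np alias is not available in recent pandas releases,
# so NumPy is imported directly
df = pd.DataFrame(data=np.random.randn(16000, 5))
df.iloc[:8360].to_csv('file_1.csv')  # first 8360 rows
df.iloc[8000:].to_csv('file_2.csv')  # everything from the 8001st row onwards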
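The two `iloc` slices are 0-based and half-open, so it is worth checking which rows actually end up in each file. The following is a minimal sketch that uses a plain list as a stand-in for the 16000 data rows (the list and the variable names are illustrative assumptions, not part of the answer); it applies the same slice bounds without needing pandas.

```python
# Stand-in for the 16000 data rows, numbered 1..16000 (illustrative only)
rows = list(range(1, 16001))

first = rows[:8360]    # same bounds as df.iloc[:8360]
second = rows[8000:]   # same bounds as df.iloc[8000:]

# Rows that appear in both slices
overlap = sorted(set(first) & set(second))

print(first[0], first[-1])     # 1 8360
print(second[0], second[-1])   # 8001 16000
print(len(overlap))            # 360
```

So the second file starts at row 8001 rather than 8000, which gives exactly 360 shared rows.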
Answer 1: (score: 0)
How about this?
# Replaces the loop in the question; csvfile, linesPerFile and filename are defined there
for i in range(0, len(csvfile), linesPerFile):
    init = i
    with open(str(filename) + '.csv', 'w+') as f:
        if filename > 1:  # this is the second or later file, so we need to write
            f.write(csvfile[0])  # the header again
            init = i - 360  # start 360 lines earlier to create the overlap
        f.writelines(csvfile[init:i+linesPerFile+1])
    filename += 1
Is this what you are looking for? If not, please upload a test file so we can give a better answer :-)
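The same idea can be written as a small function that handles the header separately and works for any number of chunks. This is only a sketch under stated assumptions: `split_with_overlap`, `rows_per_file` and `overlap` are hypothetical names introduced here, and the input is assumed to be a list of lines whose first element is the header.

```python
def split_with_overlap(lines, rows_per_file, overlap):
    """Split lines (header first) into chunks of data rows.

    Every chunk after the first starts `overlap` rows before the
    point where the previous chunk ended, and every chunk gets the header.
    """
    header, rows = lines[0], lines[1:]
    chunks = []
    for start in range(0, len(rows), rows_per_file):
        begin = max(start - overlap, 0)  # reach back into the previous chunk
        chunks.append([header] + rows[begin:start + rows_per_file])
    return chunks


# Illustrative input: a header plus 16000 numbered rows
lines = ['Value X Y Z\n'] + [f'{n}\n' for n in range(1, 16001)]
chunks = split_with_overlap(lines, 8000, 360)

print(len(chunks))            # 2
print(len(chunks[0]) - 1)     # 8000 data rows: rows 1-8000
print(len(chunks[1]) - 1)     # 8360 data rows: rows 7641-16000
```

Each chunk can then be written with `f.writelines(chunk)`. Separating the header from the data rows avoids counting the header line as one of the 8000 rows.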
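Answer 2: (score: 0)
Hopefully you have already got a more elegant answer using Pandas. If you don't want to install a module, you could consider the approach below.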
def write_files(input_file, file1, file2, file1_end_line_no, file2_end_line_no):
    # Open all 3 file handles
    with open(input_file) as csv_in, open(file1, 'w') as ff, open(file2, 'w') as sf:
        # Process the header: convert it to comma-separated and write it to both files
        header = next(csv_in)
        header = ','.join(header.split())
        ff.write(header + '\n')
        sf.write(header + '\n')
        # index is 0-based and counts data rows only (the header is already consumed)
        for index, line in enumerate(csv_in):
            line_content = ','.join(line.split())  # 4.5234 -46.29753186 -440.4915915 -6291.285393 => 4.5234,-46.29753186,-440.4915915,-6291.285393
            if index <= file1_end_line_no:  # Row is at or before the last index of the first file
                ff.write(line_content + '\n')
            if index >= file2_end_line_no:  # Despite its name, this is the first index written to the second file
                sf.write(line_content + '\n')
Sample run:
if __name__ == '__main__':
    in_file = 'csvfile.csv'
    write_files(
        in_file,
        '1.txt',
        '2.txt',
        2,
        2
    )
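The sample run uses toy values (2 and 2). For the 16000-row file in the question, the two index arguments have to be 0-based and the comparisons are inclusive, which makes an off-by-one error easy. The following is a minimal check of that arithmetic; `indices_for` is a hypothetical helper introduced here for illustration, and the list comprehensions simply mirror the `<=` and `>=` tests from `write_files` without touching any files.

```python
def indices_for(first_last_row, second_first_row):
    """Convert 1-based data-row numbers to the 0-based indices that
    enumerate() yields once the header has been consumed."""
    return first_last_row - 1, second_first_row - 1


# Rows 1-8360 in the first file, rows 8001-16000 in the second
first_end, second_start = indices_for(8360, 8001)

# Mirror the inclusive comparisons used in write_files on 16000 row indices
first = [i + 1 for i in range(16000) if i <= first_end]
second = [i + 1 for i in range(16000) if i >= second_start]

print(first_end, second_start)          # 8359 8000
print(first[-1], second[0])             # 8360 8001
print(len(set(first) & set(second)))    # 360
```

Under these assumptions, passing 8359 and 8000 as the last two arguments would give the 360-row overlap asked for in the question.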