Suppose I have many different text files in the same directory, with their content structured as follows:
File a.txt:
HEADER_X;HEADER_Y;HEADER_Z
a_value;a_value;a_value
a_value;a_value;a_value
File b.txt:
HEADER_X;HEADER_Y;HEADER_Z
b_value;b_value;b_value
b_value;b_value;b_value
File c.txt:
HEADER_X;HEADER_Y;HEADER_Z
c_value;c_value;c_value
c_value;c_value;c_value
File d.txt: ...
I want to merge all the txt files into one by appending the content of each file after the last line of the previous file. See below:
File combined.txt:
HEADER_X;HEADER_Y;HEADER_Z
a_value;a_value;a_value
a_value;a_value;a_value
b_value;b_value;b_value
b_value;b_value;b_value
c_value;c_value;c_value
c_value;c_value;c_value
...
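How can I do this in Python?
Assumptions:
- All txt files are located in the same folder
- All txt files have the same header
- All txt files have the same number of columns
- The txt files have different numbers of rows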
Answer 0 (score: 0)
Use the csv module. Something like this:
import csv

# Python 3: open csv files in text mode with newline=''
with open('output.csv', 'a', newline='') as output:
    writer = csv.writer(output, delimiter=";")
    with open('a.txt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=";")
        next(reader)  # skip the header
        for row in reader:
            writer.writerow(row)
If you don't want to hard-code each file (say you have more than three), you can do this:
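import csv
from os import listdir
from os.path import isfile, join

mypath = '.'  # folder that contains the txt files (adjust as needed)
onlyfiles = [f for f in listdir(mypath)
             if isfile(join(mypath, f)) and f.endswith('.txt')]

with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=";")
    for aFile in sorted(onlyfiles):
        with open(join(mypath, aFile), newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=";")
            next(reader)  # skip the header of every file
            for row in reader:
                writer.writerow(row)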
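Note that the loop above skips the header of every file, so the combined file ends up without one. A minimal sketch that keeps the header from the first file only could look like the following. The function name `merge_txt_files` and the default output name `combined.txt` are assumptions chosen for illustration, and the files are merged in alphabetical order.

```python
import csv
from pathlib import Path


def merge_txt_files(folder, output_name="combined.txt", delimiter=";"):
    """Merge every .txt file in `folder` into one file, writing the header once."""
    folder = Path(folder)
    output_path = folder / output_name
    # Sort for a deterministic order and leave out the output file itself,
    # in case it already exists from an earlier run.
    sources = sorted(p for p in folder.glob("*.txt") if p != output_path)
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
        header_written = False
        for source in sources:
            with open(source, newline="") as src:
                reader = csv.reader(src, delimiter=delimiter)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining rows unchanged.
                writer.writerows(reader)
    return output_path
```

Calling `merge_txt_files(".")` would then write `combined.txt` next to the input files.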
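Answer 1 (score: 0)
I managed to put together something that seems to work (at least in the cases I tested). It parses all the files, collects all the headers, and formats the values on every line of every file, adding ";" according to which headers are present or absent in that file.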
headers = []
values = []
files = ("csv0.txt", "csv1.txt")  # put the files you want to parse here

# read the files a first time, just to get the headers
for file_name in files:
    file = open(file_name, 'r')
    first_line = True
    for line in file:
        if first_line:
            first_line = False
            for header in line.strip().split(";"):
                if header not in headers:
                    headers.append(header)
        else:
            break
    file.close()
headers = sorted(headers)

# read a second time to get the values
file_number = 0
for file_name in files:
    file = open(file_name, 'r')
    file_headers = []
    first_line = True
    corresponding_indexes = []
    values.append([])
    for line in file:
        if first_line:
            first_line = False
            index = 0
            for header in line.strip().split(";"):
                while headers[index] != header:
                    index += 1
                corresponding_indexes.append(index)
        else:
            line_values = line.strip().split(";")
            current_index = 0
            values_str = ""
            for value in line_values:
                # this part writes the values with ";" added for the headers not in this file
                while current_index not in corresponding_indexes:
                    current_index += 1
                    values_str += ";"
                values_str += value + ";"
                current_index += 1
            values_str = values_str[:-1]  # we remove the last ";" (useless)
            values[file_number].append(values_str)
    file_number += 1
    file.close()

# and now we write the output file with all headers and values
headers_str = ""
for header in headers:
    headers_str += header + ";"
headers_str = headers_str[:-1]
output_file = open("output.txt", 'w')
output_file.write(headers_str + "\n")
for file_values in values:
    for values_line in file_values:
        output_file.write(values_line + "\n")
output_file.close()
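For comparison, the same two-pass idea (collect every header, then fill in the columns a file lacks) can be sketched with `csv.DictReader` and `csv.DictWriter` from the standard library. This is only an alternative sketch, not the code above rewritten line by line: the function name `merge_with_all_headers` is an assumption for illustration, and it assumes every data row has no more fields than its file's header.

```python
import csv


def merge_with_all_headers(file_names, output_name, delimiter=";"):
    """Merge files whose headers may differ; missing columns are left empty."""
    # First pass: collect every header seen in any file.
    headers = set()
    for name in file_names:
        with open(name, newline="") as f:
            first_row = next(csv.reader(f, delimiter=delimiter), [])
            headers.update(first_row)
    headers = sorted(headers)  # same ordering choice as the code above

    # Second pass: write the rows, filling absent columns with an empty string.
    with open(output_name, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=headers, delimiter=delimiter,
                                restval="", lineterminator="\n")
        writer.writeheader()
        for name in file_names:
            with open(name, newline="") as f:
                writer.writerows(csv.DictReader(f, delimiter=delimiter))
```

Because rows are matched to columns by header name, the files do not need to list their headers in the same order.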
If you have any questions, feel free to ask.