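I am trying to merge a bunch of CSV files. Each file has a different number of columns. That part is not a problem: I can easily loop over the files, pull in all the column headers, and paste them into an empty file to use as a base.
The problem I am running into is that the column headers sit on a different row in each file.
For example: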
Table1
Random Text
!,Header1,Header2,Header3
*,123,124,5235
*,124,15,23624
*,135,677,234
Table2
Random Text
Random Text
!,Header1,Header2,Header4
*,124,2156,7478
*,126,12357,547
*,237,12,267
Output:
Table,Header1,Header2,Header3,Header4
Table1,123,124,5235
Table1,124,15,23624
Table1,135,677,234
Table2,124,2156,7478
Table2,126,12357,547
Table2,237,12,267
My existing code looks like this:
import csv
import glob

files = glob.glob(r'//Directory/*.csv')
# This block goes through each file and works out which variables exist
variablelist = []
for f in files:
    with open(f, 'r', newline='') as csvfile:
        read_rows = csv.reader(csvfile)
        rowlist = []
        for row in read_rows:
            # The last row with no * in column 1 is the header row
            if row and row[0] != "*":
                rowlist = row
    variablelist.extend(x for x in rowlist if x not in variablelist)
variablelist.sort()
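The header row I use is the last row that does not have a * in the first column. I work out which row holds the headers, then store the header names in a list, combining the lists from all of the files into one.
I then tried to combine the files using code I found while searching this site: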
with open("out.csv", "w", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=variablelist)
    for f in files:
        with open(f, "r", newline="") as f_in:
            reader = csv.DictReader(f_in)  # Uses the field names in this file
            for line in reader:
                writer.writerow(line)
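The problem is that I don't know how to handle headers being on different rows. I tried writing code to determine the header row number, but I don't know how to work it into the code above. (Can DictReader skip a dynamic number of rows before it looks for the header?)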
with open(f, 'r', newline='') as csvfile:
    read_rows = csv.reader(csvfile)
    header_row_number = 0
    for row in read_rows:
        # Keep overwriting so the last non-* row wins
        if row and row[0] != "*":
            header_row_number = read_rows.line_num
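One direction that might work, as a rough sketch rather than a tested solution: read each file's lines into memory, find the header line using the same "last row without a * in the first column" rule, and hand only the lines from that point on to csv.DictReader, since DictReader treats the first line it receives as the header. The helper name read_table, its marker parameter, and the in-memory sample are hypothetical, and the sketch assumes no quoted field spans several lines, so that row indices match line indices.

```python
import csv
import io


def read_table(lines, marker="*"):
    """Return (fieldnames, rows) for one file's lines.

    Hypothetical helper: the header is taken to be the last line whose
    first field is not `marker`. Assumes no multi-line quoted fields.
    """
    lines = list(lines)
    header_index = None
    for i, row in enumerate(csv.reader(lines)):
        if row and row[0] != marker:
            header_index = i  # keep overwriting so the last match wins
    if header_index is None:
        return [], []
    # DictReader uses the first line it is given as the header,
    # so pass it only the lines from the header onwards.
    reader = csv.DictReader(lines[header_index:])
    rows = list(reader)
    return reader.fieldnames, rows


# In-memory stand-in for one of the files (same shape as Table2 above).
sample = """Table2
Random Text
Random Text
!,Header1,Header2,Header4
*,124,2156,7478
*,126,12357,547
"""

fieldnames, rows = read_table(io.StringIO(sample))

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["Header1", "Header2", "Header3", "Header4"],
    restval="",             # value written for columns this file lacks
    extrasaction="ignore",  # drop keys not in fieldnames (the !/* column)
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Here restval fills in the columns a given file does not have, and extrasaction="ignore" drops the leading !/* column instead of raising an error. Adding the Table column from the expected output would need an extra step that this sketch leaves out.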
Any help is much appreciated.