Question

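I am trying to convert an xlsx file into one CSV file containing the header and another CSV file containing the actual data. I have the following requirements: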

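1. The header does not start at the first row but at row start_line.
2. Dates should not be treated as floats but written in some string format.
3. I do not know the total number of rows or columns in the file beforehand, and I do not want to specify which columns are dates.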

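Using pandas I got stuck at number 1. I wanted to do this in two separate reads: one from start_line to start_line+1, and one from start_line+1 to the end. However, reading n rows from an offset does not seem to be possible. Below is the code I use to get a file that includes the header.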

import pandas as pd
def parse_excel(file,start_line,sheet,table):
    sh = pd.read_excel(file,sheet,skiprows=start_line)
    sh.to_csv("output.csv",sep='\t',encoding='utf-8',index=False)

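Next I tried this with xlrd, but that library returns all dates as floats, the way Excel stores them. The only workaround seems to be to go through all individual cells, which seems neither efficient nor well coded. What I have now: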

import csv
import xlrd
def parse_excel(file,start_line,sheet,table):
    with xlrd.open_workbook(file) as wb:
        sh = wb.sheet_by_name(sheet)
        header_written = False
        with open('{0}.csv'.format(table),'wb') as csv_file:
            wr = csv.writer(csv_file,delimiter='\t')
            for rownum in range(sh.nrows):
                if not header_written and start_line == rownum:
                    with open('{0}_header.csv'.format(table),'wb') as header:
                        hwr = csv.writer(header,delimiter='\t')
                        hwr.writerow(sh.row_values(rownum))
                        header_written = True
                elif header_written:
                    wr.writerow(sh.row_values(rownum))

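Please point out other solutions or libraries, show a workaround for either of the approaches above, or explain why I should go for the xlrd workaround of checking each individual cell.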
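For context on why the xlrd route yields floats: Excel stores a date as a serial number of days, with the time of day as the fractional part. The sketch below shows the conversion using only the standard library. The helper name `excel_serial_to_str` is hypothetical, and the code assumes the workbook uses Excel's default 1900 date system. xlrd also ships its own helper, `xlrd.xldate_as_datetime(value, wb.datemode)`, which accounts for the workbook's date mode and is probably the safer choice in real code.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses Excel's default 1900 date system, in which
# serial numbers count days from 1899-12-30 (valid for dates from 1900-03-01 on).
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)

def excel_serial_to_str(serial, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert an Excel serial date (a float) to a formatted string."""
    return (EXCEL_EPOCH_1900 + timedelta(days=serial)).strftime(fmt)

print(excel_serial_to_str(43831.0))
print(excel_serial_to_str(43831.5, "%Y-%m-%d %H:%M"))
```

A per-cell loop would still have to decide which cells are dates (xlrd marks them with the cell type `xlrd.XL_CELL_DATE`), which is the overhead the question is trying to avoid.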

Answer 1

As long as all of your data is below the header row, the following should work. Assume the header row is row n (indexed from 0, not from 1 as in Excel).

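import pandas as pd
# header=n uses row n (0-indexed) as column names and skips the rows above it
df = pd.read_excel('filepath', header=n)
df.head(0).to_csv('header.csv', index=False)  # column names only
df.to_csv('output.csv', header=False, index=False)  # data rows only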
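On the date requirement, a possible extension of the snippet above: `read_excel` normally returns date-formatted cells as datetime columns, and `to_csv` accepts a `date_format` argument that controls how they are written as text. The sketch below uses an in-memory DataFrame as a stand-in for the result of `read_excel`, so the column names and values are made up for illustration. The helper `split_header_and_data` is hypothetical as well.

```python
import pandas as pd

def split_header_and_data(df, date_format="%Y-%m-%d"):
    """Return (header_csv, data_csv) as strings.

    Datetime columns are written as text using date_format.
    """
    header_csv = df.head(0).to_csv(index=False)
    data_csv = df.to_csv(header=False, index=False, date_format=date_format)
    return header_csv, data_csv

# Stand-in for pd.read_excel('filepath', header=n); illustrative data only.
df = pd.DataFrame({
    "name": ["a", "b"],
    "when": pd.to_datetime(["2020-01-01", "2020-02-15"]),
})

header_csv, data_csv = split_header_and_data(df)
print(header_csv)
print(data_csv)
```

When a file path is passed to `to_csv` instead, the same `date_format` argument applies, so no column has to be named as a date column in advance.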

How to convert an xlsx file to CSV from row n in Python while preserving date values
