Converting XLSX to CSV while preserving timestamps

Time: 2014-08-05 22:12:20

Tags: python excel csv time xlsx

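I am trying to convert a directory full of XLSX files to CSV. Everything works, except that I am having trouble with columns that contain time information. The XLSX files are created by another program that I cannot modify. I want the CSV, when viewed in any text editor, to show the same times that Excel displays when I open the XLSX file.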

My code:

import csv
import xlrd
import os
import fnmatch
import Tkinter, tkFileDialog, tkMessageBox

def main():
    root = Tkinter.Tk()
    root.withdraw()
    print 'Starting .xlsx to .csv conversion'
    directory = tkFileDialog.askdirectory()
    for fileName in os.listdir(directory):
        if fnmatch.fnmatch(fileName, '*.xlsx'):
            filePath = os.path.join(directory, fileName)
            saveFile = os.path.splitext(filePath)[0]+".csv"
            savePath = os.path.join(directory, saveFile)
            workbook = xlrd.open_workbook(filePath)
            sheet = workbook.sheet_by_index(0)
            csvOutput = open(savePath, 'wb')
            csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_ALL)
            for row in xrange(sheet.nrows):
                csvWriter.writerow(sheet.row_values(row))
            csvOutput.close()
    print '.csv conversion complete'

main()

To add some detail: if I open one of the files in Excel, I see this in the time column:

00:10.3
00:14.2
00:16.1
00:20.0
00:22.0

But after converting to CSV, I see this in the same place:

0.000118981
0.000164005
0.000186227
0.000231597
0.000254861

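Thanks to seanmhanson's answer (https://stackoverflow.com/a/25149562/1858351), I found out that Excel stores times as decimal fractions of a day. I should really learn to use xlrd better, but as a quick short-term fix I convert the value to seconds and then format the seconds back into the HH:MM:SS time format. In case anyone can use it, my (probably ugly) code is below: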

import csv
import xlrd
import os
import fnmatch
from decimal import Decimal
import Tkinter, tkFileDialog

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

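def seconds_to_hms(seconds):
    # Split a duration in seconds into hours, minutes and fractional seconds.
    total = Decimal(seconds)
    m, s = divmod(total, 60)
    h, m = divmod(m, 60)
    # %05.2f zero-pads the seconds field (e.g. 05.30); %02.2f would not.
    return "%02d:%02d:%05.2f" % (h, m, s)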

def main():
    root = Tkinter.Tk()
    root.withdraw()
    print 'Starting .xlsx to .csv conversion'
    directory = tkFileDialog.askdirectory()
    for fileName in os.listdir(directory):
        if fnmatch.fnmatch(fileName, '*.xlsx'):
            filePath = os.path.join(directory, fileName)
            saveFile = os.path.splitext(filePath)[0]+".csv"
            savePath = os.path.join(directory, saveFile)
            workbook = xlrd.open_workbook(filePath)
            sheet = workbook.sheet_by_index(0)
            csvOutput = open(savePath, 'wb')
            csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_ALL)
            rowData = []
            for rownum in range(sheet.nrows):
                rows = sheet.row_values(rownum)
                for cell in rows:
                    if is_number(cell):
                        seconds = float(cell)*float(86400)
                        hms = seconds_to_hms(seconds)
                        rowData.append((hms))
                    else:
                        rowData.append((cell))
                csvWriter.writerow(rowData)
                rowData = []
            csvOutput.close()
    print '.csv conversion complete'

main()
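
As a check on the arithmetic, the sketch below converts the day fractions from the CSV back into the strings Excel displayed. It assumes the column uses a minutes:seconds display with one decimal place (inferred from the values shown above, not confirmed by the file itself), it is written for Python 3, and the function name day_fraction_to_mmss is only illustrative.

```python
def day_fraction_to_mmss(fraction):
    """Format an Excel time (a fraction of a day) as MM:SS.s."""
    # One day is 86400 seconds. Round once, to whole tenths of a second,
    # so that 59.96 s carries into the minutes field instead of
    # printing as "00:60.0".
    tenths = int(round(fraction * 86400 * 10))
    minutes, rem = divmod(tenths, 600)
    # Assumption: minutes wrap at 60, as in a minutes-within-the-hour display.
    return "%02d:%02d.%d" % (minutes % 60, rem // 10, rem % 10)


# The five values from the converted CSV shown earlier.
for value in (0.000118981, 0.000164005, 0.000186227, 0.000231597, 0.000254861):
    print(day_fraction_to_mmss(value))
```

For the five sample values this prints 00:10.3, 00:14.2, 00:16.1, 00:20.0 and 00:22.0, which matches the Excel column.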

1 Answer:

Answer 0 (score: 3)

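Excel stores times as floating-point numbers in units of days. You will need to use xlrd to determine whether a cell is a date and then convert it as needed. I'm not very familiar with xlrd, but you probably want something like the following; change the string formatting if you want to keep leading zeros: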

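import xlrd

def format_cell(cell, datemode):
    # Date/time cells hold a float number of days; datemode is workbook.datemode.
    if cell.ctype == xlrd.XL_CELL_DATE:
        try:
            cell_tuple = xlrd.xldate_as_tuple(cell.value, datemode)
            return "{hours}:{minutes}:{seconds}".format(
                hours=cell_tuple[3], minutes=cell_tuple[4], seconds=cell_tuple[5])
        except xlrd.xldate.XLDateError:
            # Not a valid Excel date; fall back to the raw value.
            return cell.value
    return cell.value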

The documentation for xlrd's xldate_as_tuple function can be found here: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966#xldate.xldate_as_tuple-function

For a similar question that has already been answered, see: Python: xlrd discerning dates from floats
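
The type check from the answer could be applied row by row so that ordinary numbers are left alone and only date/time cells are reformatted. The following is a minimal Python 3 sketch, not a definitive implementation. It takes plain lists so it runs without xlrd installed; with xlrd, the lists would presumably come from sheet.row_values(rownum) and sheet.row_types(rownum). The helper names format_time and convert_row are hypothetical, and the constant 3 is assumed to equal xlrd.XL_CELL_DATE (real code should use the xlrd constant). Formatting is done by hand because xldate_as_tuple reports whole seconds, which would drop the fractional part seen in the question.

```python
# Assumed to match xlrd.XL_CELL_DATE; prefer the xlrd constant in real code.
XL_CELL_DATE = 3


def format_time(day_fraction):
    """Render an Excel day fraction as HH:MM:SS.ss, keeping fractional seconds."""
    # Round to whole centiseconds first so any carry propagates correctly.
    centis = int(round(day_fraction * 86400 * 100))
    seconds, centis = divmod(centis, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d.%02d" % (hours, minutes, seconds, centis)


def convert_row(values, types):
    """Format only the cells whose type code marks them as dates/times."""
    return [format_time(v) if t == XL_CELL_DATE else v
            for v, t in zip(values, types)]


# A text cell, a time cell and a plain number (type codes 1, 3 and 2).
print(convert_row(["run 1", 0.000118981, 42.0], [1, 3, 2]))
```

Each converted row can then be passed to csvWriter.writerow as in the question's code; the plain number 42.0 passes through unchanged, unlike with the is_number approach.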