Python - From an Excel file

Time: 2016-01-11 12:29:54

Tags: python excel csv datetime xlrd

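I have an Excel file with 3 columns that are datetime, date, or time fields. I am reading it with the xlrd package, and I get the time as what I think is milliseconds. When I try to convert it back to a datetime, the result is wrong.

I also tried converting the file to CSV. That did not help either: I got odd datetime formats that I could not make sense of.

Below is what I tried with xlrd. I would prefer to use the .xlsx file as input; otherwise I have to convert the Excel file to .csv every time I receive a new input file.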

from xlrd import open_workbook
import os,pickle,datetime

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)

    wb = open_workbook(absolute_filepath)
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        for row_index in xrange(1, sheet.nrows):
            row=[]
            for col_index in xrange(4,7): #4th and 6th columns are date fields
                row.append(sheet.cell(row_index, col_index).value)

            print(row)  #Relevant list formed with 4th, 5th and 6th columns
            print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))


path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = "C:\\Users\\******************NEW DATA\\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)

Result:

[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50

Actual input file (copied from Excel):

1/7/2015 11:13   Registered              1/5/2015 0:00
1/7/2015 11:13   Sent for CTG1 approval  1/6/2015 0:00
1/7/2015 11:13   Sent back               1/6/2015 0:00
1/7/2015 11:13   Registered              1/7/2015 0:00
1/7/2015 11:13   Sent for CTG1 approval  1/7/2015 0:00
1/7/2015 11:13   Sent for CTG2 approval  1/8/2015 0:00
1/7/2015 11:13   CTG2 Approved           1/8/2015 0:00
1/7/2015 11:13   Sent back               1/9/2015 0:00
6/15/2015 14:48  Registered              5/20/2015 0:00
6/15/2015 14:48  Registered              5/20/2015 0:00
6/15/2015 14:48  Sent back               6/10/2015 0:00
6/15/2015 14:48  Sent back               6/10/2015 0:00
6/15/2015 14:48  Registered              6/15/2015 0:00
6/15/2015 14:48  Registered              6/15/2015 0:00

Why can't I read the dates correctly? Why don't they simply come through as strings so that I could convert them easily?

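3 answers:

Answer 0 (score: 2)

The problem is that you are interpreting Excel datetime values as UNIX timestamps, and the two are not the same thing. Excel stores a date as a count of days (with the time of day as the fractional part), whereas a UNIX timestamp counts seconds since 1970-01-01. The warning sign to look for is that every result lands near the UNIX epoch (1970-01-01).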
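To see where the 1970 values come from, here is a small Python 3 sketch that reproduces the misreading on two serial numbers taken from the question's output. The helper name `misread_as_unix_seconds` is made up for illustration, and the UTC+05:30 offset is an assumption inferred from the printed times, not something the question states.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the asker's machine was at UTC+05:30 (inferred from the output).
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))


def misread_as_unix_seconds(excel_serial, tz=ASSUMED_LOCAL_TZ):
    """Treat an Excel day count as if it were seconds since 1970-01-01."""
    moment = datetime.fromtimestamp(excel_serial, tz=tz)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# 42011.47 "seconds" is less than 12 hours after the epoch.
print(misread_as_unix_seconds(42011.46789351852))  # 1970-01-01 17:10:11
print(misread_as_unix_seconds(42170.61667824074))  # 1970-01-01 17:12:50
```

A day count of about 42,000 read as seconds is under 12 hours, which is why every row collapses onto the same day in 1970.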

You can convert an Excel datetime to a UNIX timestamp using the method described in this answer.

Windows / Mac Excel 2011

Unix Timestamp = (Excel Timestamp - 25569) * 86400

Mac Excel 2007

Unix Timestamp = (Excel Timestamp - 24107) * 86400

If you apply this conversion, you should get the correct output:

timestamp = (float(row[0]) - 25569) * 86400
# utcfromtimestamp keeps the local UTC offset from shifting the result
datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
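The same formula can be wrapped in a small helper that avoids time zone handling by adding the seconds to a naive 1970-01-01 datetime. This is only a sketch in Python 3: the function and constant names are hypothetical, the two offsets are the ones quoted above, and rounding to whole seconds is my own choice to hide floating-point noise.

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)
OFFSET_WINDOWS = 25569   # days, from the Windows / Mac Excel 2011 formula
OFFSET_MAC_2007 = 24107  # days, from the Mac Excel 2007 formula
SECONDS_PER_DAY = 86400


def excel_serial_to_datetime(serial, offset_days=OFFSET_WINDOWS):
    """Convert an Excel serial number to a naive datetime.

    Rounds to the nearest second (an assumption made for this sketch).
    """
    unix_seconds = (float(serial) - offset_days) * SECONDS_PER_DAY
    return UNIX_EPOCH + timedelta(seconds=round(unix_seconds))


print(excel_serial_to_datetime(42011.46789351852))  # 2015-01-07 11:13:46
print(excel_serial_to_datetime(42009.0))            # 2015-01-05 00:00:00
```

Both results agree with the rows copied from Excel in the question (1/7/2015 11:13 and 1/5/2015 0:00).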

Answer 1 (score: 2)

xldate_as_tuple(xldate, datemode)

Converts an Excel number (presumed to represent a date, a datetime, or a time) into a tuple suitable for feeding to a datetime or mx.DateTime constructor.

Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function

Usage example: How to use ``xlrd.xldate_as_tuple()``
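A minimal sketch of how this function might be applied to the values from the question, assuming xlrd is installed. The wrapper name `excel_cell_to_datetime` is hypothetical. A datemode of 0 means the 1900-based date system; when reading a real workbook you would pass the workbook's own `datemode` attribute instead.

```python
import datetime

import xlrd  # third-party package, assumed to be installed


def excel_cell_to_datetime(value, datemode=0):
    """Build a datetime from an Excel serial number via xlrd.

    datemode=0 is assumed here (1900-based); with a real workbook,
    pass wb.datemode from the object returned by open_workbook.
    """
    return datetime.datetime(*xlrd.xldate_as_tuple(value, datemode))


print(excel_cell_to_datetime(42011.46789351852))  # 2015-01-07 11:13:46
print(excel_cell_to_datetime(42009.0))            # 2015-01-05 00:00:00
```

For pure time values (serial numbers below 1.0) the tuple's date part is (0, 0, 0), so it cannot be passed to the datetime constructor; the last three items can be used to build a datetime.time instead.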

Answer 2 (score: 1)

If the Excel file you are reading is a plain table, pandas.read_excel is a simple and straightforward option. Convert the dates afterwards with pandas.to_datetime:

from __future__ import absolute_import, division, print_function
import os
import pandas as pd

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)
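    # Relevant columns: indices 4, 5 and 6 (parse_cols was renamed to usecols in later pandas releases)
    df = pd.read_excel(absolute_filepath, header=None, usecols=[4, 5, 6])
    # Relabel positionally so the columns can be addressed as 0, 1 and 2
    df.columns = [0, 1, 2]
    # Convert columns 0 and 2 to datetime
    df[0] = pd.to_datetime(df[0])
    df[2] = pd.to_datetime(df[2])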
    print(df)

path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
main(path, MISfile,None)
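
One caveat worth sketching: if a column arrives as raw serial numbers (floats such as 42011.46789351852) rather than parsed dates, a bare pandas.to_datetime call would read them as offsets from 1970 and land near the epoch again. The example below assumes a pandas version whose to_datetime supports the `unit` and `origin` parameters and assumes the 1900-based date system; the helper name `serials_to_datetime` is made up for illustration.

```python
import pandas as pd

# Assumption: 1900-based Excel date system, for which day 0 is 1899-12-30.
EXCEL_ORIGIN = "1899-12-30"


def serials_to_datetime(serials):
    """Interpret a Series of Excel serial numbers as days since EXCEL_ORIGIN."""
    return pd.to_datetime(serials, unit="D", origin=EXCEL_ORIGIN)


raw = pd.Series([42011.46789351852, 42009.0])
print(serials_to_datetime(raw))
```

The first value may carry a sub-second floating-point remainder, so round it if you need whole seconds.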