Pandas default date when none is provided in the time index column

Time: 2018-11-26 12:33:13

Tags: python pandas

I am loading xlsx files with pandas.read_excel(..., parse_dates=True, index_col=0, ...). The files start like this:

| UTC Time | Alt  | ... |
|----------|------|-----|
| 13:18:44 | 1234 | ... |
| 13:18:45 | 1235 | ... |
| 13:18:46 | 1236 | ... |
| 13:18:47 | 1237 | ... |

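The index of the resulting DataFrame is a valid DatetimeIndex, but its date() returns the current day, which is not provided in the file.

So, does anyone know a way to detect whether pandas has defaulted to the current date, without opening the file with xlrd or loading the first column a second time as an extra data column parsed as plain strings?

Thanks for your help!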


Here is a test case I put together; note the use of the parse_dates parameter:

>>> import pandas as pd
>>> import xlrd
>>> fn = "test/dfdr_example 2.xlsx"
>>> data_to_retrieve = [3, 4]

So, if we open the file "raw", no date is set, only the time:

>>> wb = xlrd.open_workbook(fn)
>>> for row in wb.sheet_by_index(0).get_rows():
...     print(row)

[text:'UTC Time (hh:mm:ss)', text:'2 GPS Groundspeed (knots)', text:'Auto Speed Active (discrete)', text:'Approach Identifier Right (ASCII)']
[xldate:0.5816087962962962, number:207.0, text:'engaged', empty:'']
[xldate:0.5816203703703704, number:208.0, text:'engaged', empty:'']
[xldate:0.5816319444444444, number:210.0, text:'engaged', number:23.0]
[xldate:0.5816435185185186, number:211.0, text:'engaged', empty:'']
[xldate:0.5816550925925926, number:212.0, text:'engaged', empty:'']
[xldate:0.5816666666666667, number:213.0, text:'engaged', empty:'']
[xldate:0.5816782407407407, number:214.0, text:'engaged', number:23.0]
[xldate:0.5816898148148147, number:215.0, text:'engaged', empty:'']
[xldate:0.5817013888888889, number:216.0, text:'engaged', empty:'']
[xldate:0.5817129629629629, number:217.0, text:'engaged', empty:'']
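To make the raw values above easier to read: an xldate below 1 is a fraction of a day with no whole-day (date) part. Here is a minimal sketch of the conversion using only the standard library; `xldate_fraction_to_time` is a hypothetical helper name, not part of xlrd or pandas.

```python
from datetime import datetime, time, timedelta

SECONDS_PER_DAY = 86400

def xldate_fraction_to_time(fraction: float) -> time:
    """Convert an Excel day fraction (0 <= fraction < 1) to a time of day."""
    if not 0 <= fraction < 1:
        raise ValueError("expected a pure time-of-day fraction in [0, 1)")
    # Round to whole seconds to absorb floating-point noise; the modulo
    # wraps a value that rounds up to a full day back to midnight.
    seconds = round(fraction * SECONDS_PER_DAY) % SECONDS_PER_DAY
    return (datetime.min + timedelta(seconds=seconds)).time()

# First xldate value from the dump above.
first_cell = xldate_fraction_to_time(0.5816087962962962)
print(first_cell)
```

The integer part of an xldate would carry the date, so a value below 1 confirms that the cell itself holds only a time.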

Now opening it with pandas:

>>> df = pd.read_excel(fn, parse_dates=True, index_col=0, use_cols=[0] + data_to_retrieve)
>>> df

                     2 GPS Groundspeed (knots)  Auto Speed Active (discrete)  Approach Identifier Right (ASCII)
UTC Time (hh:mm:ss)
2018-11-26 13:57:31                        207                       engaged                                NaN
2018-11-26 13:57:32                        208                       engaged                                NaN
2018-11-26 13:57:33                        210                       engaged                                 23
2018-11-26 13:57:34                        211                       engaged                                NaN
2018-11-26 13:57:35                        212                       engaged                                NaN
2018-11-26 13:57:36                        213                       engaged                                NaN

>>> df.index[0]
Timestamp('2018-11-26 13:57:31')

A further illustration of what happens:

>>> df = pd.read_excel(fn, parse_date=True, index_col=0, use_cols=[0] + data_to_retrieve)
>>> df1 = pd.read_excel(fn, parse_dates=True, index_col=0, use_cols=[0] + data_to_retrieve)
>>> df.index, df1.index
(Index([13:57:31, 13:57:32, 13:57:33, 13:57:34, 13:57:35, 13:57:36, 13:57:37,
        13:57:38, 13:57:39, 13:57:40, 13:57:41, 13:57:42, 13:57:43, 13:57:44,
        13:57:45, 13:57:46, 13:57:47, 13:57:48, 13:57:49, 13:57:50, 13:57:51,
        13:57:52, 13:57:53, 13:57:54, 13:57:55, 13:57:56, 13:57:57, 13:57:58],
       dtype='object', name='UTC Time (hh:mm:ss)'),
 DatetimeIndex(['2018-11-26 13:57:31', '2018-11-26 13:57:32',
                '2018-11-26 13:57:33', '2018-11-26 13:57:34',
                '2018-11-26 13:57:35', '2018-11-26 13:57:36',
                '2018-11-26 13:57:37', '2018-11-26 13:57:38',
                '2018-11-26 13:57:39', '2018-11-26 13:57:40',
                '2018-11-26 13:57:41', '2018-11-26 13:57:42',
                '2018-11-26 13:57:43', '2018-11-26 13:57:44',
                '2018-11-26 13:57:45', '2018-11-26 13:57:46',
                '2018-11-26 13:57:47', '2018-11-26 13:57:48',
                '2018-11-26 13:57:49', '2018-11-26 13:57:50',
                '2018-11-26 13:57:51', '2018-11-26 13:57:52',
                '2018-11-26 13:57:53', '2018-11-26 13:57:54',
                '2018-11-26 13:57:55', '2018-11-26 13:57:56',
                '2018-11-26 13:57:57', '2018-11-26 13:57:58'],
               dtype='datetime64[ns]', name='UTC Time (hh:mm:ss)', freq=None))
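The first index above holds bare times, the second holds full timestamps. If the real date is known from somewhere else, one option is to attach it explicitly rather than rely on whatever default gets filled in. This is a rough sketch, assuming the entries of the object index are `datetime.time` values (inferred from the `dtype='object'` output, not confirmed); `attach_date` is a hypothetical helper and the date used is made up.

```python
from datetime import date, datetime, time

def attach_date(times, known_date):
    """Combine bare time-of-day values with an explicitly supplied date."""
    return [datetime.combine(known_date, t) for t in times]

# Stand-ins for the first entries of the object index shown above.
times = [time(13, 57, 31), time(13, 57, 32), time(13, 57, 33)]

# The date below is invented for illustration; the file does not contain one.
stamps = attach_date(times, date(2018, 11, 20))
print(stamps[0])
```

With this approach the date in the result is always one that was chosen deliberately, so there is nothing left to detect afterwards.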

So we see that pandas sets the date somehow; how can I test whether it has defaulted to the current date?
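One heuristic that needs neither xlrd nor an extra column is to check whether every timestamp in the index carries today's date. A minimal sketch under that assumption follows; `looks_like_default_date` is a hypothetical name, and the function only needs objects with a `.date()` method, which should include the Timestamp values obtained by iterating over a DatetimeIndex.

```python
from datetime import date, datetime

def looks_like_default_date(stamps, today=None):
    """Return True if every timestamp falls on `today` (heuristic only)."""
    today = today or date.today()
    # Collect the distinct calendar dates present in the index.
    dates = {s.date() for s in stamps}
    # An empty index yields an empty set, so it is reported as False.
    return dates == {today}

# Plain datetimes standing in for the parsed index shown above.
parsed = [datetime(2018, 11, 26, 13, 57, 31), datetime(2018, 11, 26, 13, 57, 32)]
print(looks_like_default_date(parsed, today=date(2018, 11, 26)))
print(looks_like_default_date(parsed, today=date(2018, 11, 27)))
```

This cannot tell a defaulted date apart from a file that really was recorded today, and it may give the wrong answer if the load happens just before midnight and the check just after, so it is a hint rather than a proof.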

0 answers:

No answers yet