Question

I have the following CSV, saved as test.txt:

title, arbitrarydate, value
hello, 01-Jan-01, 314159

Running the following code

import pandas as pd

dataframe = pd.read_csv('test.txt', parse_dates=True)
print(dataframe.dtypes)

gives this output:

title            object
arbitrarydate    object
value             int64
dtype: object

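Why can't pandas detect that arbitrarydate is a date column? How can I parse it correctly? I want pandas to detect arbitrarydate as my date column on its own; I don't want to specify in advance which columns contain dates.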

Answer 1

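parse_dates=True only tells pandas to try parsing the index, and here the index is the default integer index, so none of the columns are touched. Pass the column name explicitly instead. This works for me: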

import pandas as pd
import io

temp=u"""title,arbitrarydate,value
hello,01-Jan-01,314159"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), parse_dates=['arbitrarydate'])
print (df)
   title arbitrarydate   value
0  hello    2001-01-01  314159

print (df.dtypes)
title                    object
arbitrarydate    datetime64[ns]
value                     int64
dtype: object

Another solution is to pass the column's position to parse_dates:

df = pd.read_csv(io.StringIO(temp), parse_dates=[1])
print (df)
   title arbitrarydate   value
0  hello    2001-01-01  314159

print (df.dtypes)
title                    object
arbitrarydate    datetime64[ns]
value                     int64
dtype: object
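If the file really has a space after each comma, as in the question, the header is read as ` arbitrarydate` with a leading space, so `parse_dates=['arbitrarydate']` would not find the column. A minimal sketch, assuming a reasonably recent pandas, that strips those spaces with `skipinitialspace=True` and then parses the column with an explicit format, so pandas does not have to guess how to read `01-Jan-01`:

```python
import io

import pandas as pd

# Same data as in the question, including the space after each comma
temp = """title, arbitrarydate, value
hello, 01-Jan-01, 314159"""

# skipinitialspace=True drops the blank after each delimiter,
# so the column is named 'arbitrarydate' instead of ' arbitrarydate'
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True)

# %d = day, %b = abbreviated month name, %y = two-digit year ('01' -> 2001)
df["arbitrarydate"] = pd.to_datetime(df["arbitrarydate"], format="%d-%b-%y")
print(df.dtypes)
```

An explicit format also removes the ambiguity of a string like `01-Jan-01`, where either number could in principle be the day or the year.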

Docs

You can list every column in parse_dates, but this is dangerous, because some integers can be parsed as datetimes, for example:

import pandas as pd
import io

temp=u"""title,arbitrarydate,value
hello,01-Jan-01,2000"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), parse_dates = [0,1,2])
print (df)
   title arbitrarydate      value
0  hello    2001-01-01 2000-01-01

print (df.dtypes)
title                    object
arbitrarydate    datetime64[ns]
value            datetime64[ns]
dtype: object
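If you really don't want to name the date columns in advance, one option is to try a known date format on the string columns only, so integer columns such as `value` are never touched. A minimal sketch, assuming all date columns use the `01-Jan-01` (`%d-%b-%y`) format; `convert_date_columns` is a hypothetical helper, not a pandas function:

```python
import io

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_date_columns(df, fmt="%d-%b-%y"):
    """Hypothetical helper: convert every text column whose non-missing
    values all match `fmt` to datetime64; other columns are left as-is."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        # Skip numeric, bool and already-datetime columns entirely
        if not (is_object_dtype(s) or is_string_dtype(s)):
            continue
        # errors="coerce" turns values that do not match fmt into NaT
        parsed = pd.to_datetime(s, format=fmt, errors="coerce")
        # Convert only if every non-missing value parsed successfully
        if parsed.notna().any() and parsed.notna().sum() == s.notna().sum():
            out[col] = parsed
    return out


temp = """title,arbitrarydate,value
hello,01-Jan-01,2000"""

df = convert_date_columns(pd.read_csv(io.StringIO(temp)))
print(df.dtypes)
```

A column is converted only when every value matches the format, so a text column like `title` stays unchanged and `2000` stays an integer. The trade-off is that you still need to know the date format, just not which columns hold dates.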
