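I have a text.csv file with 6 columns. I want to read 2 of them in as dates for later use, but only one column comes back as datetime. Any ideas?
Also, I have several empty dates, and they come back as nan, NOT 0 (zero), as in na_values = 0 ??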
import pandas as pd
CSV = 'text.csv'
df = pd.read_csv(CSV,
                 skiprows = 0,
                 na_values = 0,
                 parse_dates = ['Date of Sign Up', 'Birth Date'],
                 usecols = ['Date of Sign Up', 'A', 'B', 'C', 'D', 'Birth Date'])
df.info() # Check info for column types and nan...
RangeIndex: 969 entries, 0 to 968
Data columns (total 6 columns):
Date of Sign Up 969 non-null datetime64[ns]
A 969 non-null object
B 969 non-null object
C 969 non-null object
D 969 non-null object
Birth Date 969 non-null object ## <== Why doesn't this column read as datetime?
dtypes: datetime64[ns](1), object(5)
memory usage: 45.5+ KB
Answer 0 (score: 1)
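The problem is that Birth Date contains at least one value that cannot be parsed as a datetime, so read_csv silently leaves the whole column unparsed (dtype object).
You can find the offending values with: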
dates = pd.to_datetime(df['Birth Date'], errors='coerce')
print (df.loc[dates.isnull(), 'Birth Date'])
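As a self-contained illustration of the check above, here is a minimal sketch. The sample data and the names csv_text and bad_values are made up for the demo and are not from the original file. Because dates.isnull() also flags cells that were empty to begin with, the sketch adds a notna() filter so that only values that had text but failed to parse are reported.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: one empty birth date and one unparseable one.
csv_text = """Date of Sign Up,Birth Date
2017-04-03,1990-01-15
2017-04-04,
2017-04-05,not a date
2017-04-06,1985-07-30
"""

df = pd.read_csv(StringIO(csv_text),
                 parse_dates=["Date of Sign Up", "Birth Date"])

# 'Date of Sign Up' is converted; 'Birth Date' is left unparsed
# because of the single bad value.
signup_is_datetime = pd.api.types.is_datetime64_any_dtype(df["Date of Sign Up"])
birth_is_datetime = pd.api.types.is_datetime64_any_dtype(df["Birth Date"])

# Coerce to datetime; anything unparseable becomes NaT.
dates = pd.to_datetime(df["Birth Date"], errors="coerce")

# Keep only rows that had some text but still failed to parse
# (cells that were empty in the file are excluded).
bad_values = df.loc[dates.isna() & df["Birth Date"].notna(), "Birth Date"].tolist()

print(signup_is_datetime, birth_is_datetime)
print(bad_values)
```

Once the offending values are known, they can either be fixed in the source file or listed in na_values.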
Another solution is to parse the problematic values as NaT:
df['Birth Date'] = pd.to_datetime(df['Birth Date'], errors='coerce')
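If the date layout of the column is known, passing an explicit format alongside errors='coerce' may make the conversion more predictable. The sketch below assumes ISO-style dates (%Y-%m-%d); that format and the sample values are assumptions for illustration, not something taken from the original file.

```python
import pandas as pd

# Hypothetical raw column: two valid dates, one bad string, one missing value.
raw = pd.Series(["1990-01-15", "not a date", None, "1985-07-30"],
                name="Birth Date")

# Values that do not match the assumed format become NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # number of values that could not be parsed
```

The trade-off is that errors='coerce' hides bad data, so it is worth counting the NaT values afterwards, as in the last line.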
I tested whether 0 is correctly parsed as NaT:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in recent pandas versions

temp=u"""Date,a
2017-04-03,0
2017-04-04,1
0,2
2017-04-06,3
2017-04-07,4
2017-04-08,5"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), na_values = 0, parse_dates=['Date'])
print (df)
Date a
0 2017-04-03 NaN
1 2017-04-04 1.0
2 NaT 2.0
3 2017-04-06 3.0
4 2017-04-07 4.0
5 2017-04-08 5.0
print (df.dtypes)
Date datetime64[ns]
a float64
dtype: object
If there are some other unparseable values, list them in na_values as well:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in recent pandas versions

temp=u"""Date,a
2017-04-03,0
string,1
0,2
2017-04-06,3
2017-04-07,4
2017-04-08,5"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), na_values = [0, 'string'], parse_dates=['Date'])
print (df)
Date a
0 2017-04-03 NaN
1 NaT 1.0
2 NaT 2.0
3 2017-04-06 3.0
4 2017-04-07 4.0
5 2017-04-08 5.0
print (df.dtypes)
Date datetime64[ns]
a float64
dtype: object
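Note that in the outputs above the 0 in column a also became NaN, because a scalar or list passed to na_values applies to every column. If that is not wanted, na_values also accepts a dict keyed by column name. The following is a minimal sketch reusing the same test data, with the marker given as the string "0" to match the raw text in the file.

```python
from io import StringIO

import pandas as pd

temp = u"""Date,a
2017-04-03,0
2017-04-04,1
0,2
2017-04-06,3
2017-04-07,4
2017-04-08,5"""

# Treat "0" as missing only in the 'Date' column; column 'a' is untouched.
df = pd.read_csv(StringIO(temp),
                 na_values={"Date": ["0"]},
                 parse_dates=["Date"])

print(df)
print(df.dtypes)
```

With this form, column a should keep its integer values, while the 0 in Date is still read as a missing date.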