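I am trying to set the dtype on two of my columns, but it is not working. I want to set [trans_typ] to 'category' and [date] to datetime. I also have an index [date] that I have already set to datetime, but I want the first column to be datetime as well.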
import numpy as np
import pandas as pd
import glob
df = pd.read_csv('/home/jayaramdas/anaconda3/cf_data', low_memory=False, \
parse_dates = True)
df.set_index(pd.to_datetime(df['date']), inplace=True)
df['trans_typ'].astype('category')
pd.to_datetime(df['date'])
df.dtypes
My output
date object
cmte_id object
trans_typ object
amount float64
fec_id object
cand_id object
dtype: object
Here is the data I get from print(df):
date cmte_id trans_typ amount fec_id cand_id
date
2007-08-15 2007-08-15 C00112250 24K 2000 C00431569 P00003392
2007-09-26 2007-09-26 C00119040 24K 1000 C00367680 H2FL05127
2007-09-26 2007-09-26 C00119040 24K 1000 C00140715 H2MD05155
2007-07-20 2007-07-20 C00346296 24K 1000 C00434571 H8CA37137
Answer 0 (score: 1)
You can use:
# if you need a copy of the date column as the index
df.set_index(df['date'], inplace=True)
print(df)
date cmte_id trans_typ entity_typ state employer \
date
2007-08-15 2007-08-15 C00112250 24K ORG DC NaN
2007-09-26 2007-09-26 C00119040 24K CCM FL NaN
2007-09-26 2007-09-26 C00119040 24K CCM MD NaN
2011-02-25 2011-02-25 C00478404 24K COM MN NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-22 2011-02-22 C00140855 24K CCM MD NaN
2011-02-28 2011-02-28 C00093963 24K CCM ND NaN
occupation amount fec_id cand_id
date
2007-08-15 NaN 2000 C00431569 P00003392
2007-09-26 NaN 1000 C00367680 H2FL05127
2007-09-26 NaN 1000 C00140715 H2MD05155
2011-02-25 NaN 2400 C00326629 H8MN06047
2011-02-01 NaN 1000 C00373464 H2OH17109
2011-02-01 NaN 1000 C00289983 H4KY01040
2011-02-22 NaN 2500 C00140715 H2MD05155
2011-02-28 NaN 1000 C00474619 H0ND00135
# convert column trans_typ to category
# column date is already datetime here, so it is not converted
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
date cmte_id trans_typ entity_typ state employer \
date
2007-08-15 2007-08-15 C00112250 24K ORG DC NaN
2007-09-26 2007-09-26 C00119040 24K CCM FL NaN
2007-09-26 2007-09-26 C00119040 24K CCM MD NaN
2011-02-25 2011-02-25 C00478404 24K COM MN NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-22 2011-02-22 C00140855 24K CCM MD NaN
2011-02-28 2011-02-28 C00093963 24K CCM ND NaN
occupation amount fec_id cand_id
date
2007-08-15 NaN 2000 C00431569 P00003392
2007-09-26 NaN 1000 C00367680 H2FL05127
2007-09-26 NaN 1000 C00140715 H2MD05155
2011-02-25 NaN 2400 C00326629 H8MN06047
2011-02-01 NaN 1000 C00373464 H2OH17109
2011-02-01 NaN 1000 C00289983 H4KY01040
2011-02-22 NaN 2500 C00140715 H2MD05155
2011-02-28 NaN 1000 C00474619 H0ND00135
print(df.dtypes)
date datetime64[ns]
cmte_id object
trans_typ category
entity_typ object
state object
employer float64
occupation float64
amount int64
fec_id object
cand_id object
dtype: object
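As far as the posted code shows, the original attempt had no effect because astype and pd.to_datetime return new objects and leave df untouched unless the result is assigned back. A minimal sketch, assuming a small inline sample (two rows copied from the question, with io.StringIO standing in for the real file):

```python
import io
import pandas as pd

# Inline sample standing in for the real CSV file (assumption for illustration).
csv_text = """date,cmte_id,trans_typ,amount,fec_id,cand_id
2007-08-15,C00112250,24K,2000,C00431569,P00003392
2007-09-26,C00119040,24K,1000,C00367680,H2FL05127
"""
df = pd.read_csv(io.StringIO(csv_text))

# These calls return converted copies; without assignment df is unchanged.
df['trans_typ'].astype('category')
pd.to_datetime(df['date'])
unchanged = df.dtypes.copy()

# Assigning the results back is what changes the column dtypes.
df['trans_typ'] = df['trans_typ'].astype('category')
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
```

Comparing unchanged with df.dtypes shows that only the assigned versions alter the frame.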
Or:
# if you DON'T need a copy of the date column as the index
df.set_index('date', inplace=True)
print(df)
cmte_id trans_typ entity_typ state employer occupation \
date
2007-08-15 C00112250 24K ORG DC NaN NaN
2007-09-26 C00119040 24K CCM FL NaN NaN
2007-09-26 C00119040 24K CCM MD NaN NaN
2011-02-25 C00478404 24K COM MN NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-22 C00140855 24K CCM MD NaN NaN
2011-02-28 C00093963 24K CCM ND NaN NaN
amount fec_id cand_id
date
2007-08-15 2000 C00431569 P00003392
2007-09-26 1000 C00367680 H2FL05127
2007-09-26 1000 C00140715 H2MD05155
2011-02-25 2400 C00326629 H8MN06047
2011-02-01 1000 C00373464 H2OH17109
2011-02-01 1000 C00289983 H4KY01040
2011-02-22 2500 C00140715 H2MD05155
2011-02-28 1000 C00474619 H0ND00135
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
cmte_id trans_typ entity_typ state employer occupation \
date
2007-08-15 C00112250 24K ORG DC NaN NaN
2007-09-26 C00119040 24K CCM FL NaN NaN
2007-09-26 C00119040 24K CCM MD NaN NaN
2011-02-25 C00478404 24K COM MN NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-22 C00140855 24K CCM MD NaN NaN
2011-02-28 C00093963 24K CCM ND NaN NaN
amount fec_id cand_id
date
2007-08-15 2000 C00431569 P00003392
2007-09-26 1000 C00367680 H2FL05127
2007-09-26 1000 C00140715 H2MD05155
2011-02-25 2400 C00326629 H8MN06047
2011-02-01 1000 C00373464 H2OH17109
2011-02-01 1000 C00289983 H4KY01040
2011-02-22 2500 C00140715 H2MD05155
2011-02-28 1000 C00474619 H0ND00135
print(df.dtypes)
cmte_id object
trans_typ category
entity_typ object
state object
employer float64
occupation float64
amount int64
fec_id object
cand_id object
dtype: object
print(df.index)
DatetimeIndex(['2007-08-15', '2007-09-26', '2007-09-26', '2011-02-25',
'2011-02-01', '2011-02-01', '2011-02-22', '2011-02-28'],
dtype='datetime64[ns]', name=u'date', freq=None)
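Both dtypes can also be requested when the file is read, which avoids the separate conversion steps. This is only a sketch under assumptions: the inline sample and io.StringIO stand in for the real file, and drop=False is used to keep the date column alongside the index, as the question asks.

```python
import io
import pandas as pd

# Inline sample standing in for the real CSV file (assumption for illustration).
csv_text = """date,cmte_id,trans_typ,amount,fec_id,cand_id
2007-08-15,C00112250,24K,2000,C00431569,P00003392
2007-09-26,C00119040,24K,1000,C00367680,H2FL05127
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['date'],             # parse the date column as datetime
    dtype={'trans_typ': 'category'},  # read trans_typ as a categorical
)

# Use date as the index while keeping it as a regular column too.
df = df.set_index('date', drop=False)
print(df.dtypes)
print(df.index)
```

Note that parse_dates=True, as used in the question, only tries to parse the index, which is why the date column stayed as object there.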
Answer 1 (score: 0)
I just used df['date'] = df['date'].astype('datetime64')
and it worked!
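A caveat, to the best of my knowledge: recent pandas releases (2.0 and later) reject the unit-less string 'datetime64' and ask for an explicit unit such as 'datetime64[ns]'. A small sketch of the same idea with the unit spelled out, using a made-up two-row frame with an object-dtype date column like the one in the question:

```python
import pandas as pd

# Two dates from the question, stored as object dtype as in the posted dtypes output.
df = pd.DataFrame({'date': pd.Series(['2007-08-15', '2007-09-26'], dtype=object)})

# Same approach as above, but with an explicit unit in the dtype string.
df['date'] = df['date'].astype('datetime64[ns]')
print(df.dtypes)
```

pd.to_datetime(df['date']) is an alternative that does not depend on the dtype string at all.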