I'm struggling to parse datetimes in pandas. Here is my short example:
df.iloc[:10,10:]
Out[45]:
response_date revision scheduleClosedAt scheduleEventIndex scheduleId scheduleOpenedAt
0 {u'$date': u'2012-01-10T11:00:00.000+0000'} {u'Measure': 1} NaN NaN NaN NaN
1 {u'$date': u'2012-01-19T13:00:00.000+0000'} {u'Measure': 1} NaN NaN NaN NaN
2 {u'$date': u'2011-06-15T09:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
3 {u'$date': u'2011-06-22T00:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
4 {u'$date': u'2011-06-30T09:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
5 {u'$date': u'2011-07-05T00:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
6 {u'$date': u'2011-07-14T10:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
7 {u'$date': u'2011-07-20T09:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
8 {u'$date': u'2011-07-26T00:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
9 {u'$date': u'2011-08-02T00:00:00.000+0100'} {u'Measure': 1} NaN NaN NaN NaN
I need to get rid of the nesting in the 'response_date' column and convert it to a normal datetime, while keeping the column name 'response_date'.
I tried:
>> df_respons = df.response_date.apply(pd.Series)
>> df_new_response = pd.to_datetime(df_respons)
But I got the error:
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Is there a neat way to turn the nested datetimes into a clean column?
Edit
How can I ignore missing values?
43025 {u'$date': u'2015-11-18T10:35:00.000+0000'}
43026 {u'$date': u'2015-11-18T14:23:00.000+0000'}
43027 {u'$date': u'2015-11-18T14:23:00.000+0000'}
43028 {u'$date': u'2015-11-18T15:20:00.000+0000'}
43029 {u'$date': u'2015-11-18T15:20:00.000+0000'}
43030 NaN
43031 NaN
43032 {u'$date': u'2015-11-19T08:00:00.000+0000'}
43033 {u'$date': u'2015-11-19T08:00:00.000+0000'}
43034 {u'$date': u'2015-11-24T08:00:00.000+0000'}
This gives a new '0' column:
0 response_date
43027 NaN 2015-11-18T14:23:00.000+0000
43028 NaN 2015-11-18T15:20:00.000+0000
43029 NaN 2015-11-18T15:20:00.000+0000
43030 NaN NaN
43031 NaN NaN
43032 NaN 2015-11-19T08:00:00.000+0000
43033 NaN 2015-11-19T08:00:00.000+0000
43034 NaN 2015-11-24T08:00:00.000+0000
Answer 0 (score: 1)
It sounds like you want something like df.apply(lambda row: pd.to_datetime(row['response_date']['$date']), axis=1):
In [41]: df
Out[41]:
response_date
0 {'$date': '2011-06-15T09:00:00.000+0100'}
In [42]: df['response_date'] = df.apply(lambda row: pd.to_datetime(row['response_date']['$date']), axis=1)
In [43]: df
Out[43]:
response_date
0 2011-06-15 08:00:00
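Note that the 08:00 in the output above is the 09:00+0100 input normalized to UTC. Here is a minimal sketch that makes the conversion explicit, assuming naive UTC timestamps are acceptable. The names raw and parsed are mine, not from the question, and this version assumes there are no missing values.

```python
import pandas as pd

df = pd.DataFrame({"response_date": [{"$date": "2011-06-15T09:00:00.000+0100"}]})

# Pull the ISO string out of each dict (assumes every cell is a dict).
raw = df["response_date"].map(lambda d: d["$date"])

# utc=True converts every offset to UTC and yields a tz-aware result.
parsed = pd.to_datetime(raw, utc=True)

# Optional: drop the timezone to get naive timestamps expressed in UTC.
df["response_date"] = parsed.dt.tz_localize(None)
print(df)
```

Passing utc=True also sidesteps the problem of mixed offsets (+0000 and +0100) in the same column, which newer pandas versions may warn about or reject otherwise.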
Answer 1 (score: 1)
Try this:
In [70]: pd.to_datetime(
df.response_date.map(lambda x:
x['$date'] if isinstance(x, dict) and '$date' in x
else x),
errors='coerce')
Out[70]:
0 2012-01-10 11:00:00
1 2012-01-19 13:00:00
2 2011-06-15 08:00:00
3 2011-06-21 23:00:00
4 2011-06-30 08:00:00
5 NaT
6 NaT
7 2011-07-20 08:00:00
8 2011-07-25 23:00:00
9 2011-08-01 23:00:00
Name: response_date, dtype: datetime64[ns]
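A possible variant of the same idea, sketched on made-up sample data: Series.str.get can look up a key in dict elements and leaves missing values as NaN, so the lambda is not strictly needed. The variable names s, extracted and result are only for illustration, and utc=True is my addition to handle the mixed offsets.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [
        {"$date": "2012-01-10T11:00:00.000+0000"},
        np.nan,  # missing value, should become NaT
        {"$date": "2011-06-15T09:00:00.000+0100"},
    ],
    name="response_date",
)

# For dict elements, .str.get(key) returns the value stored under that key;
# NaN entries are passed through unchanged.
extracted = s.str.get("$date")

# errors='coerce' turns anything unparseable into NaT; utc=True normalizes offsets.
result = pd.to_datetime(extracted, errors="coerce", utc=True)
print(result)
```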
Answer 2 (score: 1)
You can use combine_first or fillna to replace NaN with an empty dict, then pass the values, converted to a numpy array and then to a list with tolist, to the DataFrame constructor:
d = {'$date':'response_date'}
# one empty dict per row, so the length always matches the index
s = pd.Series([{} for _ in range(len(df))], index=df.index)
df = pd.DataFrame(df['0'].combine_first(s).values.tolist()).rename(columns=d)
#alternatively
#df = pd.DataFrame(df['0'].fillna(s).values.tolist()).rename(columns=d)
df['response_date'] = pd.to_datetime(df['response_date'])
print (df)
response_date
0 2015-11-18 10:35:00
1 2015-11-18 14:23:00
2 2015-11-18 14:23:00
3 2015-11-18 15:20:00
4 2015-11-18 15:20:00
5 NaT
6 NaT
7 2015-11-19 08:00:00
8 2015-11-19 08:00:00
9 2015-11-24 08:00:00
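Note that the result above has a fresh 0..9 index. If you want to keep the original index, the same idea can be sketched roughly as below. This is an assumption-laden variant rather than the exact code above: it works on the 'response_date' column directly, and the names records and expanded are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "response_date": [
            {"$date": "2015-11-18T15:20:00.000+0000"},
            np.nan,
            {"$date": "2015-11-19T08:00:00.000+0000"},
        ]
    },
    index=[43029, 43030, 43032],
)

# Replace every non-dict cell (e.g. NaN) with an empty dict.
records = [x if isinstance(x, dict) else {} for x in df["response_date"]]

# Expand the dicts into columns, reusing the original index.
# If no row contains '$date' at all, that column will not exist.
expanded = pd.DataFrame(records, index=df.index)

df["response_date"] = pd.to_datetime(expanded["$date"], utc=True)
print(df)
```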
Another solution with map:
df['response_date'] = \
pd.to_datetime(df['response_date'].map(lambda x: x['$date'] if type(x) == dict else x))
print (df)
response_date
43025 2015-11-18 10:35:00
43026 2015-11-18 14:23:00
43027 2015-11-18 14:23:00
43028 2015-11-18 15:20:00
43029 2015-11-18 15:20:00
43030 NaT
43031 NaT
43032 2015-11-19 08:00:00
43033 2015-11-19 08:00:00
43034 2015-11-24 08:00:00