import pandas as pd
inp= [{'c1null':10,'cols':{'c2':20,'c3time':null}, 'c4':'41'}, {'c1null':11,'cols':{'c2':null,'c3time':'2014-05-24 19:20'},'c4':'42'}, {'c1null':12,'cols':{'c2':20,'c3time':'2016-06-24 19:20'},'c4':'43'}]
df=pd.io.json.json_normalize(inp)
print(df)
The JSON string inp contains the value null, so the script above cannot run json_normalize successfully and produce the expected result, shown below:
   c1null  c4  cols.c2       cols.c3time
0      10  41       20               NaT
1      11  42      NaN  2014-05-24 19:20
2      12  43       20  2016-06-24 19:20
Now I use pd.read_sql to fetch the DataFrame, and I need to replace the value null with NaN, or with NaT when the key is named *time, so that pd.io.json.json_normalize can then be applied.
How can I replace the value null with NaN or NaT in a DataFrame column that holds JSON strings?
Answer 0 (score: 0)
Try adding:
# Bind the name null to NaN so the literal below evaluates without a NameError
from numpy import nan as null
inp= [{'c1':10,'cols':{'c2':20,'c3time':null}, 'c4':'41'}, {'c1':11,'cols':{'c2':null,'c3time':'2014-05-24 19:20'},'c4':'42'}, {'c1':12,'cols':{'c2':20,'c3time':'2016-06-24 19:20'},'c4':'43'}]
df=pd.io.json.json_normalize(inp)
df
Out[494]:
   c1  c4  cols.c2       cols.c3time
0  10  41     20.0               NaN
1  11  42      NaN  2014-05-24 19:20
2  12  43     20.0  2016-06-24 19:20
# Convert the time column to datetime; the missing value becomes NaT
df['cols.c3time']=pd.to_datetime(df['cols.c3time'])
df
Out[497]:
   c1  c4  cols.c2         cols.c3time
0  10  41     20.0                 NaT
1  11  42      NaN 2014-05-24 19:20:00
2  12  43     20.0 2016-06-24 19:20:00
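
The import trick above works when the data is typed in as a Python literal. Since the question mentions a JSON string column coming from pd.read_sql, here is a rough sketch of another route, under some assumptions: the column name payload and the sample rows are made up for illustration, and pd.json_normalize is the top-level name for this function in pandas 1.0 and later. The idea is that json.loads already maps JSON null to Python None, which pandas treats as missing, so nothing has to be replaced in the text itself.

```python
import json

import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_sql: a single
# column (the name "payload" is an assumption) holding JSON text with nulls.
raw = pd.DataFrame({
    "payload": [
        '{"c1": 10, "cols": {"c2": 20, "c3time": null}, "c4": "41"}',
        '{"c1": 11, "cols": {"c2": null, "c3time": "2014-05-24 19:20"}, "c4": "42"}',
        '{"c1": 12, "cols": {"c2": 20, "c3time": "2016-06-24 19:20"}, "c4": "43"}',
    ]
})

# json.loads turns JSON null into Python None, so no name called null
# has to exist in the Python namespace.
records = [json.loads(text) for text in raw["payload"]]

# Flatten the nested dicts into dotted column names such as cols.c3time.
df = pd.json_normalize(records)

# Convert every flattened column whose name ends in "time" to datetime;
# missing entries in those columns become NaT.
time_cols = [col for col in df.columns if col.endswith("time")]
for col in time_cols:
    df[col] = pd.to_datetime(df[col])

print(df)
```

Matching on the column-name suffix is only one way to read the *time rule from the question; if the real keys follow a different pattern, the filter on df.columns would need to change accordingly.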