My data looks like this:
{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}
I want to convert it to a pandas DataFrame. But when I try
df = pd.DataFrame(response.items())
I get a DataFrame with two columns, the first holding the keys and the second holding the values:
0 1
0 "57e01311817bc367c030b390" {"ad_since": 2016, "indoor_swimming_pool": "No...
1 "57e01311817bc367c030b3a8" {"ad_since": 2012, "indoor_swimming_pool": "No...
How can I get one column per inner key ("ad_since", "indoor_swimming_pool", and so on), while keeping the first column or using the id as the index?
Answer 0 (score: 2)
You need to convert the column's values from type str to dict with .apply(literal_eval) or .apply(json.loads), and then use DataFrame.from_records:
import pandas as pd
from ast import literal_eval
response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}
df = pd.DataFrame.from_dict(response, orient='index')
print (type(df.iloc[0,0]))
<class 'str'>
df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)
print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
ad_since handicapped_access indoor_swimming_pool \
"57e01311817bc367c030b3a8" 2012 Yes No
"57e01311817bc367c030b390" 2016 Yes No
seaside
"57e01311817bc367c030b3a8" No
"57e01311817bc367c030b390" No
Or, using json.loads instead of literal_eval:
import pandas as pd
import json
response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}
df = pd.DataFrame.from_dict(response, orient='index')
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)
print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
ad_since handicapped_access indoor_swimming_pool \
"57e01311817bc367c030b3a8" 2012 Yes No
"57e01311817bc367c030b390" 2016 Yes No
seaside
"57e01311817bc367c030b3a8" No
"57e01311817bc367c030b390" No
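The same idea can be written more compactly by parsing the strings before pandas sees them. This is a minimal sketch, not part of the original answer; the name parsed is introduced here for illustration, and it assumes every value is valid JSON.

```python
import json
import pandas as pd

response = {
    u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# Parse each JSON string into a dict first (assumes every value is valid JSON).
parsed = {key: json.loads(value) for key, value in response.items()}

# orient='index' turns each outer key into a row label
# and each inner key into a column.
df = pd.DataFrame.from_dict(parsed, orient='index')
print(df)
```

This skips the intermediate one-column DataFrame, so there is no need for .apply or from_records.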
Answer 1 (score: 1)
Since the values are strings, you can use the json module and a list comprehension:
In [20]: d = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}
In [21]: import json
In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(list(d.values())[0])]), index=list(d.keys()))
Out[22]:
ad_since handicapped_access indoor_swimming_pool \
"57e01311817bc367c030b390" 2016 Yes No
"57e01311817bc367c030b3a8" 2012 Yes No
seaside
"57e01311817bc367c030b390" No
"57e01311817bc367c030b3a8" No
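Note that the ids in the output above still carry literal double quotes, because each key is itself a JSON-encoded string. A possible variation, sketched here as an assumption rather than as part of the original answer, decodes the keys with json.loads as well and names the index "id" (the names rows, ids and with_id_column are hypothetical):

```python
import json
import pandas as pd

d = {
    u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
    u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}',
}

# One dict per row, in the dict's iteration order.
rows = [json.loads(value) for value in d.values()]

# The keys are JSON strings too: '"57e0..."' decodes to '57e0...'.
ids = [json.loads(key) for key in d]

# Use the cleaned ids as a named index.
df = pd.DataFrame(rows, index=pd.Index(ids, name="id"))

# If the id should be an ordinary first column instead of the index:
with_id_column = df.reset_index()
print(with_id_column)
```

Either form covers what the question asks for: df keeps the id as the index, and with_id_column keeps it as the first column.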