I'm new to pandas and I'm having trouble converting a Series to a DataFrame in Jupyter. Basically, I have a Series with this structure:
0 {"province": "Paris", "city": "Paris", "countryCode": "FR", "floor": null, "country": "France", "route": "RUE MONGE", "extra": null, "coordinates": [2.35242, 48.84477], "streetNumber": "55", "locationType": null, "postalCode": "75005"}
1 {"province": null, "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "PLACE DU PANTHEON", "extra": null, "coordinates": [2.345032, 48.845715], "streetNumber": "17", "locationType": "OUTDOOR", "postalCode": "75005"}
2 {"province": null, "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "RUE DU BAC", "extra": null, "coordinates": [2.327753, 48.857124], "streetNumber": "35", "locationType": "OUTDOOR", "postalCode": "75007"}
I ran this code to convert it to a DataFrame, but it does not split the Series into the corresponding columns:
pd.DataFrame(data['fields.geolocation'], index=data.index)
Any help would be much appreciated.
Answer 0: (score: 1)
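You're close. You need to convert the Series values to a list of dicts with .values.tolist(), so that pandas builds one column per key: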
df = pd.DataFrame(data['fields.geolocation'].values.tolist(), index=data.index)
Example:
import pandas as pd
a = [{"province": "Paris", "city": "Paris", "countryCode": "FR", "floor": 'null', "country": "France", "route": "RUE MONGE", "extra": 'null', "coordinates": [2.35242, 48.84477], "streetNumber": "55", "locationType": 'null', "postalCode": "75005"} ,
{"province": 'null', "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "PLACE DU PANTHEON", "extra": 'null', "coordinates": [2.345032, 48.845715], "streetNumber": "17", "locationType": "OUTDOOR", "postalCode": "75005"} ,
{"province": 'null', "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "RUE DU BAC", "extra": 'null', "coordinates": [2.327753, 48.857124], "streetNumber": "35", "locationType": "OUTDOOR", "postalCode": "75007"}]
s = pd.Series(a, index=[2,3,5])
print (s)
2 {'province': 'Paris', 'city': 'Paris', 'countr...
3 {'province': 'null', 'city': 'Paris', 'country...
5 {'province': 'null', 'city': 'Paris', 'country...
dtype: object
df = pd.DataFrame(s.values.tolist(), index=s.index)
print (df)
    city            coordinates country countryCode extra  \
2  Paris    [2.35242, 48.84477]  France          FR  null
3  Paris  [2.345032, 48.845715]  France          FR  null
5  Paris  [2.327753, 48.857124]  France          FR  null
                          floor locationType postalCode province  \
2                          null         null      75005    Paris
3  CPO_BELI_floor_1482430978123      OUTDOOR      75005     null
5  CPO_BELI_floor_1482430978123      OUTDOOR      75007     null
               route streetNumber
2          RUE MONGE           55
3  PLACE DU PANTHEON           17
5         RUE DU BAC           35
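To see why the original call did not split the columns, here is a minimal sketch. It assumes the Series holds plain Python dicts and uses a shortened, hypothetical subset of the keys. Passing the Series itself to pd.DataFrame keeps each dict as a single cell, while passing a list of dicts lets pandas expand the keys into columns.

```python
import pandas as pd

# Hypothetical, shortened records; only three of the original keys are kept.
records = [
    {"city": "Paris", "route": "RUE MONGE", "postalCode": "75005"},
    {"city": "Paris", "route": "PLACE DU PANTHEON", "postalCode": "75005"},
    {"city": "Paris", "route": "RUE DU BAC", "postalCode": "75007"},
]
s = pd.Series(records, index=[2, 3, 5], name="fields.geolocation")

# Passing the Series directly: one column, each cell still holds a whole dict.
one_column = pd.DataFrame(s)

# Passing a list of dicts: one column per key, original index preserved.
expanded = pd.DataFrame(s.tolist(), index=s.index)

print(one_column.shape)
print(expanded)
```

Keeping index=s.index matters when the original labels are not 0..n-1, as in the example above with labels 2, 3 and 5.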
Answer 1: (score: 0)
Try using pd.concat with axis=1:
Here are your series:
import pandas as pd
A = {"province": "Paris", "city": "Paris", "countryCode": "FR", "floor": None, "country": "France", "route": "RUE MONGE", "extra": None, "coordinates": [2.35242, 48.84477], "streetNumber": "55", "locationType": None, "postalCode": "75005"}
B = {"province": None, "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "PLACE DU PANTHEON", "extra": None, "coordinates": [2.345032, 48.845715], "streetNumber": "17", "locationType": "OUTDOOR", "postalCode": "75005"}
C = {"province": None, "city": "Paris", "countryCode": "FR", "floor": "CPO_BELI_floor_1482430978123", "country": "France", "route": "RUE DU BAC", "extra": None, "coordinates": [2.327753, 48.857124], "streetNumber": "35", "locationType": "OUTDOOR", "postalCode": "75007"}
A_series = pd.Series(A)
B_series = pd.Series(B)
C_series = pd.Series(C)
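You can then build the DataFrame from them. With axis=1 each Series becomes a column, so transpose the result with .T to get one row per record:
df = pd.concat([A_series, B_series, C_series], axis=1).T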
type(df)
pandas.core.frame.DataFrame
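A short sketch of how axis=1 orients the result, using two hypothetical, shortened records. Each Series becomes a column labeled 0, 1, ..., and the dict keys become the row index. Transposing gives one row per record, which is the layout the question seems to ask for.

```python
import pandas as pd

# Hypothetical, shortened records for illustration only.
A = {"city": "Paris", "route": "RUE MONGE", "postalCode": "75005"}
B = {"city": "Paris", "route": "PLACE DU PANTHEON", "postalCode": "75005"}

# axis=1 places each Series side by side as a column; keys become the index.
wide = pd.concat([pd.Series(A), pd.Series(B)], axis=1)

# Transpose so that each record is a row and each key is a column.
by_record = wide.T

print(wide.shape)       # rows are keys, columns are records
print(by_record.shape)  # rows are records, columns are keys
```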
Hope this helps.