Generating a DataFrame from JSON

Date: 2016-06-16 11:05:53

Tags: python json string pandas dataframe

I am trying to generate a DataFrame from JSON. The JSON I have looks like this:

{
  eventId: "9668383e-ec96-4d6a-b873-2312dd008e7b",
  eventType: "PlannedCustomerChoiceWasUpdated",
  publishedDate: "2016-05-31T18:52:29.219Z",
  payload: {
    plannedCustomerChoiceId: "e9301a6e-7ccf-4c89-bd05-19c1b9067a61"
  },
  _links: {
    self: {
      href: "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb"
    },
    source: {
      href: "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61"
    }
  }
}

I need all the fields as columns of a single record.

This is what I have done so far:

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)
print(b)

I cannot get the desired format. Can anyone help me?

With b = pd.DataFrame(a), the DataFrame comes out in the following format:

                                                                    _links  \
plannedCustomerChoiceId                                                NaN   
self                     {u'href': u'http://gbp-router.gapinc.dev:8080/...   
source                   {u'href': u'http://gbp-router.gapinc.dev:8080/...   

                                                      eventId  \
plannedCustomerChoiceId  9668383e-ec96-4d6a-b873-2312dd008e7b   
self                     9668383e-ec96-4d6a-b873-2312dd008e7b   
source                   9668383e-ec96-4d6a-b873-2312dd008e7b   

                                               eventType  \
plannedCustomerChoiceId  PlannedCustomerChoiceWasUpdated   
self                     PlannedCustomerChoiceWasUpdated   
source                   PlannedCustomerChoiceWasUpdated   

                                                      payload  \
plannedCustomerChoiceId  e9301a6e-7ccf-4c89-bd05-19c1b9067a61   
self                                                      NaN   
source                                                    NaN   

                                    publishedDate  
plannedCustomerChoiceId  2016-05-31T18:52:29.219Z  
self                     2016-05-31T18:52:29.219Z  
source                   2016-05-31T18:52:29.219Z 
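
A small sketch of why `pd.DataFrame(a)` yields three rows, assuming a reasonably recent pandas: the nested dicts under `payload` and `_links` are treated as column data keyed by row label, so their keys become the index, while the scalar fields are repeated on every row. The record here is a shortened stand-in for the one above, and the `href` values are placeholders, not the real URLs.

```python
import pandas as pd

# Shortened stand-in for the record in the question; href values are placeholders.
a = {
    "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b",
    "eventType": "PlannedCustomerChoiceWasUpdated",
    "publishedDate": "2016-05-31T18:52:29.219Z",
    "payload": {"plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61"},
    "_links": {
        "self": {"href": "http://example.invalid/self"},
        "source": {"href": "http://example.invalid/source"},
    },
}

b = pd.DataFrame(a)

# The keys of the nested dicts become the row labels ...
row_labels = sorted(b.index)
# ... and every scalar field is repeated on each of those rows.
event_ids = set(b["eventId"])

print(row_labels)
print(event_ids)
```

This is why a flattening step is needed before the record can be written out as one line.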

What I actually want is:

9668383e-ec96-4d6a-b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61

1 Answer:

Answer 0 (score: 1)

I think you can first select the columns in the order you want and then call to_csv:

import pandas as pd

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)

b = b[['eventId','eventType','publishedDate','payload.plannedCustomerChoiceId','_links.source.href']]
#print (b)

print (b.to_csv(index=False, header=False))
9668383e-ec96-4d6a-b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61
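
For intuition about where column names such as `payload.plannedCustomerChoiceId` come from, here is a rough pure-Python sketch of the flattening step. It is not pandas' implementation; `flatten` and `wanted` are hypothetical names introduced for illustration, and the plain `",".join` assumes no value contains a comma or quote.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into one level, joining nested keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the dotted prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


a = {
    "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b",
    "eventType": "PlannedCustomerChoiceWasUpdated",
    "publishedDate": "2016-05-31T18:52:29.219Z",
    "payload": {"plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61"},
    "_links": {
        "self": {"href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb"},
        "source": {"href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61"},
    },
}

flat = flatten(a)

# Pick the fields in the order requested in the question.
wanted = [
    "eventId",
    "eventType",
    "publishedDate",
    "payload.plannedCustomerChoiceId",
    "_links.source.href",
]
line = ",".join(flat[name] for name in wanted)
print(line)
```

The pandas version above is still the better choice when there are many records, since it handles missing keys and CSV quoting.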

If you also need to change the column names:

import pandas as pd

from pandas.io.json import json_normalize
a = { "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b", "eventType": "PlannedCustomerChoiceWasUpdated", "publishedDate": "2016-05-31T18:52:29.219Z", "payload": { "plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61" }, "_links": { "self": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb" }, "source": { "href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61" } } }
b = json_normalize(a)

b.columns = ['self','source','eventId','eventType','plannedCustomerChoiceId','publishedDate']
print (b)
                                                self  \
0  http://gbp-router.gapinc.dev:8080/planning-ser...   

                                              source  \
0  http://gbp-router.gapinc.dev:8080/planning-ser...   

                                eventId                        eventType  \
0  9668383e-ec96-4d6a-b873-2312dd008e7b  PlannedCustomerChoiceWasUpdated   

                plannedCustomerChoiceId             publishedDate  
0  e9301a6e-7ccf-4c89-bd05-19c1b9067a61  2016-05-31T18:52:29.219Z  

b = b[['eventId','eventType','publishedDate','plannedCustomerChoiceId','source']]
print (b)
                                eventId                        eventType  \
0  9668383e-ec96-4d6a-b873-2312dd008e7b  PlannedCustomerChoiceWasUpdated   

              publishedDate               plannedCustomerChoiceId  \
0  2016-05-31T18:52:29.219Z  e9301a6e-7ccf-4c89-bd05-19c1b9067a61   

                                              source  
0  http://gbp-router.gapinc.dev:8080/planning-ser...  

print (b.to_csv(index=False, header=False))
9668383e-ec96-4d6a-b873-2312dd008e7b,PlannedCustomerChoiceWasUpdated,2016-05-31T18:52:29.219Z,e9301a6e-7ccf-4c89-bd05-19c1b9067a61,http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61
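
Assigning `b.columns` by position relies on the order in which `json_normalize` returns the columns, and that order may differ between pandas versions. A sketch that renames by name instead, assuming pandas 1.0 or later (where `json_normalize` is available as `pd.json_normalize`); the variable names `flat`, `renamed`, `ordered` and `csv_line` are illustrative only:

```python
import pandas as pd

a = {
    "eventId": "9668383e-ec96-4d6a-b873-2312dd008e7b",
    "eventType": "PlannedCustomerChoiceWasUpdated",
    "publishedDate": "2016-05-31T18:52:29.219Z",
    "payload": {"plannedCustomerChoiceId": "e9301a6e-7ccf-4c89-bd05-19c1b9067a61"},
    "_links": {
        "self": {"href": "http://gbp-router.gapinc.dev:8080/planning-service/_feeds/planning.planning-service.planned-customer-choice-events/entries/d2de62a6-1e0f-430a-bf3f-2df711f64beb"},
        "source": {"href": "http://gbp-router.gapinc.dev:8080/planning-service/planning/buy-plan/planned-customer-choices/e9301a6e-7ccf-4c89-bd05-19c1b9067a61"},
    },
}

# Flatten the nested record into one row with dotted column names.
flat = pd.json_normalize(a)

# Rename by old name -> new name, so the result does not depend on column order.
renamed = flat.rename(columns={
    "payload.plannedCustomerChoiceId": "plannedCustomerChoiceId",
    "_links.self.href": "self",
    "_links.source.href": "source",
})

# Select the columns in the order wanted for the output line.
ordered = renamed[["eventId", "eventType", "publishedDate", "plannedCustomerChoiceId", "source"]]

# With no path given, to_csv returns the CSV text; strip the trailing newline.
csv_line = ordered.to_csv(index=False, header=False).strip()
print(csv_line)
```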