How to use json_normalize in Pandas

Time: 2017-07-28 23:26:10

Tags: json python-3.x pandas

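I am trying to use Pandas' json_normalize, but so far my efforts have only produced errors. Can anyone tell me what I am doing wrong? I have a complex nested JSON file, and I would love to use Pandas' powerful tools to analyze it.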

Code (current attempt):

    import json, pandas as pd
    from pandas.io.json import json_normalize

    df = pd.read_json('dir/data.json')
    json_normalize(df,'aaa', 'bbb')

The errors range from

TypeError: string indices must be integers

to multiple KeyError: 0 problems.

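I have tried calling this function with various keyword arguments, I have tried breaking the data into rows and rebuilding it row by row before normalizing, and I have read the documentation for this function and tried to reconcile it with the errors I am getting. Everything failed. I suspect this is caused by the fairly complex structure of data.json. I could use other methods, but they are very time-consuming.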

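Apologies for the formatting; this is my first question. For those responding with constructive feedback, here are a few lines taken from the middle of my data file: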

{"_id" : { "$oid" : "52b213b38594d8a2be17c789" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-29T00:00:00Z", "borrower" : "THE KINGDOM OF MOROCCO", "closingdate" : "2014-12-31T00:00:00Z", "country_namecode" : "Kingdom of Morocco!$!MA", "countrycode" : "MA", "countryname" : "Kingdom of Morocco", "countryshortname" : "Morocco", "docty" : "Program Document,Project Information Document,Project Information Document", "grantamt" : 0, "ibrdcommamt" : 200000000, "id" : "P130903", "idacommamt" : 0, "impagency" : "MINISTRY OF FINANCE", "lendinginstr" : "Development Policy Lending", "lendinginstrtype" : "AD", "lendprojectcost" : 200000000, "majorsector_percent" : [ { "Name" : "Public Administration, Law, and Justice", "Percent" : 34 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 }, { "Name" : "Public Administration, Law, and Justice", "Percent" : 33 } ], "mjsector_namecode" : [ { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" }, { "name" : "Public Administration, Law, and Justice", "code" : "BX" } ], "mjtheme" : [ "Public sector governance", "Public sector governance", "Public sector governance" ], "mjtheme_namecode" : [ { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" }, { "name" : "Public sector governance", "code" : "2" } ], "mjthemecode" : "2,2,2", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of this First Transparency and Accountability Development Policy Loan (DPL) Program for Morocco is to support the concretization of key new constitutional governance principles and rights, aimed at increasing transparency and accountability and enhancing citizen engagement and access to information. 
The series supports structural reforms strengthening economic governance across the public sector and new policies fostering more inclusive and open governance. The DPL has been prepared jointly with the European Union (EU) and the African Development Bank (AfDB), leveraging a further US$ 250 million in support of common key policy actions such as the budget, procurement and open governance reforms. The programmatic approach is warranted by the scope and depth of the government's governance reform program, the implementation of which will require time, assistance, and flexibility. This operation is complemented by the transition fund project supporting the implementation of Morocco's new governance framework. This US$ 4 million grant provides technical assistance for the implementation of structural reforms fostering public engagement; performance based budgeting and fiscal decentralization. The series adopts a holistic and integrated approach to enhance its impact. It is supporting governance reforms across the public sector covering the central government; State owned Enterprises, or SoEs and agencies, local governments as well as inter-governmental relations. The Bank has provided policy advice and technical assistance for the design of most policy measures and laws supported by this DPL, with the support from the MNA multi-donor trust fund. The transition fund governance project will support the implementation of these structural reforms. While building on the long-standing engagement with public administration reform, under the Public Administration Reform Loan (PARL) series, this program supports the concretization of the performance budgeting reform through the adoption and implementation of the new organic budget law and procurement decree. This DPL series also delves into new reform areas derived from the constitution such as access to information, public petitions, as well as into the governance of SoEs and local finances." 
}, "project_name" : "MA Accountability and Transparency DPL", "projectdocs" : [ { "DocTypeDesc" : "Program Document (PGD),  Vol.1 of 1", "DocType" : "PGD", "EntityID" : "000333037_20131009170139", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131009170139", "DocDate" : "30-SEP-2013" }, { "DocTypeDesc" : "Project Information Document (PID),  Vol.1 of 1", "DocType" : "PID", "EntityID" : "000231615_20121031105539", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000231615_20121031105539", "DocDate" : "04-SEP-2012" }, { "DocTypeDesc" : "Project Information Document (PID),  Vol.1 of 1", "DocType" : "PID", "EntityID" : "000386194_20121016015521", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000386194_20121016015521", "DocDate" : "04-SEP-2012" } ], "projectfinancialtype" : "IBRD", "projectstatusdisplay" : "Active", "regionname" : "Middle East and North Africa", "sector" : [ { "Name" : "General public administration sector" }, { "Name" : "Central government administration" }, { "Name" : "Public administration- Information and communications" } ], "sector1" : { "Name" : "General public administration sector", "Percent" : 34 }, "sector2" : { "Name" : "Central government administration", "Percent" : 33 }, "sector3" : { "Name" : "Public administration- Information and communications", "Percent" : 33 }, "sector_namecode" : [ { "name" : "General public administration sector", "code" : "BZ" }, { "name" : "Central government administration", "code" : "BC" }, { "name" : "Public administration- Information and communications", "code" : "BM" } ], "sectorcode" : "BM,BC,BZ", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Other accountability/anti-corruption", "Percent" : 33 }, "theme_namecode" : [ { "name" : "Other accountability/anti-corruption", "code" : "29" }, { "name" : "Other public sector governance", "code" : "30" }, { "name" : 
"Public expenditure, financial management and procurement", "code" : "27" } ], "themecode" : "27,30,29", "totalamt" : 200000000, "totalcommamt" : 200000000, "url" : "http://www.worldbank.org/projects/P130903?lang=en" }
{ "_id" : { "$oid" : "52b213b38594d8a2be17c78a" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "borrower" : "GOVERNMENT OF SOUTH SUDAN", "country_namecode" : "Republic of South Sudan!$!SS", "countrycode" : "SS", "countryname" : "Republic of South Sudan", "countryshortname" : "South Sudan", "docty" : "Project Paper,Project Information Document", "envassesmentcategorycode" : "B", "grantamt" : 7530000, "ibrdcommamt" : 0, "id" : "P145339", "idacommamt" : 0, "impagency" : "MINISTRY OF AGRICULTURE, COOPERATIVES AND RURAL DEVELOPMENT", "lendinginstr" : "Specific Investment Loan", "lendinginstrtype" : "IN", "lendprojectcost" : 7530000, "majorsector_percent" : [ { "Name" : "Agriculture, fishing, and forestry", "Percent" : 50 }, { "Name" : "Health and other social services", "Percent" : 30 }, { "Name" : "Agriculture, fishing, and forestry", "Percent" : 20 } ], "mjsector_namecode" : [ { "name" : "Agriculture, fishing, and forestry", "code" : "AX" }, { "name" : "Health and other social services", "code" : "JX" }, { "name" : "Agriculture, fishing, and forestry", "code" : "AX" } ], "mjtheme" : [ "Rural development" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "", "code" : "2" } ], "mjthemecode" : "10,2", "prodline" : "RE", "prodlinetext" : "Recipient Executed Activities", "productlinetype" : "L", "project_abstract" : { "cdata" : "The development objective of the Additional Financing (AF) for the Emergency Food Crisis Response Project for South Sudan is to support adoption of improved technologies for food production by eligible beneficiaries, increase storage capacity for staples, and provide cash or food to eligible people participating in public works programs in selected counties in South Sudan. 
This is the third AF to the project and will be primarily used to scale-up and augment benefits to already participating beneficiaries and to expand project activities to four additional counties where recent monitoring points to significantly deteriorating food security. The AF will cover the costs associated with: (i) provision of agricultural inputs, production technology, and advisory services; (ii) rehabilitating a seed processing facility to increase farmer's access to improved seed; (iii) bringing land that is currently out of production back into production; (iv) training farmers on reduction of postharvest losses; (v) building of food storage capacity to support postharvest handling at the household and community levels; and (vi) provision of cash or food for work to eligible individuals. The implementation schedule will be slightly revised and the closing date of both the original project and AF will be extended to April 30, 2015." }, "project_name" : "Southern Sudan Emergency Food Crisis Response Project- AF III", "projectdocs" : [ { "DocTypeDesc" : "Project Paper (PJPR),  Vol.1 of 1", "DocType" : "PJPR", "EntityID" : "000442464_20131009102446", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131009102446", "DocDate" : "01-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID),  Vol.1 of 1", "DocType" : "PID", "EntityID" : "000001843_20130618091419", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000001843_20130618091419", "DocDate" : "07-JUN-2013" } ], "projectfinancialtype" : "OTHER", "projectstatusdisplay" : "Active", "regionname" : "Africa", "sector" : [ { "Name" : "Crops" }, { "Name" : "Other social services" }, { "Name" : "General agriculture, fishing and forestry sector" } ], "sector1" : { "Name" : "Crops", "Percent" : 50 }, "sector2" : { "Name" : "Other social services", "Percent" : 30 }, "sector3" : { "Name" : "General agriculture, fishing and forestry sector", 
"Percent" : 20 }, "sector_namecode" : [ { "name" : "Crops", "code" : "AH" }, { "name" : "Other social services", "code" : "JB" }, { "name" : "General agriculture, fishing and forestry sector", "code" : "AZ" } ], "sectorcode" : "AZ,JB,AH", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "Y", "theme1" : { "Name" : "Global food crisis response", "Percent" : 100 }, "theme_namecode" : [ { "name" : "Global food crisis response", "code" : "91" } ], "themecode" : "91", "totalamt" : 0, "totalcommamt" : 7530000, "url" : "http://www.worldbank.org/projects/P145339?lang=en" }
 { "_id" : { "$oid" : "52b213b38594d8a2be17c78b" }, "approvalfy" : "2014", "board_approval_month" : "October", "boardapprovaldate" : "2013-10-25T00:00:00Z", "closingdate" : "2017-12-31T00:00:00Z", "country_namecode" : "Republic of India!$!IN", "countrycode" : "IN", "countryname" : "Republic of India", "countryshortname" : "India", "docty" : "Project Appraisal Document,Environmental Assessment,Project Information Document,Integrated Safeguards Data Sheet,Working Paper", "envassesmentcategorycode" : "B", "grantamt" : 0, "ibrdcommamt" : 0, "id" : "P146653", "idacommamt" : 250000000, "lendinginstr" : "Investment Project Financing", "lendinginstrtype" : "IN", "lendprojectcost" : 250000000, "majorsector_percent" : [ { "Name" : "Transportation", "Percent" : 60 }, { "Name" : "Water, sanitation and flood protection", "Percent" : 25 }, { "Name" : "Industry and trade", "Percent" : 10 }, { "Name" : "Health and other social services", "Percent" : 5 } ], "mjsector_namecode" : [ { "name" : "Transportation", "code" : "TX" }, { "name" : "Water, sanitation and flood protection", "code" : "WX" }, { "name" : "Industry and trade", "code" : "YX" }, { "name" : "Health and other social services", "code" : "JX" } ], "mjtheme" : [ "Rural development", "Social protection and risk management", "Social protection and risk management", "Environment and natural resources management" ], "mjtheme_namecode" : [ { "name" : "Rural development", "code" : "10" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Social protection and risk management", "code" : "6" }, { "name" : "Environment and natural resources management", "code" : "11" } ], "mjthemecode" : "10,6,6,11", "prodline" : "PE", "prodlinetext" : "IBRD/IDA", "productlinetype" : "L", "project_abstract" : { "cdata" : "The objective of the Uttarakhand Disaster Recovery Project for India is to restore housing, rural connectivity and build resilience of communities in Uttarakhand and increase the technical capacity 
of the state entities to respond promptly and effectively to an eligible crisis or emergency. There are six components to the project, the first component being resilient infrastructure reconstruction. The objective of this component is to focus on the immediate needs of reconstruction of damaged houses and public buildings. The aim is to reduce the vulnerability of the affected population and restore access to the basic services of governance. The second component is the rural road connectivity. The objective of this component is to restore the connectivity lost due to the disaster through the reconstruction of damaged roads and bridges including: village roads, Other District Roads (ODRs), bridle roads and bridle bridges. The third component is the technical assistance and capacity building for disaster risk management. The objective of this component is to enhance the capabilities of government entities and others in risk mitigation and response. The fourth component is the financing disaster response expenses. This component will support the financing of eligible expenses already incurred by the state during the immediate post-disaster response period. The fifth component is the implementation support. This component will support the incremental operating costs of the project, including the operation of the Project Management Unit (PMU) and the respective Project Implementation Units (PIUs). Finally, the sixth component is the contingency emergency response." 
}, "project_name" : "Uttarakhand Disaster Recovery Project", "projectdocs" : [ { "DocTypeDesc" : "Project Appraisal Document (PAD),  Vol.1 of 1", "DocType" : "PAD", "EntityID" : "000333037_20131021112627", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131021112627", "DocDate" : "11-OCT-2013" }, { "DocTypeDesc" : "Environmental Assessment (EA),  Vol.1 of 1", "DocType" : "EA", "EntityID" : "000442464_20131015112514", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20131015112514", "DocDate" : "10-OCT-2013" }, { "DocTypeDesc" : "Project Information Document (PID),  Vol.1 of 1", "DocType" : "PID", "EntityID" : "000356161_20130926131319", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000356161_20130926131319", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Integrated Safeguards Data Sheet (ISDS),  Vol.1 of 1", "DocType" : "ISDS", "EntityID" : "000333037_20130926120720", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20130926120720", "DocDate" : "24-SEP-2013" }, { "DocTypeDesc" : "Working Paper (WP),  Vol.1 of 1", "DocType" : "WP", "EntityID" : "000333037_20131115110208", "DocURL" : "http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000333037_20131115110208", "DocDate" : "01-JUN-2013" } ], "projectfinancialtype" : "IDA", "projectstatusdisplay" : "Active", "regionname" : "South Asia", "sector" : [ { "Name" : "Rural and Inter-Urban Roads and Highways" }, { "Name" : "Flood protection" }, { "Name" : "Housing construction" }, { "Name" : "Other social services" } ], "sector1" : { "Name" : "Rural and Inter-Urban Roads and Highways", "Percent" : 60 }, "sector2" : { "Name" : "Flood protection", "Percent" : 25 }, "sector3" : { "Name" : "Housing construction", "Percent" : 10 }, "sector4" : { "Name" : "Other social services", "Percent" : 5 }, "sector_namecode" : [ { "name" : "Rural and Inter-Urban Roads and 
Highways", "code" : "TI" }, { "name" : "Flood protection", "code" : "WD" }, { "name" : "Housing construction", "code" : "YC" }, { "name" : "Other social services", "code" : "JB" } ], "sectorcode" : "JB,YC,WD,TI", "source" : "IBRD", "status" : "Active", "supplementprojectflg" : "N", "theme1" : { "Name" : "Rural services and infrastructure", "Percent" : 60 }, "theme_namecode" : [ { "name" : "Rural services and infrastructure", "code" : "78" }, { "name" : "Natural disaster management", "code" : "52" }, { "name" : "Social risk mitigation", "code" : "87" }, { "name" : "Climate change", "code" : "81" } ], "themecode" : "81,87,52,78", "totalamt" : 250000000, "totalcommamt" : 250000000, "url" : "http://www.worldbank.org/projects/P146653?lang=en" }

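Note that not every field in every row contains valid information; I should identify and fix that myself. I am not asking for an answer to that part, I only want to know how to use json_normalize to get this information into a Pandas DataFrame.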

1 Answer:

Answer 0: (score: 0)

This worked for me:

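  1. Read the data line by line as strings (copy and paste the text into a file).

  2. Use the json module to convert each string into a Python dict.

  3. Use pandas json_normalize to convert each dict into a one-row DataFrame and, if needed, concatenate all the DataFrames.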

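    import json

    import pandas as pd

    # 'data.json' holds one JSON object per line
    with open('data.json', 'r') as f:
        data = f.readlines()

    # pandas >= 1.0 exposes this as pd.json_normalize; older versions need
    # `from pandas.io.json import json_normalize` instead.
    # Each line becomes a one-row DataFrame; blank lines are skipped.
    df = pd.concat(
        [pd.json_normalize(json.loads(j)) for j in data if j.strip()],
        ignore_index=True,
    )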
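The approach above flattens nested dicts into dotted column names but leaves nested lists (such as mjtheme_namecode) as list-valued cells. json_normalize can also expand one such list per call through its record_path and meta parameters. The following is a minimal sketch, assuming pandas >= 1.0 (where the function is available as pd.json_normalize) and using two heavily abbreviated records from the question's data; the variable names (lines, records, flat, themes) are illustrative only.

```python
import json

import pandas as pd

# Two abbreviated records from the question's data, one JSON object per line.
lines = [
    '{"_id": {"$oid": "52b213b38594d8a2be17c789"}, "id": "P130903", '
    '"countryname": "Kingdom of Morocco", '
    '"mjtheme_namecode": [{"name": "Public sector governance", "code": "2"}]}',
    '{"_id": {"$oid": "52b213b38594d8a2be17c78a"}, "id": "P145339", '
    '"countryname": "Republic of South Sudan", '
    '"mjtheme_namecode": [{"name": "Rural development", "code": "10"}, '
    '{"name": "", "code": "2"}]}',
]

# Parse each non-blank line into a Python dict.
records = [json.loads(line) for line in lines if line.strip()]

# Flat view: one row per project. Nested dicts become dotted columns
# (for example "_id.$oid"); nested lists stay as list-valued cells.
flat = pd.json_normalize(records)

# Nested-list view: one row per entry of "mjtheme_namecode", with the
# parent project's "id" and "countryname" repeated on every row.
themes = pd.json_normalize(
    records,
    record_path="mjtheme_namecode",
    meta=["id", "countryname"],
)

print(flat.columns.tolist())
print(themes)
```

Passing the whole list of dicts in one call avoids building many one-row DataFrames; each additional nested list (projectdocs, sector_namecode, and so on) would need its own call with a different record_path.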