I scraped an election website for Pennsylvania (USA). Here is a sample of the nested dictionary found in the site's JSON:
some_dict = {'Election': {'Statewide': [{
    'ADAMS': [
        {
            'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
            'CountyName': 'ADAMS',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        },
        {
            'CandidateName': 'TRUMP, DONALD J. ',
            'CountyName': 'ADAMS',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        }
    ],
    'ALLEGHENY': [
        {
            'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
            'CountyName': 'ALLEGHENY',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        },
        {
            'CandidateName': 'TRUMP, DONALD J. ',
            'CountyName': 'ALLEGHENY',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        }
    ]
}]}}
I don't know how to convert it into a DataFrame like the one shown below:
Answer 0: (score: 1)
import pandas as pd
some_dict = {'Election': {'Statewide': [{
    'ADAMS': [
        {
            'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
            'CountyName': 'ADAMS',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        },
        {
            'CandidateName': 'TRUMP, DONALD J. ',
            'CountyName': 'ADAMS',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        }
    ],
    'ALLEGHENY': [
        {
            'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
            'CountyName': 'ALLEGHENY',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        },
        {
            'CandidateName': 'TRUMP, DONALD J. ',
            'CountyName': 'ALLEGHENY',
            'ElectionDayNoVotes': '0',
            'ElectionDayVotes': '1',
            'ElectionDayYesVotes': '0',
            'ElectionYear': '2020'
        }
    ]
}]}}
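# Build one DataFrame per county and append it to the running result
df = pd.DataFrame()
for d in some_dict['Election']['Statewide']:
    for k, v in d.items():  # k: county name, v: list of candidate records
        t = pd.DataFrame(v)
        t['CountyName'] = k
        df = pd.concat([df, t])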
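A possible variation on the loop above, offered as a sketch rather than as part of the original answer: collect the per-county frames in a list and call pd.concat once with ignore_index=True. This avoids copying the growing frame on every iteration and yields a continuous 0..n-1 index. The shortened some_dict and the name frames are illustrative assumptions.

```python
import pandas as pd

# Shortened stand-in for the scraped structure (illustrative data only).
some_dict = {'Election': {'Statewide': [{
    'ADAMS': [
        {'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ADAMS', 'ElectionDayVotes': '1'},
        {'CandidateName': 'TRUMP, DONALD J. ', 'CountyName': 'ADAMS', 'ElectionDayVotes': '1'},
    ],
    'ALLEGHENY': [
        {'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ALLEGHENY', 'ElectionDayVotes': '1'},
        {'CandidateName': 'TRUMP, DONALD J. ', 'CountyName': 'ALLEGHENY', 'ElectionDayVotes': '1'},
    ],
}]}}

# One DataFrame per county, gathered in a plain list first.
frames = [
    pd.DataFrame(rows)
    for d in some_dict['Election']['Statewide']
    for rows in d.values()
]

# A single concat at the end; ignore_index=True renumbers the rows.
df = pd.concat(frames, ignore_index=True)
```

Because each record already carries its own CountyName, this version does not need to assign the dictionary key to that column.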
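Answer 1: (score: 0)
You can do this with either of the following two methods:
- pd.read_json()
- pd.DataFrame()
The .DataFrame() method accepts:
- a single dict: the keys are column names and the values are the column values.
- a list of dicts: each list item is one row of the dataframe, represented as a dict whose keys are the column names and whose values are that row's values.
Here we use the list of dicts approach to create the dataframe. First, we convert the data into a list of dicts with the custom function prepare_records(), and then apply either of the two methods.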
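To make the two accepted input shapes concrete, here is a minimal sketch (not from the original answer) that builds the same two-row table both ways. The names by_column and by_row and the trimmed-down data are assumptions for illustration.

```python
import pandas as pd

# Shape 1: a single dict mapping column name -> list of column values.
by_column = {
    'CandidateName': ['BIDEN, JOSEPH ROBINETTE JR', 'TRUMP, DONALD J.'],
    'CountyName': ['ADAMS', 'ADAMS'],
}

# Shape 2: a list of dicts, one dict per row.
by_row = [
    {'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ADAMS'},
    {'CandidateName': 'TRUMP, DONALD J.', 'CountyName': 'ADAMS'},
]

df_from_columns = pd.DataFrame(by_column)
df_from_rows = pd.DataFrame(by_row)

# Both inputs describe the same table, so the frames compare equal.
same = df_from_columns.equals(df_from_rows)
```

The scraped records are already in the second shape (one dict per candidate per county), which is why flattening them into a single list is enough.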
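import json
from io import StringIO
import pandas as pd

# prepare records (prepare_records() and data are defined at the end of this answer)
records = prepare_records(data)

# Method-1: using read_json(); newer pandas versions deprecate passing a raw JSON string, hence StringIO
df = pd.read_json(StringIO(json.dumps(records)), orient='records')

# Method-2: using DataFrame()
df = pd.DataFrame(data=records)
Output: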
# print(df.to_markdown(index=False))
| CandidateName | CountyName | ElectionDayNoVotes | ElectionDayVotes | ElectionDayYesVotes | ElectionYear |
|:---------------------------|:-------------|---------------------:|-------------------:|----------------------:|---------------:|
| BIDEN, JOSEPH ROBINETTE JR | ADAMS | 0 | 1 | 0 | 2020 |
| TRUMP, DONALD J. | ADAMS | 0 | 1 | 0 | 2020 |
| BIDEN, JOSEPH ROBINETTE JR | ALLEGHENY | 0 | 1 | 0 | 2020 |
| TRUMP, DONALD J. | ALLEGHENY | 0 | 1 | 0 | 2020 |
# custom function
def prepare_records(data):
    """Flatten the per-county lists into a single list of row dicts."""
    records = []
    for county in data['Election']['Statewide'][0].values():
        records.extend(county)  # same as: records += county
    return records
data = {
    'Election': {
        'Statewide': [
            {
                'ADAMS': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
                'ALLEGHENY': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
            },
        ],
    }
}
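One caveat worth illustrating: the vote counts in the scraped data are strings, and pd.DataFrame() keeps them as strings. The sketch below is an assumption-laden extension, not part of the original answer. It uses a hypothetical prepare_all_records() that walks every entry under 'Statewide' (instead of only index 0) and then converts the vote columns with pd.to_numeric so they can be summed. The sample data is made up for the example.

```python
import pandas as pd

def prepare_all_records(data):
    """Hypothetical variant of prepare_records(): flatten every 'Statewide' entry."""
    records = []
    for entry in data['Election']['Statewide']:
        for county_rows in entry.values():
            records.extend(county_rows)
    return records

# Illustrative sample only; the vote values are invented for this sketch.
sample = {'Election': {'Statewide': [{
    'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ADAMS',
               'ElectionDayNoVotes': '0', 'ElectionDayVotes': '1', 'ElectionDayYesVotes': '0'}],
    'ALLEGHENY': [{'CandidateName': 'TRUMP, DONALD J.', 'CountyName': 'ALLEGHENY',
                   'ElectionDayNoVotes': '0', 'ElectionDayVotes': '2', 'ElectionDayYesVotes': '0'}],
}]}}

df = pd.DataFrame(prepare_all_records(sample))

# Convert the string vote counts to numbers, column by column.
vote_columns = ['ElectionDayNoVotes', 'ElectionDayVotes', 'ElectionDayYesVotes']
df[vote_columns] = df[vote_columns].apply(pd.to_numeric)

total_votes = int(df['ElectionDayVotes'].sum())
```

If the real feed ever contains non-numeric placeholders in these columns, pd.to_numeric accepts errors='coerce' to turn them into NaN instead of raising.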