I am trying to prepare this data in a specific format:
import pandas as pd
voting = pd.read_json("GE2000.json")
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)
print(voting)
which returns:
                                              vote
county_fips candidate_name  pty vote_pct
2000        Howard Phillips CS  0              596
            John Hagelin    NL  0              919
            Harry Browne    LB  1             2636
            George W. Bush  R   59          167398
            Al Gore         D   28           79004
1001        Howard Phillips I   0                9
            John Hagelin    I   0                5
            Harry Browne    LB  0               51
            George W. Bush  R   70           11993
            Al Gore         D   29            4942
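After this, I want to sort by vote_pct and keep only the row with the largest value in each county, like this (I have already tried sort_values, sort_index, and so on, and cannot get them to produce the desired output):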
                                             vote
county_fips candidate_name pty vote_pct
2000        George W. Bush R   59          167398
1001        George W. Bush R   70           11993
Here is the sample data:
[
{
"office" : "PRESIDENT",
"county_name" : "Alaska",
"vote_pct" : "0",
"county_fips" : "2000",
"pty" : "CS",
"candidate_name" : "Howard Phillips"
},
{
"office" : "PRESIDENT",
"county_name" : "Alaska",
"vote_pct" : "0",
"county_fips" : "2000",
"pty" : "NL",
"candidate_name" : "John Hagelin"
}
]
The data continues in the same way.
Answer 0: (score: 2)
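You can use groupby and apply to pick out the rows with the largest vote_pct before calling set_index. Setting the index afterwards lets you group on an ordinary column rather than on an index level (which is awkward):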
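import pandas as pd
voting = pd.read_json("GE2000.json")
# Keep every row whose vote_pct equals the maximum within its county group
get_largest_vote_pct = lambda group: group[group.vote_pct == group.vote_pct.max()]
largest = voting.groupby('county_fips').apply(get_largest_vote_pct)
# Set the index only after filtering, so groupby works on a plain column
largest.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)
print(largest)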
                                             vote
county_fips candidate_name pty vote_pct
1001        George W. Bush R   70           11993
2000        George W. Bush R   59          167398
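Since the question mentions trying sort_values, here is a minimal sketch of how sorting can produce the same result. It is an alternative to the groupby/apply approach above, not a replacement for it. The DataFrame is built inline from the rows printed in the question rather than read from GE2000.json, and the name top_by_sort is only illustrative.

```python
import pandas as pd

# Rows copied from the output printed in the question (not the full GE2000.json file)
rows = [
    (2000, "Howard Phillips", "CS", 0, 596),
    (2000, "John Hagelin", "NL", 0, 919),
    (2000, "Harry Browne", "LB", 1, 2636),
    (2000, "George W. Bush", "R", 59, 167398),
    (2000, "Al Gore", "D", 28, 79004),
    (1001, "Howard Phillips", "I", 0, 9),
    (1001, "John Hagelin", "I", 0, 5),
    (1001, "Harry Browne", "LB", 0, 51),
    (1001, "George W. Bush", "R", 70, 11993),
    (1001, "Al Gore", "D", 29, 4942),
]
columns = ["county_fips", "candidate_name", "pty", "vote_pct", "vote"]
voting = pd.DataFrame(rows, columns=columns)

# Sort so the highest vote_pct comes first, then keep the first row seen per county
top_by_sort = (
    voting.sort_values("vote_pct", ascending=False)
          .drop_duplicates("county_fips")
          .set_index(["county_fips", "candidate_name", "pty", "vote_pct"])
)
print(top_by_sort)
```

One difference worth noting: drop_duplicates keeps exactly one row per county, whereas the equality filter in the answer above keeps every row tied for the maximum.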
Answer 1: (score: 0)
You can use groupby, for example voting.groupby('county_fips')['candidate_name'].max().
There is also a more detailed answer here: Python : Getting the Row which has the max value in groups using groupby
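One caveat with the max() call above: applied to candidate_name, it compares strings, so it returns the alphabetically last name in each county rather than the candidate with the highest vote_pct. A minimal sketch of one common way to get the full winning row is idxmax followed by loc. The data below is a subset of the rows printed in the question, and the variable names are illustrative only.

```python
import pandas as pd

# Subset of the rows printed in the question
voting = pd.DataFrame({
    "county_fips": [2000, 2000, 2000, 1001, 1001, 1001],
    "candidate_name": ["John Hagelin", "George W. Bush", "Al Gore",
                       "John Hagelin", "George W. Bush", "Al Gore"],
    "pty": ["NL", "R", "D", "I", "R", "D"],
    "vote_pct": [0, 59, 28, 0, 70, 29],
    "vote": [919, 167398, 79004, 5, 11993, 4942],
})

# max() on a string column picks the alphabetically last name per county
alphabetical = voting.groupby("county_fips")["candidate_name"].max()

# idxmax() gives the row label of the largest vote_pct in each county;
# loc then pulls those complete rows out of the original frame
winner_labels = voting.groupby("county_fips")["vote_pct"].idxmax()
winners = voting.loc[winner_labels]
print(winners)
```

Like the sorting approach, idxmax returns a single row per group, so ties for the maximum are not all preserved.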