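I'm trying to find the 10 most common items in a pandas column, essentially something like value_counts(). The problem is that the column contains dictionary entries, as shown below: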
import pandas as pd

# Load the World Bank projects file; 'mjtheme_namecode' holds a list of dicts per row
df = pd.read_json('data/world_bank_projects.json')
print(df['mjtheme_namecode'].head())
0 [{'code': '8', 'name': 'Human development'}, {...
1 [{'code': '1', 'name': 'Economic management'},...
2 [{'code': '5', 'name': 'Trade and integration'...
3 [{'code': '7', 'name': 'Social dev/gender/incl...
4 [{'code': '5', 'name': 'Trade and integration'...
Name: mjtheme_namecode, dtype: object
How can I get these items sorted by count (by either the code or the name)?
Answer (score: 2)
Suppose you have the following DataFrame:
import random
df = pd.DataFrame({'col1': [[{'code': random.randint(0, 10), 'name': ''.join(random.sample('abcdef', 3))} for _ in range(2)] for _ in range(3)]})
col1
0 [{'code': 1, 'name': 'bfc'}, {'code': 7, 'name...
1 [{'code': 7, 'name': 'cda'}, {'code': 0, 'name...
2 [{'code': 2, 'name': 'fea'}, {'code': 7, 'name...
Flatten the lists into a new DataFrame, with one row per dictionary:
tmp = pd.DataFrame([val for pair in df.col1 for val in pair])
code name
0 1 bfc
1 7 dfa
2 7 cda
3 0 cfb
4 2 fea
5 7 cdb
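The same flattening can also be done with `Series.explode()` followed by `pd.json_normalize()`. This is a sketch assuming pandas 1.0 or later; it uses a hard-coded copy of the example rows instead of the random ones so the result is reproducible:

```python
import pandas as pd

# Hard-coded stand-in for the random example data above
df = pd.DataFrame({"col1": [
    [{"code": 1, "name": "bfc"}, {"code": 7, "name": "dfa"}],
    [{"code": 7, "name": "cda"}, {"code": 0, "name": "cfb"}],
    [{"code": 2, "name": "fea"}, {"code": 7, "name": "cdb"}],
]})

# explode() puts each dict from the lists on its own row; an empty list would
# become NaN, so drop those before normalizing
exploded = df["col1"].explode().dropna()

# json_normalize() turns each dict into a row with 'code' and 'name' columns
tmp = pd.json_normalize(exploded.tolist())
print(tmp)
```

The list comprehension in the answer does the same job; `explode()` is mainly handy if you want to keep the original row index around before normalizing.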
Now you can easily query this new DataFrame:
tmp.code.value_counts()
7 3
2 1
1 1
0 1
Name: code, dtype: int64
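Since the question asks for counts by code or by name, one option is to count each (code, name) pair at once, which keeps the readable name next to each code. A minimal sketch on a hypothetical, hand-made `tmp` whose codes and names are taken from the question's output:

```python
import pandas as pd

# Hypothetical flattened data (one row per dict); codes are strings as in the JSON file
tmp = pd.DataFrame({
    "code": ["8", "1", "5", "5", "8", "8"],
    "name": ["Human development", "Economic management",
             "Trade and integration", "Trade and integration",
             "Human development", "Human development"],
})

# size() counts the rows in each (code, name) group; sort so the most common come first
pair_counts = tmp.groupby(["code", "name"]).size().sort_values(ascending=False)
print(pair_counts.head(10))
```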
I was able to find your input data, so here is how to apply this to that dataset:
outdf = pd.DataFrame([val for pair in df['mjtheme_namecode'] for val in pair])
outdf.name.value_counts().nlargest(5)
# Result
Environment and natural resources management 223
Rural development 202
Human development 197
Public sector governance 184
Social protection and risk management 158
Name: name, dtype: int64
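The question asked for the 10 most common items. `value_counts()` already sorts its result in descending order, so `.head(10)` is enough to get the top 10. If you would rather skip the intermediate DataFrame, `collections.Counter` can count the names straight from the nested lists. A sketch, with a small hypothetical Series standing in for `df['mjtheme_namecode']`:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for df['mjtheme_namecode']: each cell is a list of dicts
themes = pd.Series([
    [{"code": "8", "name": "Human development"},
     {"code": "1", "name": "Economic management"}],
    [{"code": "1", "name": "Economic management"},
     {"code": "5", "name": "Trade and integration"}],
    [{"code": "5", "name": "Trade and integration"},
     {"code": "1", "name": "Economic management"}],
])

# Count every theme name across all rows, then keep the 10 most frequent
name_counts = Counter(d["name"] for row in themes for d in row)
top10 = name_counts.most_common(10)

# Same counts via pandas: flatten, then value_counts() (sorted) + head(10)
top10_pd = pd.DataFrame([d for row in themes for d in row])["name"].value_counts().head(10)

print(top10)
print(top10_pd)
```

Swap `d["name"]` for `d["code"]` to count by code instead.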