Question

I am trying to convert the following DataFrame into a nested dictionary:

name    age     country     state   pincode
user1   10      in          tn      1
user2   11      in          tx      2
user3   12      eu          gh      3
user4   13      eu          io      4
user5   14      us          pi      5
user6   15      us          ew      6

The output should group users by country, with each country mapping to a dictionary of users that holds each user's details:

{
  'in': {
          'user1': {'age': 10, 'state': 'tn', 'pincode': 1},
          'user2': {'age': 11, 'state': 'tx', 'pincode': 2}
        },
 'eu':  {
          'user3': {'age': 12, 'state': 'gh', 'pincode': 3},
          'user4': {'age': 13, 'state': 'io', 'pincode': 4},
        },
 'us': { 
          'user5': {'age': 14, 'state': 'pi', 'pincode': 5},
          'user6': {'age': 15, 'state': 'ew', 'pincode': 6},
       }
}

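I am currently doing this with the statements below (this is not entirely correct, because I initialize a list inside the loop where it should be a dictionary):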

op2 = {}
for i, row in sample2.iterrows():
    if row['country'] not in op2:
            op2[row['country']] = []
    op2[row['country']] = {row['name'] : {'age':row['age'],'state':row['state'],'pincode':row['pincode']}}

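I want the solution to keep working if other columns, such as a phone number, are added to the df. Because the statements I wrote are static, they will not include such columns in the output. Does pandas have a built-in method for this?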

Answer 1

You can combine to_dict with groupby:

# 'index' is spelled out in full; the short form 'i' is not accepted by pandas 2.x
{k: v.drop('country', axis=1).to_dict('index')
    for k, v in df.set_index('name').groupby('country')}

Output:

{'eu': {'user3': {'age': 12, 'state': 'gh', 'pincode': 3},
  'user4': {'age': 13, 'state': 'io', 'pincode': 4}},
 'in': {'user1': {'age': 10, 'state': 'tn', 'pincode': 1},
  'user2': {'age': 11, 'state': 'tx', 'pincode': 2}},
 'us': {'user5': {'age': 14, 'state': 'pi', 'pincode': 5},
  'user6': {'age': 15, 'state': 'ew', 'pincode': 6}}}
