I have a dataframe with languages as column names and one final column holding the account name:
EN        DE       IT        Account
Milan     Mailand  Milano    Italy
Florence  Florenz  Firenze   Italy
London    London   Londra    UK
Belgrade  Belgrad  Belgrado  World
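To make the example reproducible, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes pandas is installed and that the variable is called `df`, as it is in the answers.

```python
import pandas as pd

# Rebuild the sample frame: one column per language plus the Account column.
df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

print(df)
```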
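From this dataframe I need to build every possible list, one for each combination of a column name (language) and a value in the Account column.
For example, the output would be: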
EN_Italy = ['Milan', 'Florence']
DE_Italy = ['Mailand', 'Florenz']
IT_Italy = ['Milano', 'Firenze']
EN_UK = ['London']
DE_UK = ['London']
IT_UK = ['Londra']
EN_World = ['Belgrade']
DE_World = ['Belgrad']
IT_World = ['Belgrado']
Is this possible? Thanks!
Answer 0 (score: 3)
You can use aggregate():
df = df.groupby("Account").aggregate(lambda k: list(k)).reset_index()
  Account                  DE                 EN                 IT
0   Italy  [Mailand, Florenz]  [Milan, Florence]  [Milano, Firenze]
1      UK            [London]           [London]           [Londra]
2   World           [Belgrad]         [Belgrade]         [Belgrado]
To get a list, do a simple selection, for example:
df[df.Account == "Italy"]["DE"]
0 [Mailand, Florenz]
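Note that the selection above returns a one-element Series that wraps the list. If a plain Python list is wanted, one option is to keep Account as the index and address a single cell with `.loc`. The sketch below assumes the sample frame from the question; the names `grouped`, `de_italy` and `named` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

# Without reset_index(), Account stays as the row index,
# so a (row label, column label) pair addresses exactly one list.
grouped = df.groupby("Account").agg(list)

# A single cell comes back as a plain Python list.
de_italy = grouped.loc["Italy", "DE"]

# Collect every language/account combination under a "LANG_Account" key.
named = {
    f"{lang}_{acct}": grouped.loc[acct, lang]
    for acct in grouped.index
    for lang in grouped.columns
}
print(de_italy)
print(named["IT_UK"])
```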
Answer 1 (score: 3)
For a variable number of variables, a dictionary is usually a good choice.
You can use collections.defaultdict:
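from collections import defaultdict

d = defaultdict(list)
# Each row is a namedtuple with the fields (Index, EN, DE, IT, Account)
for row in df.itertuples():
    # Skip the leading Index field and the trailing Account field
    for i in row._fields[1:-1]:
        d[i + '_' + row.Account].append(getattr(row, i))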
Result
defaultdict(list,
{'DE_Italy': ['Mailand', 'Florenz'],
'DE_UK': ['London'],
'DE_World': ['Belgrad'],
'EN_Italy': ['Milan', 'Florence'],
'EN_UK': ['London'],
'EN_World': ['Belgrade'],
'IT_Italy': ['Milano', 'Firenze'],
'IT_UK': ['Londra'],
'IT_World': ['Belgrado']})
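Explanation
d is a defaultdict of lists, so a missing key starts out as an empty list and each value can be appended to it directly.
Answer 2 (score: 3)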
Using unstack
df.set_index('Account').unstack().groupby(level=[0, 1]).apply(list)
Account
EN Italy [Milan, Florence]
UK [London]
World [Belgrade]
DE Italy [Mailand, Florenz]
UK [London]
World [Belgrad]
IT Italy [Milano, Firenze]
UK [Londra]
World [Belgrado]
dtype: object
d = df.set_index('Account').unstack().groupby(level=[0, 1]).apply(list)
d.index = d.index.map('_'.join)
d
EN_Italy [Milan, Florence]
EN_UK [London]
EN_World [Belgrade]
DE_Italy [Mailand, Florenz]
DE_UK [London]
DE_World [Belgrad]
IT_Italy [Milano, Firenze]
IT_UK [Londra]
IT_World [Belgrado]
dtype: object
Or
d.to_dict()
{'DE_Italy': ['Mailand', 'Florenz'],
'DE_UK': ['London'],
'DE_World': ['Belgrad'],
'EN_Italy': ['Milan', 'Florence'],
'EN_UK': ['London'],
'EN_World': ['Belgrade'],
'IT_Italy': ['Milano', 'Firenze'],
'IT_UK': ['Londra'],
'IT_World': ['Belgrado']}
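The same reshaping can also be sketched with `melt` instead of `unstack`. This is a swapped-in variation, not the method used in this answer. It assumes the sample frame from the question, and the names `long`, `keys`, `lang` and `name` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

# Long format: one row per (Account, language, city name).
long = df.melt(id_vars="Account", var_name="lang", value_name="name")

# Build the "LANG_Account" key for every row.
keys = long["lang"] + "_" + long["Account"]

# Group the names by key and collect each group into a list.
result = long.groupby(keys, sort=False)["name"].agg(list).to_dict()
print(result["DE_Italy"])
```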
Answer 3 (score: 1)
Another approach, using a dict comprehension:
accts = df['Account'].unique()  # each account once; repeated values would only redo the same work
langs = [col for col in df.columns if col != 'Account']
result = {'{}_{}'.format(lang, acct): df.loc[df['Account'] == acct, lang].tolist()
          for lang in langs for acct in accts}
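A variation on the same comprehension, sketched under the assumption that `df` is the sample frame from the question: iterating over `df.groupby('Account')` splits the frame once per account, so no boolean mask has to be rebuilt for every key. The loop variable `grp` is only an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

langs = [col for col in df.columns if col != "Account"]

# Iterating a groupby yields (account value, sub-frame) pairs.
result = {
    f"{lang}_{acct}": grp[lang].tolist()
    for acct, grp in df.groupby("Account")
    for lang in langs
}
print(result["EN_Italy"])
```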