Pandas: if the values in a column are dictionaries

Time: 2016-10-12 08:38:12

Tags: python pandas dictionary

I have a dataframe:

category  dictionary
moto   {'motocycle':10, 'buy':8, 'motocompetition':7}
shopping   {'buy':200, 'order':20, 'sale':30}
IT   {'iphone':214, 'phone':1053, 'computer':809}
shopping  {'zara':23, 'sale':18, 'sell':20}
IT   {'lenovo':200, 'iphone':300, 'mac':200}

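I need to group by category, merge the dictionaries of each group, and select the 3 keys with the largest values. The result should be a dataframe with the unique categories in the category column and a list of those keys in the data column.

I know I can use Counter to merge dictionaries, but I don't know how to do that per category. Desired output: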

category   data
moto   ['motocycle', 'buy', 'motocompetition']
shopping   ['buy', 'sale', 'zara']
IT   ['phone', 'computer', 'iphone']
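
The Counter idea mentioned above can be sketched on the two shopping rows alone, before any grouping is involved. This is only a minimal illustration; the variable names `shopping_rows` and `merged` are made up for this example.

```python
from collections import Counter

# The two 'shopping' dictionaries from the table above
shopping_rows = [
    {'buy': 200, 'order': 20, 'sale': 30},
    {'zara': 23, 'sale': 18, 'sell': 20},
]

merged = Counter()
for d in shopping_rows:
    # update() with a mapping adds its values to the existing counts
    merged.update(d)

# most_common(3) returns the three (key, value) pairs with the largest values
top3 = [key for key, _ in merged.most_common(3)]
print(top3)  # ['buy', 'sale', 'zara']
```

Here 'sale' appears in both rows, so its values are summed to 48, which puts it ahead of 'zara' (23).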

2 Answers:

Answer 0: (score: 3)

You can use groupby with a custom function, together with nlargest and Index.tolist:

import pandas as pd

df = pd.DataFrame({
'category':['moto','shopping','IT','shopping','IT'],
'dictionary':
[{'motocycle':10, 'buy':8, 'motocompetition':7},
{'buy':200, 'order':20, 'sale':30},
{'iphone':214, 'phone':1053, 'computer':809},
{'zara':23, 'sale':18, 'sell':20},
{'lenovo':200, 'iphone':300, 'mac':200}]})

print(df)
   category                                         dictionary
0      moto  {'motocycle': 10, 'buy': 8, 'motocompetition': 7}
1  shopping              {'sale': 30, 'buy': 200, 'order': 20}
2        IT    {'phone': 1053, 'computer': 809, 'iphone': 214}
3  shopping               {'sell': 20, 'zara': 23, 'sale': 18}
4        IT         {'lenovo': 200, 'mac': 200, 'iphone': 300}


import collections
import functools
import operator

def f(x):
    # Sum the dictionaries of one group by adding them as Counters,
    # see http://stackoverflow.com/a/3491086/2901002
    summed = functools.reduce(operator.add, map(collections.Counter, x))
    # Keep the keys of the three largest summed values
    return pd.Series(summed).nlargest(3).index.tolist()

print(df.groupby('category')['dictionary'].apply(f).reset_index())
   category                         dictionary
0        IT          [phone, computer, iphone]
1      moto  [motocycle, buy, motocompetition]
2  shopping                  [buy, sale, zara]
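
The question asked for the result column to be called data, while the output above keeps the name dictionary. A possible variant, sketched here with a hypothetical helper `top_keys` and an assumed parameter `n`, relies only on Counter for the ranking and names the column through `reset_index(name='data')`:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'category': ['moto', 'shopping', 'IT', 'shopping', 'IT'],
    'dictionary': [
        {'motocycle': 10, 'buy': 8, 'motocompetition': 7},
        {'buy': 200, 'order': 20, 'sale': 30},
        {'iphone': 214, 'phone': 1053, 'computer': 809},
        {'zara': 23, 'sale': 18, 'sell': 20},
        {'lenovo': 200, 'iphone': 300, 'mac': 200},
    ],
})


def top_keys(dicts, n=3):
    """Sum all dictionaries of one group and return the n keys with the largest totals."""
    total = Counter()
    for d in dicts:
        total.update(d)  # adds the values of d to the running totals
    return [key for key, _ in total.most_common(n)]


# apply() calls top_keys once per category; name='data' labels the result column
result = (
    df.groupby('category')['dictionary']
      .apply(top_keys)
      .reset_index(name='data')
)
print(result)
```

The lists should match the ones shown above; only the column name differs.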

Answer 1: (score: 1)

import pandas as pd

df = pd.DataFrame(dict(category=['moto', 'shopping', 'IT', 'shopping', 'IT'],
                       dictionary=[
                           dict(motorcycle=10, buy=8, motocompetition=7),
                           dict(buy=200, order=20, sale=30),
                           dict(iphone=214, phone=1053, computer=809),
                           dict(zara=23, sale=18, sell=20),
                           dict(lenovo=200, iphone=300, mac=200),
                       ]))

def top3(x):
    return x.dropna().sort_values().tail(3)[::-1].index.tolist()

df.dictionary.apply(pd.Series).groupby(df.category).sum().apply(top3, axis=1)

category
IT                   [phone, computer, iphone]
moto        [motorcycle, buy, motocompetition]
shopping                     [buy, sale, zara]
dtype: object
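
This answer comes without explanation, so here is a rough step-by-step sketch of the same idea: spread the dictionaries into one column per key, sum per category, then rank each row. The intermediate names `wide`, `totals` and `top3_by_category` are invented for this illustration, and the wide frame is built with `pd.DataFrame(list_of_dicts)` rather than `apply(pd.Series)`, which should give an equivalent table.

```python
import pandas as pd

df = pd.DataFrame(dict(
    category=['moto', 'shopping', 'IT', 'shopping', 'IT'],
    dictionary=[
        dict(motorcycle=10, buy=8, motocompetition=7),
        dict(buy=200, order=20, sale=30),
        dict(iphone=214, phone=1053, computer=809),
        dict(zara=23, sale=18, sell=20),
        dict(lenovo=200, iphone=300, mac=200),
    ],
))

# Step 1: one column per distinct key; keys missing from a row become NaN
wide = pd.DataFrame(df['dictionary'].tolist(), index=df.index)

# Step 2: sum every key column within each category (NaN values are skipped)
totals = wide.groupby(df['category']).sum()

# Step 3: for each category row, take the labels of the three largest totals
top3_by_category = {
    category: row.nlargest(3).index.tolist()
    for category, row in totals.iterrows()
}
print(top3_by_category)
```

Compared with the Counter approach, this builds a table with one column for every key that occurs anywhere in the data, which is convenient for inspection but may use a lot of memory when there are many distinct keys.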