Pandas DataFrame: merging rows on a column to form a list of dictionaries

Time: 2017-10-03 07:34:35

Tags: python python-2.7 pandas dataframe

My DataFrame looks like this:

DATA

*id*,             *name*,                      *URL*,                 *Type*  
    2,             birth_france_by_region,    http://abc. com,       T1 
    2,             birth_france_by_region,    http://pt. python,     T2 
    3,             long_lat,                  http://abc. com,       T3 
    3,             long_lat,                  http://pqur. com,      T1 
    4,             random_time_series,        http://sadsdc. com,    T2 
    4,             random_time_series,        http://sadcadf. com,   T3
    5,             birth_names,               http://google. com,    T1 
    5,             birth_names,               http://helloworld. com,T2 
    5,             birth_names,               http://hu. com,        T3

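I want to merge the rows of this DataFrame that have the same id, producing a list of dictionaries with Type as the key and URL as the value. So the final output would be: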

*id*, *name*,             *URL*  
2,birth_france_by_region,  [{T1:http://abc .com},{T2:http://pt.python}] 
3,long_lat,           [{T3:http://abc .com},{T1:http://pqur. com}] 
4,random_time_series, [{T2:http://sadsdc. com},{T3:http://sadcadf .com}] 
5,birth_names,        [{T1:http://google .com},{T2:http://helloworld. com},
                                       {T3:http://hu. com}] 

2 answers:

Answer 0: (score: 2)

Use groupby with a custom function:

df = (df.groupby([df['id'],df['name']])
       .apply(lambda x: [{k:v} for k, v in zip(x['Type'], x['URL'])])
       .reset_index(name='URL'))
print (df)
   id                    name  \
0   2  birth_france_by_region   
1   3                long_lat   
2   4      random_time_series   
3   5             birth_names   

                                                 URL  
0  [{'T1': 'http://abc. com'}, {'T2': 'http://pt....  
1  [{'T3': 'http://abc. com'}, {'T1': 'http://pqu...  
2  [{'T2': 'http://sadsdc. com'}, {'T3': 'http://...  
3  [{'T1': 'http://google. com'}, {'T2': 'http://...  
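
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. It assumes the columns are named plainly `id`, `name`, `URL` and `Type` (without the asterisks shown in the question) and rebuilds only the first four rows of the sample data. The helper name `to_dict_list` is made up for illustration.

```python
import pandas as pd

# First four rows of the sample data from the question
# (assumption: plain column names, no asterisks).
df = pd.DataFrame({
    "id": [2, 2, 3, 3],
    "name": ["birth_france_by_region", "birth_france_by_region",
             "long_lat", "long_lat"],
    "URL": ["http://abc. com", "http://pt. python",
            "http://abc. com", "http://pqur. com"],
    "Type": ["T1", "T2", "T3", "T1"],
})


def to_dict_list(group):
    """Hypothetical helper: one single-key {Type: URL} dict per row of the group."""
    return [{k: v} for k, v in zip(group["Type"], group["URL"])]


# Select only the columns the helper needs, then apply it to each (id, name) group.
result = (df.groupby(["id", "name"])[["Type", "URL"]]
            .apply(to_dict_list)
            .reset_index(name="URL"))

print(result["URL"].tolist())
```

Selecting `Type` and `URL` before `apply` keeps the grouping columns out of the frame handed to the helper. The list comprehension itself is the same as in the answer.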

Answer 1: (score: 0)

Here is one way to get the desired result:

# Column names written without the asterisks, matching the first answer.
df["temp"] = [{x: y} for x, y in zip(df["Type"], df["URL"])]
df.groupby("name")["temp"].apply(list)

For a toy example:

import pandas as pd

df = pd.DataFrame({'b': ["100","1","2","4","6","-55"],
                   'a': ['a','b','c','d','e','f'],
                   'c': ["A","A","B","B","C","C"]})

# One single-key dict per row, then collect the dicts of each group into a list.
df["temp"] = [{x: y} for x, y in zip(df["a"], df["b"])]
df.groupby("c")["temp"].apply(list)

The output is:

c
A    [{'a': '100'}, {'b': '1'}]
B      [{'c': '2'}, {'d': '4'}]
C    [{'e': '6'}, {'f': '-55'}]
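
The result above is a Series indexed by `c`. If an ordinary DataFrame is wanted instead, as in the question's desired output, one possible sketch is to call `reset_index` on the grouped result. The output column name `pairs` is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'b': ["100", "1", "2", "4", "6", "-55"],
                   'a': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'c': ["A", "A", "B", "B", "C", "C"]})

# Build one single-key dict per row, as in the toy example above.
df["temp"] = [{x: y} for x, y in zip(df["a"], df["b"])]

# Collect the dicts of each group into a list, then turn the group
# labels back into a regular column ("pairs" is a made-up name).
out = df.groupby("c")["temp"].apply(list).reset_index(name="pairs")

print(out)
```

The same pattern should carry over to the question's data by grouping on both `id` and `name`, so that neither column is lost in the result.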