My DataFrame looks like this:
*id*, *name*, *URL*, *Type*
2, birth_france_by_region, http://abc. com, T1
2, birth_france_by_region, http://pt. python, T2
3, long_lat, http://abc. com, T3
3, long_lat, http://pqur. com, T1
4, random_time_series, http://sadsdc. com, T2
4, random_time_series, http://sadcadf. com, T3
5, birth_names, http://google. com, T1
5, birth_names, http://helloworld. com, T2
5, birth_names, http://hu. com, T3
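I want to merge the rows of this DataFrame that share the same id, collecting each group's rows into a list of dictionaries with Type as the key and URL as the value. The final output should be: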
*id*, *name*, *URL*
2, birth_france_by_region, [{T1: http://abc. com}, {T2: http://pt. python}]
3, long_lat, [{T3: http://abc. com}, {T1: http://pqur. com}]
4, random_time_series, [{T2: http://sadsdc. com}, {T3: http://sadcadf. com}]
5, birth_names, [{T1: http://google. com}, {T2: http://helloworld. com},
{T3: http://hu. com}]
Answer 0 (score: 2)
Use groupby with a custom function:
# Group by id and name, then turn each group's rows into {Type: URL} dicts.
df = (df.groupby(['id', 'name'])
        .apply(lambda x: [{k: v} for k, v in zip(x['Type'], x['URL'])])
        .reset_index(name='URL'))
print(df)
id name \
0 2 birth_france_by_region
1 3 long_lat
2 4 random_time_series
3 5 birth_names
URL
0 [{'T1': 'http://abc. com'}, {'T2': 'http://pt....
1 [{'T3': 'http://abc. com'}, {'T1': 'http://pqu...
2 [{'T2': 'http://sadsdc. com'}, {'T3': 'http://...
3 [{'T1': 'http://google. com'}, {'T2': 'http://...
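For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby idea. The DataFrame is rebuilt from the rows in the question; the helper name to_dict_list and the explicit column selection before apply are my own additions (the selection is only there so the grouping columns are not passed into the function, which newer pandas versions warn about).

```python
import pandas as pd

# Sample data copied from the question (URLs kept exactly as posted).
df = pd.DataFrame({
    "id": [2, 2, 3, 3, 4, 4, 5, 5, 5],
    "name": ["birth_france_by_region", "birth_france_by_region",
             "long_lat", "long_lat",
             "random_time_series", "random_time_series",
             "birth_names", "birth_names", "birth_names"],
    "URL": ["http://abc. com", "http://pt. python",
            "http://abc. com", "http://pqur. com",
            "http://sadsdc. com", "http://sadcadf. com",
            "http://google. com", "http://helloworld. com", "http://hu. com"],
    "Type": ["T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2", "T3"],
})


def to_dict_list(group):
    """Hypothetical helper: one single-key {Type: URL} dict per row."""
    return [{t: u} for t, u in zip(group["Type"], group["URL"])]


result = (
    df.groupby(["id", "name"])[["Type", "URL"]]  # only pass the needed columns
      .apply(to_dict_list)                       # one list of dicts per group
      .reset_index(name="URL")                   # back to id, name, URL columns
)

# Print the lists in full instead of the truncated DataFrame repr.
for row in result.itertuples(index=False):
    print(row.id, row.name, row.URL)
```

Row order inside each list follows the original row order of the DataFrame, because groupby preserves the order of rows within a group.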
Answer 1 (score: 0)
Here is a way to get the desired result:
# Build one {Type: URL} dict per row, then collect the dicts for each name.
df["temp"] = [{x: y} for x, y in zip(df["Type"], df["URL"])]
df.groupby("name")["temp"].apply(list)
For a toy example:
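import pandas as pd

df = pd.DataFrame({'b': ["100", "1", "2", "4", "6", "-55"],
                   'a': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'c': ["A", "A", "B", "B", "C", "C"]})
# Pair each value of column a (key) with column b (value), one dict per row.
df["temp"] = [{x: y} for x, y in zip(df["a"], df["b"])]
# Collect the per-row dicts into one list per value of column c.
df.groupby("c")["temp"].apply(list)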
The output is:
c
A [{'a': '100'}, {'b': '1'}]
B [{'c': '2'}, {'d': '4'}]
C [{'e': '6'}, {'f': '-55'}]
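As a rough sketch of how the same two-step approach might look on the question's own columns, the snippet below uses a subset of the rows from the question (ids 2 and 5 only, to keep it short). Grouping by both id and name, and renaming the result column to URL via reset_index, are my assumptions about the desired shape; they are not part of the answer above.

```python
import pandas as pd

# Subset of the question's rows (ids 2 and 5), URLs kept as posted.
df = pd.DataFrame({
    "id": [2, 2, 5, 5, 5],
    "name": ["birth_france_by_region", "birth_france_by_region",
             "birth_names", "birth_names", "birth_names"],
    "URL": ["http://abc. com", "http://pt. python",
            "http://google. com", "http://helloworld. com", "http://hu. com"],
    "Type": ["T1", "T2", "T1", "T2", "T3"],
})

# Step 1: one single-key {Type: URL} dict per row, stored in a helper column.
df["temp"] = [{t: u} for t, u in zip(df["Type"], df["URL"])]

# Step 2: collect the dicts of each (id, name) group into a list.
result = (
    df.groupby(["id", "name"])["temp"]
      .apply(list)
      .reset_index(name="URL")  # name the collected column URL
)

for row in result.itertuples(index=False):
    print(row.id, row.name, row.URL)
```

Compared with passing a function over whole groups, this variant builds the dicts once in a plain list comprehension and leaves groupby with only the collecting step.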