I have created the following pandas DataFrame from a list of IDs.
In [8]: df = pd.DataFrame({'groups' : [1,2,3,4],
                           'id' : ["[1,3]","[2]","[5]","[4,6,7]"]})
In [9]: df
Out[9]:
   groups       id
0       1    [1,3]
1       2      [2]
2       3      [5]
3       4  [4,6,7]
I also have another DataFrame, shown below.
In [12]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                             'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
I need to get the path values for each group. For example:
groups  path
1       p1,p2,p3,p4
        p1,p5,p5,p7
2       p1,p2,p1
3       p1,p2
4       p1,p2,p3,p3
        p1
        p2,p3,p4
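Answer 0 (score: 0)
I'm not sure this is the best approach, but it works for me. Note that it only works if the id column in df is created without the quotation marks, i.e. as lists rather than strings.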
import itertools
import pandas as pd

df = pd.DataFrame({'groups' : [1,2,3,4],
                   'id' : [[1,3],[2],[5],[4,6,7]]})
df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                    'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
# one empty list per group, filled in below (relies on the default 0..n-1 index)
paths = [[] for group in df.groups.unique()]
for x in df.index:
    # look up the path of every id in this group's list and collect them
    paths[x].extend(itertools.chain(*[list(df2[df2.id == int(y)]['path']) for y in df.id[x]]))
df['paths'] = pd.Series(paths)
df
There may be a cleaner way to do this, but it is a somewhat odd data structure to begin with. This gives the following output:
   groups         id                        paths
0       1     [1, 3]   [p1,p2,p3,p4, p1,p5,p5,p7]
1       2        [2]                   [p1,p2,p1]
2       3        [5]                      [p1,p2]
3       4  [4, 6, 7]  [p1,p2,p3,p3, p1, p2,p3,p4]
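On the point above that the id column must hold lists rather than strings: if the data already arrives as strings like "[1,3]", one possible way to convert it is the standard library's ast.literal_eval. This is only a sketch using the question's data, not something the original answer does.

```python
import ast
import pandas as pd

# Same data as in the question: each id entry is a string, not a list.
df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': ["[1,3]", "[2]", "[5]", "[4,6,7]"]})

# literal_eval parses each string as a Python literal, giving a list of ints.
df['id'] = df['id'].apply(ast.literal_eval)
```

After this step the loop-based approach above can be applied to df unchanged.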
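Answer 1 (score: 0)
You should not build a DataFrame that embeds list objects. Instead, repeat each group according to the length of its id list, then use pandas.merge, like this: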
In [142]: import numpy as np; import pandas as pd; from functools import reduce
In [143]: groups = list(range(1, 5))
In [144]: ids = [[1, 3], [2], [5], [4, 6, 7]]
In [145]: df = pd.DataFrame({'groups': np.repeat(groups, list(map(len, ids))), 'id': reduce(lambda x, y: x + y, ids)})
In [146]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                              'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
In [147]: df
Out[147]:
   groups  id
0       1   1
1       1   3
2       2   2
3       3   5
4       4   4
5       4   6
6       4   7
[7 rows x 2 columns]
In [148]: df2
Out[148]:
   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4
[7 rows x 2 columns]
In [149]: pd.merge(df, df2, on='id', how='outer')
Out[149]:
   groups  id         path
0       1   1  p1,p2,p3,p4
1       1   3  p1,p5,p5,p7
2       2   2     p1,p2,p1
3       3   5        p1,p2
4       4   4  p1,p2,p3,p3
5       4   6           p1
6       4   7     p2,p3,p4
[7 rows x 3 columns]
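The merged frame still has one row per id. If the paths are wanted collected per group, as in the question's expected output, a groupby can be added as a final step. The sketch below is an assumption-laden variant, not the answer's own code: it starts from the list-valued df of the question, uses DataFrame.explode (which requires pandas 0.25 or newer) in place of np.repeat and reduce, and the names long_df, merged and paths_per_group are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': [[1, 3], [2], [5], [4, 6, 7]]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
                             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"]})

# One row per (group, id) pair; explode leaves the column as object dtype,
# so cast it back to integers before merging on it.
long_df = df.explode('id')
long_df['id'] = long_df['id'].astype('int64')

# A left merge keeps the row order of long_df.
merged = long_df.merge(df2, on='id', how='left')

# Collect the paths of each group into a list, indexed by group number.
paths_per_group = merged.groupby('groups')['path'].apply(list)
```

Here paths_per_group is a Series indexed by group, so paths_per_group.loc[1] holds the two paths belonging to group 1.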