Slice a pandas DataFrame based on another DataFrame

Date: 2014-04-07 08:43:15

Tags: python pandas dataframe

I have created the following pandas DataFrame from lists of IDs:

In [8]: df = pd.DataFrame({'groups' : [1,2,3,4],
                'id'  : ["[1,3]","[2]","[5]","[4,6,7]"]})

In [9]: df
Out[9]:
   groups     id
0       1    [1,3]
1       2      [2]
2       3      [5]
3       4  [4,6,7]

I also have a second DataFrame, as follows:

In [12]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

I need to get the path values for each group, e.g.:

groups path
1      p1,p2,p3,p4
       p1,p5,p5,p7
2      p1,p2,p1
3      p1,p2
4      p1,p2,p3,p3
       p1
       p2,p3,p4

2 answers:

Answer 0: (score: 0)

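I'm not sure this is the best approach, but it works for me. Note that it only works if the id column in df is created without the "" quotation marks, i.e. as lists rather than strings.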

import itertools

import pandas as pd

df = pd.DataFrame({'groups' : [1,2,3,4],
                  'id'  : [[1,3],[2],[5],[4,6,7]]})
df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                    'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

# One empty list of paths per group (relies on df having the default 0..n-1 index).
paths = [[] for group in df.groups.unique()]
for x in df.index:
    # Look up the path of every id in this row of df and flatten the results.
    paths[x].extend(itertools.chain(*[list(df2[df2.id == int(y)]['path']) for y in df.id[x]]))
df['paths'] = pd.Series(paths)
df

There may be a cleaner way to do this, but the data structure is somewhat odd to begin with. The code gives the following output:

    groups    id           paths
0    1      [1, 3]        [p1,p2,p3,p4, p1,p5,p5,p7]
1    2      [2]           [p1,p2,p1]
2    3      [5]           [p1,p2]
3    4      [4, 6, 7]     [p1,p2,p3,p3, p1, p2,p3,p4]
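
The question's df stores each id list as a string such as "[1,3]", which the loop above cannot iterate over correctly. Below is a minimal sketch of converting those strings into real lists first, assuming every entry is a valid Python list literal. `ast.literal_eval` comes from the standard library; the name `df_str` is only illustrative and does not appear in the original post.

```python
import ast

import pandas as pd

# The question's original frame: each id entry is a string, not a list.
df_str = pd.DataFrame({'groups': [1, 2, 3, 4],
                       'id': ["[1,3]", "[2]", "[5]", "[4,6,7]"]})

# Parse each string into a Python list. literal_eval only evaluates
# literals, so it is safer than eval() on text of unknown origin.
df_str['id'] = df_str['id'].apply(ast.literal_eval)

print(df_str['id'].tolist())
```

After this step the id column holds list objects, so it should be usable in place of the hand-built df in the answer above.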

Answer 1: (score: 0)

You should not build a DataFrame that embeds list objects. Instead, repeat each group according to the length of its id list, then use pandas.merge, like so:

In [142]: import numpy as np; import pandas as pd; from functools import reduce

In [143]: groups = list(range(1, 5))

In [144]: ids = [[1, 3], [2], [5], [4, 6, 7]]

In [145]: df = pd.DataFrame({'groups': np.repeat(groups, list(map(len, ids))),
                'id': reduce(lambda x, y: x + y, ids)})

In [146]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
                'path'  : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

In [147]: df
Out[147]:
   groups  id
0       1   1
1       1   3
2       2   2
3       3   5
4       4   4
5       4   6
6       4   7

[7 rows x 2 columns]

In [148]: df2
Out[148]:
   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4

[7 rows x 2 columns]

In [149]: pd.merge(df, df2, on='id', how='outer')
Out[149]:
   groups  id         path
0       1   1  p1,p2,p3,p4
1       1   3  p1,p5,p5,p7
2       2   2     p1,p2,p1
3       3   5        p1,p2
4       4   4  p1,p2,p3,p3
5       4   6           p1
6       4   7     p2,p3,p4

[7 rows x 3 columns]
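
To go from the merged frame to one entry per group, as the question asks, the rows can be grouped afterwards. The following is a sketch rather than part of the original answer: it assumes a pandas version that provides `DataFrame.explode` (0.25 or later), which is used here in place of the `np.repeat`/`reduce` step, and the names `flat`, `merged` and `per_group` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': [[1, 3], [2], [5], [4, 6, 7]]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
                             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"]})

# One row per (group, id) pair. explode leaves an object column,
# so cast it to int to match df2['id'] before merging.
flat = df.explode('id').astype({'id': 'int64'})

# A left merge keeps the row order of `flat`.
merged = flat.merge(df2, on='id', how='left')

# Collect the paths belonging to each group into a list.
per_group = merged.groupby('groups')['path'].apply(list)
print(per_group)
```

Keeping the data in the long `merged` form and grouping only at the point of use avoids storing lists inside a DataFrame, which is the concern this answer raises.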