How to use pandas rows to form a new column

Time: 2018-04-23 19:46:22

Tags: python python-3.x pandas numpy

I have the following:

import numpy as np
import pandas as pd
pa = pd.DataFrame({'a':np.array([[1.,4.],[2.,3.3],[3.,4.,5.]], dtype=object),
                   'b':np.array([[2.,5.],[3., 6.],[4.,5.,6.]], dtype=object)})

This produces:

    a               b
0   [1.0, 4.0]      [2.0, 5.0]
1   [2.0, 3.3]      [3.0, 6.0]
2   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]

I have tried various techniques to combine the items of each array, position by position, into a new array.

Something like this:

    a               b               c
0   [1.0, 4.0]      [2.0, 5.0]      [1.0, 2.0]
1   [1.0, 4.0]      [2.0, 5.0]      [4.0, 5.0]
2   [2.0, 3.3]      [3.0, 6.0]      [2.0, 3.0]
3   [2.0, 3.3]      [3.0, 6.0]      [3.3, 6.0]
4   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [3.0, 4.0]
5   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [4.0, 5.0]
6   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [5.0, 6.0]

If there are other columns, I can carry their items over to the newly created rows. But I am stuck at this point.

Can someone help?

2 Answers:

Answer 0: (score: 2)

Use zip, then unnest the result

pa['New']=[list(zip(x,y)) for x, y in zip(pa.a,pa.b)]
s=pa.New.str.len()
df=pd.DataFrame({'a':pa['a'].repeat(s),'b':pa['b'].repeat(s),'New':list(map(list,pa.New.sum()))})
df
          New                a                b
0  [1.0, 2.0]       [1.0, 4.0]       [2.0, 5.0]
0  [4.0, 5.0]       [1.0, 4.0]       [2.0, 5.0]
1  [2.0, 3.0]       [2.0, 3.3]       [3.0, 6.0]
1  [3.3, 6.0]       [2.0, 3.3]       [3.0, 6.0]
2  [3.0, 4.0]  [3.0, 4.0, 5.0]  [4.0, 5.0, 6.0]
2  [4.0, 5.0]  [3.0, 4.0, 5.0]  [4.0, 5.0, 6.0]
2  [5.0, 6.0]  [3.0, 4.0, 5.0]  [4.0, 5.0, 6.0]
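
The zip-then-unnest steps above can also be written as one self-contained sketch. This is only an illustration under a few assumptions: the frame is built from plain Python lists rather than NumPy arrays, the new column is named c as in the question, and the names pairs, lengths, flat and out are made up for this example.

```python
import pandas as pd

# Same data as in the question, built from plain Python lists.
pa = pd.DataFrame({
    "a": [[1.0, 4.0], [2.0, 3.3], [3.0, 4.0, 5.0]],
    "b": [[2.0, 5.0], [3.0, 6.0], [4.0, 5.0, 6.0]],
})

# Pair up the i-th items of a and b within each row.
pairs = [[list(p) for p in zip(x, y)] for x, y in zip(pa["a"], pa["b"])]

# Repeat each original row once per pair it produced.
lengths = [len(row_pairs) for row_pairs in pairs]
out = pa.loc[pa.index.repeat(lengths)].reset_index(drop=True)

# Flatten the nested pairs into one pair per output row.
flat = [p for row_pairs in pairs for p in row_pairs]
out["c"] = pd.Series(flat, index=out.index)

print(out)
```

Flattening with a list comprehension avoids calling sum() on a column of lists, which rebuilds the whole list at every step and can get slow on large frames.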

Answer 1: (score: 0)

IIUC, do you need something like this?

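def f(row):
    # pair up the i-th items of a and b for this row
    return pd.Series(list(zip(row["a"], row["b"])))

# one entry per pair; dropna removes padding from rows with fewer pairs
mod = pa.apply(f, axis=1).stack().dropna().rename("c")
mod.index = mod.index.get_level_values(0)

pa.merge(mod.to_frame(), left_index=True, right_index=True)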

    a               b            c
0   [1.0, 4.0]  [2.0, 5.0]  (1.0, 2.0)
0   [1.0, 4.0]  [2.0, 5.0]  (4.0, 5.0)
1   [2.0, 3.3]  [3.0, 6.0]  (2.0, 3.0)
1   [2.0, 3.3]  [3.0, 6.0]  (3.3, 6.0)
2   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (3.0, 4.0)
2   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (4.0, 5.0)
2   [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (5.0, 6.0)
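
On newer pandas versions, the same reshaping can probably be expressed more directly with DataFrame.explode. This is a sketch rather than part of either answer: it assumes pandas 0.25 or later (explode did not exist when this question was asked), builds the frame from plain Python lists, and uses the column name c from the question.

```python
import pandas as pd

# Same data as in the question, built from plain Python lists.
pa = pd.DataFrame({
    "a": [[1.0, 4.0], [2.0, 3.3], [3.0, 4.0, 5.0]],
    "b": [[2.0, 5.0], [3.0, 6.0], [4.0, 5.0, 6.0]],
})

# Each row gets a list of (a_i, b_i) tuples.
paired = pd.Series(
    [list(zip(x, y)) for x, y in zip(pa["a"], pa["b"])],
    index=pa.index,
)

# explode turns every list element into its own row and
# repeats the other columns alongside it.
out = pa.assign(c=paired).explode("c").reset_index(drop=True)

print(out)
```

As in the answer above, the entries of c are tuples; map them through list if lists are needed.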