Combine Pandas columns whose names are given in a list

Time: 2018-01-21 11:01:24

Tags: python python-2.7 list pandas

I have three Pandas columns whose elements are lists. To combine these lists, I can write out the column names explicitly and join them with +:
import pandas as pd

df = pd.DataFrame({'allmz': [[1, 2, 3], [2, 4, 5], [2, 5, 5], [2, 3, 5], [1, 4, 5]],
                   'allint': [[11, 31, 31], [21, 41, 51], [41, 51, 51], [11, 31, 51], [1, 51, 11]],
                   'allx': [[6, 7, 3], [2, 4, 5], [2, 5, 5], [2, 9, 5], [3, 4, 5]]})
df['new'] = df['allmz'] + df['allint'] + df['allx']
print(df)

         allint      allmz       allx                             new
0  [11, 31, 31]  [1, 2, 3]  [6, 7, 3]  [1, 2, 3, 11, 31, 31, 6, 7, 3]
1  [21, 41, 51]  [2, 4, 5]  [2, 4, 5]  [2, 4, 5, 21, 41, 51, 2, 4, 5]
2  [41, 51, 51]  [2, 5, 5]  [2, 5, 5]  [2, 5, 5, 41, 51, 51, 2, 5, 5]
3  [11, 31, 51]  [2, 3, 5]  [2, 9, 5]  [2, 3, 5, 11, 31, 51, 2, 9, 5]
4   [1, 51, 11]  [1, 4, 5]  [3, 4, 5]   [1, 4, 5, 1, 51, 11, 3, 4, 5]

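However, if there are too many column names to write out by hand, is there a way to do this by looping (or without looping) over a list of column names, such as columns = ['allmz','allint','allx']?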

3 Answers:

Answer 0: (score: 3)

Option 1
Slice on the columns and call sum along the first axis.

df['new'] = df[['allmz','allint','allx']].sum(axis=1)

df

         allint      allmz       allx                             new
0  [11, 31, 31]  [1, 2, 3]  [6, 7, 3]  [1, 2, 3, 11, 31, 31, 6, 7, 3]
1  [21, 41, 51]  [2, 4, 5]  [2, 4, 5]  [2, 4, 5, 21, 41, 51, 2, 4, 5]
2  [41, 51, 51]  [2, 5, 5]  [2, 5, 5]  [2, 5, 5, 41, 51, 51, 2, 5, 5]
3  [11, 31, 51]  [2, 3, 5]  [2, 9, 5]  [2, 3, 5, 11, 31, 51, 2, 9, 5]
4   [1, 51, 11]  [1, 4, 5]  [3, 4, 5]   [1, 4, 5, 1, 51, 11, 3, 4, 5]

Option 2
Another option, using np.concatenate (requires import numpy as np):

v = df[['allmz','allint','allx']].values.tolist()
df['new'] = np.concatenate(v, axis=0).reshape(len(df), -1).tolist()
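The reshape step in Option 2 appears to rely on every list having the same length, so that each row flattens to the same number of elements. For lists of differing lengths, one possible alternative is to flatten each row with itertools.chain. The following is only a sketch under that assumption; the sample data and the `columns` and `combined` names are made up for illustration and are not part of the original answer.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data with lists of unequal length
df = pd.DataFrame({
    'allmz': [[1, 2, 3], [2, 4]],
    'allint': [[11, 31, 31], [21]],
    'allx': [[6, 7], [2, 4, 5]],
})

columns = ['allmz', 'allint', 'allx']  # columns to combine, in this order

# zip walks the selected columns in parallel, yielding one tuple of lists per row;
# chain.from_iterable flattens that tuple into a single list.
combined = [list(chain.from_iterable(row))
            for row in zip(*(df[col] for col in columns))]

# Wrap in a Series so each combined list is stored as one cell of the new column
df['new'] = pd.Series(combined, index=df.index)
print(df['new'].tolist())
```

This trades the vectorized NumPy call for a plain Python loop, which is usually acceptable when the cells are Python lists anyway.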

Answer 1: (score: 2)

You can use Python's built-in sum function.

df['new'] = sum([df[col] for col in df], [])
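Because `for col in df` iterates over every column label in df, it may help to restrict the combination to an explicit list of names, as the question asks. A minimal sketch of that idea, using functools.reduce with operator.add so that no start value is needed; the `columns` variable is the list from the question, and the rest is illustrative rather than part of the original answer.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'allmz': [[1, 2, 3], [2, 4, 5], [2, 5, 5], [2, 3, 5], [1, 4, 5]],
    'allint': [[11, 31, 31], [21, 41, 51], [41, 51, 51], [11, 31, 51], [1, 51, 11]],
    'allx': [[6, 7, 3], [2, 4, 5], [2, 5, 5], [2, 9, 5], [3, 4, 5]],
})

columns = ['allmz', 'allint', 'allx']  # names of the columns to combine

# Equivalent to df['allmz'] + df['allint'] + df['allx']:
# reduce applies + pairwise across the selected Series, left to right,
# and + on list-valued Series concatenates the lists row by row.
df['new'] = reduce(operator.add, (df[col] for col in columns))
print(df['new'].tolist())
```

The order of names in `columns` determines the order of the concatenated elements in each row.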

Answer 2: (score: 1)

If you have a large number of column names, a simple way to solve this is the following:

col = df.loc[: , "allint":"allx"]

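Here "allint" is the first column name and "allx" is the last one. The label slice includes both ends and follows the column order of df, so every column located between them is selected.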

df['new'] = col.sum(axis=1)
df

This gives you the same result as writing out every column name.