Question

I need to pack the columns of a pandas DataFrame into a single column that contains lists. Example:

Given

>>> df
    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20

produce the list column:

    list_col
0  [81,88,1]
1  [42,7,23]
2  [8,37,63]
3  [18,22,20]

If I try

df.apply(list, axis=1)

Python returns the same DataFrame.

If I try

>>> df.apply(lambda r:{'list_col':list(r)},axis=1)
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN

that does not work either.

Even the brute-force approach

>>> df['list_col'] = ''
>>> for i in df.index:
    df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])

raises an error:

Traceback (most recent call last):
  File "<pyshell#45>", line 2, in <module>
    df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 158, in _setitem_with_indexer
    len(self.obj[labels[0]]) == len(value) or len(plane_indexer[0]) == len(value)):
TypeError: object of type 'int' has no len()

The only approach I have found that works is:

df['list_col'] = df.apply(lambda r:{df.columns[0]:list(r)}, axis=1)[df.columns[0]]

This gives me what I want, but is there a more direct way?

Answer 1

Simply assign the new column as a list built from df.values:

df['list_col'] = list(df.values)

df
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]
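
One detail that may be worth checking: list(df.values) iterates over the rows of the underlying NumPy array, so each cell should end up holding a 1-D numpy.ndarray rather than a plain Python list, even though it prints the same way. A minimal sketch, rebuilding the example frame from the question (the variable names rows and first_cell are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# Iterating a 2-D array yields one 1-D array per row.
rows = list(df.values)
df["list_col"] = rows

# Inspect what is actually stored in the first cell.
first_cell = df["list_col"].iloc[0]
print(type(first_cell))      # a NumPy array type, not list
print(first_cell.tolist())   # convert one cell to a plain Python list
```

If real Python lists are needed downstream (for example for JSON serialization), converting each cell with .tolist() is one option.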

Answer 2

Here is a vectorized approach, very similar to @Anzel's solution:

In [55]: df
Out[55]:
    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20

In [56]: df['list_col'] = df.values.tolist()

In [57]: df
Out[57]:
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]
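
As a self-contained sketch of the same idea: df.values covers every column currently in the frame, so if list_col already exists (say, when the cell is re-run), it would be packed into the new lists too. Selecting the source columns explicitly avoids that. The name source_cols is an assumption introduced here for illustration only.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# Hypothetical helper name: the columns we want packed into each list.
source_cols = ["a", "b", "c"]

# .tolist() on the 2-D array gives a list of plain Python lists, one per row.
df["list_col"] = df[source_cols].values.tolist()

# Running the assignment again is safe, because only source_cols are read.
df["list_col"] = df[source_cols].values.tolist()

print(df)
print(type(df["list_col"].iloc[0]))  # list
```

Unlike list(df.values), this stores plain Python lists in each cell.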

Timings against a 4M-row DataFrame:

In [69]: df.shape
Out[69]: (4000000, 3)

In [70]: %timeit list(df.values)
1 loop, best of 3: 2.04 s per loop

In [71]: %timeit df.values.tolist()
1 loop, best of 3: 993 ms per loop
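
For readers without IPython, a comparison along the same lines can be sketched with the standard timeit module. This is only a rough sketch: the frame below is random data and deliberately smaller than the 4M rows above (n_rows is an arbitrary choice), and absolute numbers will differ by machine and pandas/NumPy version.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary size chosen to keep the run short; the timings above used 4,000,000 rows.
n_rows = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(n_rows, 3)), columns=["a", "b", "c"])

# Time each approach a few times and keep the best run, as %timeit does.
t_list = min(timeit.repeat(lambda: list(df.values), number=1, repeat=3))
t_tolist = min(timeit.repeat(lambda: df.values.tolist(), number=1, repeat=3))

print(f"list(df.values):    {t_list:.4f} s")
print(f"df.values.tolist(): {t_tolist:.4f} s")
```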
