I need to pack the columns of a pandas DataFrame into a single column that contains lists. Example:
Given
>>> df
    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20
produce a column of lists:
list_col
0 [81,88,1]
1 [42,7,23]
2 [8,37,63]
3 [18,22,20]
If I try
df.apply(list, axis=1)
Python returns the same DataFrame.
If I try
>>> df.apply(lambda r:{'list_col':list(r)},axis=1)
     a    b    c
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
that does not work either.
Even the brute-force approach
>>> df['list_col'] = ''
>>> for i in df.index:
        df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])
raises an error:
Traceback (most recent call last):
  File "<pyshell#45>", line 2, in <module>
    df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 158, in _setitem_with_indexer
    len(self.obj[labels[0]]) == len(value) or len(plane_indexer[0]) == len(value)):
TypeError: object of type 'int' has no len()
The only approach I have found that works is:
df['list_col'] = df.apply(lambda r:{df.columns[0]:list(r)}, axis=1)[df.columns[0]]
This gives me what I want, but is there perhaps a more direct way?
Answer 0 (score: 3)
Simply assign the column as a list built from df.values:
df['list_col'] = list(df.values)
df
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]
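As a self-contained sketch of the assignment above (the sample frame is rebuilt from the question's data; the names `rows` and `first` are only illustrative), note that iterating over `df.values` yields one NumPy array per row, so each cell holds an `ndarray` rather than a plain Python list, even though the printed frame looks the same:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# list() iterates over the 2-D array row by row,
# producing a Python list of 1-D NumPy arrays.
rows = list(df.values)
df["list_col"] = rows

first = df["list_col"].iloc[0]
print(type(first))     # <class 'numpy.ndarray'>
print(first.tolist())  # [81, 88, 1]
```

If real Python lists are needed in each cell, the row arrays would have to be converted, for example with `.tolist()`.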
Answer 1 (score: 0)
Here is a vectorized approach, very similar to @Anzel's solution:
In [55]: df
Out[55]:
    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20
In [56]: df['list_col'] = df.values.tolist()
In [57]: df
Out[57]:
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]
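The same session can be written as a standalone script. This is a minimal sketch using the question's data; the explicit `cols` list is an addition of mine, so that rerunning the assignment does not fold an existing `list_col` back into the new lists:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# Restrict to the source columns (illustrative choice), then let NumPy
# convert the whole 2-D array into nested Python lists in one call.
cols = ["a", "b", "c"]
df["list_col"] = df[cols].values.tolist()

first = df["list_col"].iloc[0]
print(type(first), first)
```

Here each cell is a plain Python list of Python ints. One caveat worth keeping in mind: when the columns have different dtypes, `df.values` first upcasts them to a common dtype, so the list elements may not keep their original types.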
Timing against a DataFrame with 4M rows:
In [69]: df.shape
Out[69]: (4000000, 3)
In [70]: %timeit list(df.values)
1 loop, best of 3: 2.04 s per loop
In [71]: %timeit df.values.tolist()
1 loop, best of 3: 993 ms per loop
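To repeat this comparison outside IPython, something like the following could be used. It is only a rough sketch: the random test frame, the smaller `n_rows`, and the repeat count are my assumptions, not the setup used for the numbers above, and the results will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: random integers, far fewer rows than the 4M above
# so the script finishes quickly. Increase n_rows to approach that setup.
n_rows = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(n_rows, 3)), columns=list("abc"))

repeats = 5
t_list = timeit.timeit(lambda: list(df.values), number=repeats)
t_tolist = timeit.timeit(lambda: df.values.tolist(), number=repeats)

print(f"list(df.values):    {t_list / repeats:.4f} s per call")
print(f"df.values.tolist(): {t_tolist / repeats:.4f} s per call")
```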