I have the data below, which is a list of 4 elements. Each element is a tuple whose items are themselves lists:
data = [(['a', 'b', 'c'],
[1, 2, 3, 4, 5],
['aa', 'bb'],
['00', '03', '0000', '0006']),
(['e', 'f', 'g'],
[2, 1, 4, 4, 6],
['qq', 'er'],
['10', '04', '3340', '9009']),
(['w', 'd', 'c'],
[5, 6, 55, 1, 6],
['rr', 'rr'],
['55', '11', '6788', '7789']),
(['l', 'a', 's'],
[29, 2, 9, 4, 3],
['yy', 'uu'],
['33', '67', '0000', '0237'])]
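I want to convert it into a DataFrame so that every item ends up in its own column. For example, df = pd.DataFrame(data)
gives a DataFrame with four columns, each cell holding a whole list. What I want is to split each of those columns into separate DataFrame columns, as shown by the red lines below...
That is, each column of the DataFrame above should be subdivided into as many columns as there are items in its cells.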
Answer 0 (score: 0)
You can flatten the nested lists:
import pandas as pd
# Flatten the four lists of each tuple into a single row of 14 values
df = pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
print(df)
   0  1  2   3  4   5  6  7   8   9  10  11    12    13
0  a  b  c   1  2   3  4  5  aa  bb  00  03  0000  0006
1  e  f  g   2  1   4  4  6  qq  er  10  04  3340  9009
2  w  d  c   5  6  55  1  6  rr  rr  55  11  6788  7789
3  l  a  s  29  2   9  4  3  yy  uu  33  67  0000  0237
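The flattened frame above no longer shows which original list each column came from. As a rough sketch of one way to keep that information, the snippet below flattens the rows with itertools.chain and builds a (group, offset) label for every column. The helper names flatten_rows and column_labels are made up for illustration, only two of the sample tuples are repeated here, and the sketch assumes every tuple has lists of the same lengths as the first one.

```python
from itertools import chain

# First two tuples of the sample data from the question
data = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
]


def flatten_rows(records):
    """Join the lists inside each tuple into one flat row (hypothetical helper)."""
    return [list(chain.from_iterable(record)) for record in records]


def column_labels(record):
    """Label each flattened position as (group, offset within group)."""
    return [(group, offset)
            for group, sublist in enumerate(record)
            for offset in range(len(sublist))]


rows = flatten_rows(data)
# Assumption: all tuples share the list lengths of the first tuple
labels = column_labels(data[0])

print(rows[0])
print(labels)
```

If the grouping is wanted in the frame itself, passing these to pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(labels)) should give two-level column headers, with the top level numbering the original four lists.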
Timings:
# Imports needed by the alternatives timed below
from itertools import chain
import numpy as np
data = data * 100
In [128]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
100 loops, best of 3: 2.03 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ1
In [137]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
1000 loops, best of 3: 1.97 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ2
In [129]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
1000 loops, best of 3: 1.46 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ3
In [130]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
100 loops, best of 3: 5.9 ms per loop
data = data * 10000
In [121]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
10 loops, best of 3: 99.2 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ1
In [139]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
10 loops, best of 3: 95.8 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ2
In [122]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
10 loops, best of 3: 150 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ3
In [123]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
1 loop, best of 3: 560 ms per loop
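The %timeit lines above only run inside IPython. For a plain script, a comparison along these lines could be set up with the standard timeit module. This is only a sketch: it times the flattening step alone for the two pure-Python variants, leaving out the pd.DataFrame construction and the NumPy variants, so its numbers are not comparable with the figures above. The function names are made up for illustration.

```python
import timeit
from itertools import chain

# One sample tuple from the question, repeated to 400 rows
# (the same row count as the 4-row sample multiplied by 100)
record = (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006'])
data = [record] * 400


def flatten_comprehension(records):
    """Nested list comprehension, as in the answer."""
    return [[item for sublist in rec for item in sublist] for rec in records]


def flatten_chain(records):
    """itertools.chain variant, as in the first alternative."""
    return [list(chain.from_iterable(rec)) for rec in records]


# Both helpers must agree before their timings are worth comparing
assert flatten_comprehension(data) == flatten_chain(data)

timings = {}
for func in (flatten_comprehension, flatten_chain):
    # Best of 3 repeats of 100 calls each, converted to milliseconds per call
    best = min(timeit.repeat(lambda: func(data), number=100, repeat=3))
    timings[func.__name__] = best / 100 * 1000
    print(f"{func.__name__}: {timings[func.__name__]:.3f} ms per call")
```

Results will vary with the machine and Python version, so treat any ranking from a run like this as indicative only.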