I have the following dictionary:
d = {'col1': ['a', 'b', 'c'],
     'col2': [[1, 2], [4, 3, 2], []],
}
I want a pandas DataFrame like this:
idx, col1, col2
0, 'a', 1
1, 'a', 2
2, 'b', 4
3, 'b', 3
4, 'b', 2
5, 'c', nan
How can I build it? If I just pass the dict, pandas does not unpack the list items in col2 or repeat col1 to match. Thanks!
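To make the problem concrete, here is a minimal sketch of what passing the dict directly gives. The name `df_raw` is only illustrative.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c'],
     'col2': [[1, 2], [4, 3, 2], []]}

# Passing the dict directly keeps each list as a single cell value,
# so the frame has one row per entry of col1 rather than one per list item.
df_raw = pd.DataFrame(d)
print(df_raw.shape)
```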
Answer 0 (score: 3)
You just need to build it yourself. Here is one way:
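import numpy as np
import pandas as pd

col1 = ['a', 'b', 'c']
col2 = [[1, 2], [4, 3, 2], []]
col2_lens = list(map(len, col2))
# flatten col2; an empty list contributes a single NaN
s2 = pd.Series([eli for el in col2 for eli in (el or [np.nan])])
# repeat each element of col1 len(col2[i]) times, at least once
# (the ''.join trick assumes every element of col1 is a single character)
s1 = pd.Series(list(''.join(el * (col2_len or 1) for el, col2_len in zip(col1, col2_lens))))
pd.concat([s1, s2], axis=1)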
which produces (the numbers print as floats because the column contains NaN)
   0    1
0  a  1.0
1  a  2.0
2  b  4.0
3  b  3.0
4  b  2.0
5  c  NaN
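The join-based repetition above only works when every element of col1 is a single character. As a hedged sketch of a variant that also handles longer strings, one could repeat the elements with `np.repeat` instead. The helper name `flatten_pairs` and the sample values `'ab'`, `'cd'`, `'ef'` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

def flatten_pairs(col1, col2):
    """Hypothetical helper: one output row per item of each list in col2."""
    # An empty list still contributes one row, holding NaN.
    padded = [el or [np.nan] for el in col2]
    lens = [len(el) for el in padded]
    return pd.DataFrame({
        # np.repeat repeats whole elements, so multi-character strings survive
        'col1': np.repeat(col1, lens),
        'col2': [x for el in padded for x in el],
    })

result = flatten_pairs(['ab', 'cd', 'ef'], [[1, 2], [4, 3, 2], []])
print(result)
```

Building the frame from a dict also names the columns col1 and col2 directly, rather than 0 and 1.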
Here are the %%timeit results for the 3 approaches shown here:
1
%%timeit
col2_lens = list(map(len, col2))
# flatten col2; an empty list contributes a single NaN
s2 = pd.Series([eli for el in col2 for eli in (el or [np.nan])])
# repeat each element of col1 len(col2[i]) times, at least once
s1 = pd.Series(list(''.join(el * (col2_len or 1) for el, col2_len in zip(col1, col2_lens))))
pd.concat([s1, s2], axis=1)
1000 loops, best of 3: 646 µs per loop
2
%%timeit
df = pd.DataFrame()
for a, b in zip(col1, col2):
    df = pd.concat([df, pd.DataFrame({'col1': a, 'col2': b or [np.nan]})])
100 loops, best of 3: 2.52 ms per loop
3
%%timeit
frames = []
for a, b in zip(col1, col2):
    frames.append(pd.DataFrame({'col1': a, 'col2': b or [np.nan]}))
df = pd.concat(frames)
1000 loops, best of 3: 1.58 ms per loop
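For comparison, newer pandas versions offer a built-in route that is not among the three timed approaches above. The sketch below assumes pandas 0.25 or later, where `DataFrame.explode` is available. No timing is given for it because none was measured here.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c'],
     'col2': [[1, 2], [4, 3, 2], []]}

# explode() emits one row per list element and turns an empty list into
# a single NaN row; reset_index() replaces the repeated original index.
exploded = pd.DataFrame(d).explode('col2').reset_index(drop=True)
print(exploded)
```

Note that col2 comes back with object dtype, so a cast may be needed if numeric operations follow.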