I hope you can help me with this problem.
I have the data below (the column names can be anything):
data=([['file0090',
([[ 84, 55, 189],
[248, 100, 18],
[ 68, 115, 88]])],
['file6565',
([[ 86, 58, 189],
[24, 10, 118],
[ 68, 11, 8]])
]])
I need to iterate over column 0 and column 1 and build a list of rows that can be converted to a DataFrame. The expected output is:
col0 col1 col2 col3
file0090 84 55 189
file0090 248 100 18
file0090 68 115 88
file6565 86 58 189
file6565 24 10 118
file6565 68 11 8
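I have tried iterating over rows and items and appending to a list, but I always end up with roughly the same output, and I do not know how to separate the items inside these arrays.
Thanks in advance for any help.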
Answer 0 (score: 6)
You can try the following:
import pandas as pd
data_f = [[i[0]]+j for i in data for j in i[1]]
df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
Output:
col0 col1 col2 col3
file0090 84 55 189
file0090 248 100 18
file0090 68 115 88
file6565 86 58 189
file6565 24 10 118
file6565 68 11 8
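The list comprehension above can be read as two nested loops: the outer one walks the files, the inner one walks the rows of each file and prepends the file name. Below is a minimal self-contained sketch of the same idea. The variable names `name`, `block`, and `flat` are illustrative and not part of the original answer, and the data is the sample from the question with the redundant outer parentheses dropped.

```python
import pandas as pd

# Sample data from the question: each entry is [file name, list of rows].
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# Outer loop: one (name, block) pair per file.
# Inner loop: one row per inner list, with the file name prepended.
flat = [[name] + row for name, block in data for row in block]

df = pd.DataFrame(flat, columns=['col0', 'col1', 'col2', 'col3'])
print(df)
```

Unpacking each entry as `name, block` assumes every entry has exactly two elements, as in the sample data.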
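Answer 1 (score: 5)
You can build a DataFrame whose second column holds lists, call explode on that column, and then join the expanded values back as new columns: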
df = pd.DataFrame(data).add_prefix('col')
out = df.explode('col1').reset_index(drop=True)
out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
If the lists all have a similar structure, here is another solution:
import itertools
import numpy as np
l = [*itertools.chain.from_iterable(data)]
pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
Output of the first (explode and join) approach:
col0 col_0 col_1 col_2
0 file0090 84 55 189
1 file0090 248 100 18
2 file0090 68 115 88
3 file6565 86 58 189
4 file6565 24 10 118
5 file6565 68 11 8
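The `np.vstack` variant uses `len(l[1])` as the repeat count, so it relies on every file having as many rows as the first one. Here is a hedged sketch that passes a per-file repeat count to `np.repeat` instead, so blocks of different lengths would also line up. The names `names`, `blocks`, and `out`, and the choice to move the file name into a `col0` column, are illustrative assumptions rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question: each entry is [file name, list of rows].
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

names = [name for name, _ in data]
blocks = [np.asarray(block) for _, block in data]

# Repeat each file name once per row of its own block.
index = np.repeat(names, [len(b) for b in blocks])

# Stack all blocks vertically, label the value columns, and turn the
# file-name index into an ordinary column called col0.
out = pd.DataFrame(np.vstack(blocks), index=index).add_prefix('col_')
out = out.rename_axis('col0').reset_index()
print(out)
```

`np.vstack` still requires every row to have the same number of values.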
Answer 2 (score: 4)
We can call explode on the rows, and then expand the resulting lists again into columns:
s = pd.DataFrame(data).set_index(0)[1].explode()
df = pd.DataFrame(s.tolist(), index = s.index.values)
df
Out[396]:
0 1 2
file0090 84 55 189
file0090 248 100 18
file0090 68 115 88
file6565 86 58 189
file6565 24 10 118
file6565 68 11 8
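The result above keeps the file name as the index and uses integer column labels. If the `col0` to `col3` layout from the question is wanted, the same explode idea can be sketched as follows. The intermediate column name `rows` is a placeholder chosen for illustration.

```python
import pandas as pd

# Sample data from the question: each entry is [file name, list of rows].
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# One Series element per inner row, indexed by file name.
s = (
    pd.DataFrame(data, columns=['col0', 'rows'])
    .set_index('col0')['rows']
    .explode()
)

# Expand each inner row into three columns, then move the index into col0.
df = pd.DataFrame(
    s.tolist(), index=s.index, columns=['col1', 'col2', 'col3']
).reset_index()
print(df)
```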
Answer 3 (score: 4)
You can write a custom function that yields the data in the right format.
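from itertools import chain
import pandas as pd
def transform(d):
    for l in d:
        # x collects everything before the last element; y is the list of rows
        *x, y = l
        # prepend x (the file name) to every inner row
        yield list(map(lambda s: x+s, y))
# flatten the per-file lists of rows into one sequence of rows
df = pd.DataFrame(chain(*transform(data)))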
df
0 1 2 3
0 file0090 84 55 189
1 file0090 248 100 18
2 file0090 68 115 88
3 file6565 86 58 189
4 file6565 24 10 118
5 file6565 68 11 8
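The starred assignment `*x, y = l` puts the last element of each entry into `y` and everything before it into the list `x`, which is why `x + s` works as list concatenation. A hedged variant of the same generator that yields one flat row at a time, so no `chain` is needed, might look like this. The name `iter_rows` and the explicit column labels are illustrative, not taken from the original answer.

```python
import pandas as pd

# Sample data from the question: each entry is [file name, list of rows].
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

def iter_rows(records):
    """Yield one flat row per inner list, prefixed with the leading fields."""
    for *prefix, block in records:
        # prefix is a list such as ['file0090']; block is the list of rows.
        for row in block:
            yield prefix + list(row)

# Materialise the generator into a list before handing it to pandas.
df = pd.DataFrame(list(iter_rows(data)), columns=['col0', 'col1', 'col2', 'col3'])
print(df)
```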
Timing results for all solutions:
# YOBEN_S's answer
In [275]: %%timeit
...: s = pd.DataFrame(data).set_index(0)[1].explode()
...: df = pd.DataFrame(s.tolist(), index = s.index.values)
...:
...:
1.52 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Anky's answer
In [276]: %%timeit
...: df = pd.DataFrame(data).add_prefix('col')
...: out = df.explode('col1').reset_index(drop=True)
...: out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
...:
...:
3.71 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Dhaval's answer
In [277]: %%timeit
...: data_f = []
...: for i in data:
...:     for j in i[1]:
...:         data_f.append([i[0]]+j)
...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
...:
...:
712 µs ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#My answer
In [280]: %%timeit
...: pd.DataFrame(chain(*transform(data)))
...:
...:
489 µs ± 8.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Using List comp of Dhaval's answer
In [306]: %%timeit
...: data_f = [[i[0]]+j for i in data for j in i[1]]
...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
...:
...:
586 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Anky's 2nd solution
In [308]: %%timeit
...: l = [*chain.from_iterable(data)]
...: pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
...:
...:
221 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
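The timings above come from IPython's `%%timeit` on the six-row sample, so they depend on the machine and library versions and may rank differently on larger inputs. Outside IPython, a comparison can be sketched with the standard `timeit` module. The function names `list_comp` and `explode_rows` and the loop counts are arbitrary choices for illustration.

```python
import timeit

import pandas as pd

# Sample data from the question: each entry is [file name, list of rows].
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

def list_comp():
    # List comprehension approach.
    flat = [[i[0]] + j for i in data for j in i[1]]
    return pd.DataFrame(flat, columns=['col0', 'col1', 'col2', 'col3'])

def explode_rows():
    # Explode approach.
    s = pd.DataFrame(data).set_index(0)[1].explode()
    return pd.DataFrame(s.tolist(), index=s.index.values)

loops = 100
timings = {}
for func in (list_comp, explode_rows):
    # Best of three runs, converted to seconds per call.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    timings[func.__name__] = best / loops
    print(f"{func.__name__}: {timings[func.__name__] * 1e6:.0f} µs per loop")
```

To see how the approaches scale, replace `data` with a larger list of the same shape before timing.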