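I am writing a function that should generate toy data for pandas (examples for grouping and multi-indexing). My goal is to generate groups that may repeat several times (for example, to represent conditions during an experiment). My attempt: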
import itertools as it
import numpy as np
import pandas as pd
p = it.product([[4,5,6],[7,8,9]],[1,2,3])
p = list(p)
p
[([4, 5, 6], 1),
([4, 5, 6], 2),
([4, 5, 6], 3),
([7, 8, 9], 1),
([7, 8, 9], 2),
([7, 8, 9], 3)]
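I want to flatten only the inner lists while keeping the structure of the outer list (and get rid of the tuples). My solution is based on this SO post: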
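import collections.abc

def flatten(l):
    # Recursively yield the leaves of nested iterables; str and bytes are treated as leaves.
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el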
lf = list(flatten(p))
np.reshape(lf, (len(p), 4))
array([[4, 5, 6, 1],
[4, 5, 6, 2],
[4, 5, 6, 3],
[7, 8, 9, 1],
[7, 8, 9, 2],
[7, 8, 9, 3]])
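I have two questions. First, is there a simpler solution? Second, do I need to do all of this if I want to end up with a pandas DataFrame? The DataFrame should look like this: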
pd.DataFrame(np.reshape(lf, (len(p), 4)))
0 1 2 3
0 4 5 6 1
1 4 5 6 2
2 4 5 6 3
3 7 8 9 1
4 7 8 9 2
5 7 8 9 3
Answer 0 (score: 1)
Option 1:
In [249]: pd.DataFrame([np.concatenate(t)
                        for t in it.product([[4,5,6],[7,8,9]],[[1],[2],[3]])])
Out[249]:
0 1 2 3
0 4 5 6 1
1 4 5 6 2
2 4 5 6 3
3 7 8 9 1
4 7 8 9 2
5 7 8 9 3
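A variation on Option 1 that avoids both the recursive flatten and the reshape: since only one level of nesting needs to be removed, each inner list can be unpacked directly into a new row. This is a minimal sketch; the names `groups`, `conditions`, and `rows` are illustrative and do not come from the question.

```python
import itertools as it

import pandas as pd

# Illustrative names (assumptions, not from the original post).
groups = [[4, 5, 6], [7, 8, 9]]
conditions = [1, 2, 3]

# For every (group, condition) pair, unpack the inner list and append the scalar.
rows = [[*g, c] for g, c in it.product(groups, conditions)]

# A list of equal-length lists maps directly onto DataFrame rows.
df = pd.DataFrame(rows)
```

Because the rows are built as plain lists, no NumPy call is needed before constructing the DataFrame.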
Option 2: a pure pandas solution:
In [261]: a = pd.DataFrame([[4,5,6],[7,8,9]], columns=list('abc'))
In [262]: b = pd.DataFrame([[1],[2],[3]], columns=['d'])
In [263]: a
Out[263]:
a b c
0 4 5 6
1 7 8 9
In [264]: b
Out[264]:
d
0 1
1 2
2 3
In [265]: a.assign(k=0).merge(b.assign(k=0), on='k').drop(columns='k')
Out[265]:
a b c d
0 4 5 6 1
1 4 5 6 2
2 4 5 6 3
3 7 8 9 1
4 7 8 9 2
5 7 8 9 3
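On pandas 1.2 or later, the same Cartesian product can be written without the dummy key `k`, because `DataFrame.merge` accepts `how="cross"`. A minimal sketch, assuming such a pandas version is installed; the name `crossed` is illustrative.

```python
import pandas as pd

a = pd.DataFrame([[4, 5, 6], [7, 8, 9]], columns=list("abc"))
b = pd.DataFrame([[1], [2], [3]], columns=["d"])

# how="cross" pairs every row of `a` with every row of `b`;
# no shared key column is required (needs pandas >= 1.2).
crossed = a.merge(b, how="cross")
```

The result has the columns `a`, `b`, `c`, `d`, with each row of `a` repeated once per row of `b`.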