我有下面的数据框,想做循环:
df = name
a
b
c
d
我尝试了以下代码:
for index, row in df.iterrows():
for line in df['name']:
print(index, line)
但是我想要的结果是一个数据帧,如下所示:
df = name name1
a a
a b
a c
a d
b a
b b
b c
b d
etc.
有什么可行的方法吗?我知道这是一个愚蠢的问题,但我是python新手
答案 0 :(得分:4)
使用pandas.DataFrame.explode
的一种方式:
df["name1"] = [df["name"] for _ in df["name"]]
df.explode("name1")
输出:
name name1
0 a a
0 a b
0 a c
0 a d
1 b a
1 b b
1 b c
1 b d
2 c a
2 c b
2 c c
2 c d
3 d a
3 d b
3 d c
3 d d
答案 1 :(得分:2)
以numpy最快的解决方案,谢谢@ Ch3steR:
df = pd.DataFrame({'name':np.repeat(df['name'],len(df)),
'name1':np.tile(df['name'],len(df))}
将itertools.product
与DataFrame
构造函数一起使用:
from itertools import product
df = pd.DataFrame(product(df['name'], df['name']), columns=['name','name1'])
#oldier pandas versions
#df = pd.DataFrame(list(product(df['name'], df['name'])), columns=['name','name1'])
print (df)
name name1
0 a a
1 a b
2 a c
3 a d
4 b a
5 b b
6 b c
7 b d
8 c a
9 c b
10 c c
11 c d
12 d a
13 d b
14 d c
15 d d
另一个想法是使用交叉连接,如果性能很重要,则是最佳解决方案:
df1 = df.assign(new=1)
df = df1.merge(df1, on='new', suffixes=('','1')).drop('new', axis=1)
性能:
from itertools import product
df = pd.DataFrame({'name':range(1000)})
# print (df)
In [17]: %%timeit
...: df["name1"] = [df["name"] for _ in df["name"]]
...: df.explode("name1")
...:
...:
18.9 s ± 1.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [18]: %%timeit
...: pd.DataFrame(product(df['name'], df['name']), columns=['name','name1'])
...:
1.01 s ± 62.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [19]: %%timeit
...: df1 = df.assign(new=1)
...: df1.merge(df1, on='new', suffixes=('','1')).drop('new', axis=1)
...:
...:
245 ms ± 21.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [20]: %%timeit
...: pd.DataFrame({'name':np.repeat(df['name'],len(df)), 'name1':np.tile(df['name'],len(df))})
...:
30.2 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)