Question

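I'm looking for a faster, more elegant (loop-free) way to solve the following problem. Given two DataFrames df1 and df2, how can I create a new DataFrame whose rows are all possible combinations of the rows of df1 and df2? That is, if df1 has n_1 observations and df2 has n_2 observations, the new DataFrame will have n_1 * n_2 observations. Note that the index does not matter here, and neither does the order of the DataFrames (i.e., whether df1 is expanded by df2 or df2 by df1). Thanks a lot in advance!

Given two pandas DataFrames: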

# df1
        a   b
    0   0   -1.086
    1   1   0.997
    2   2   0.283


# df2
        c   d
    0   100 0.719
    1   101 0.423
    2   102 0.981
    3   103 0.685

Desired DataFrame:

# df_new
    a       b       c       d
0   0.000   -1.086  100.000 0.719
1   0.000   -1.086  101.000 0.423
2   0.000   -1.086  102.000 0.981
3   0.000   -1.086  103.000 0.685
4   1.000   0.997   100.000 0.719
5   1.000   0.997   101.000 0.423
6   1.000   0.997   102.000 0.981
7   1.000   0.997   103.000 0.685
8   2.000   0.283   100.000 0.719
9   2.000   0.283   101.000 0.423
10  2.000   0.283   102.000 0.981
11  2.000   0.283   103.000 0.685

Reproducible example:

import pandas as pd
import numpy as np

np.random.seed(123)

df1 = pd.DataFrame({'a': list(range(3)), 'b': np.random.randn(3)})
df2 = pd.DataFrame({'c': list(range(100, 104)), 'd': np.random.rand(4)})

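rows = []

for i in range(len(df1)):

    for j in range(len(df2)):

        # Join row i of df1 and row j of df2 into a single Series
        obs = pd.concat([df1.iloc[i, :], df2.iloc[j, :]])
        rows.append(obs)

# Build the result once, with a fresh 0..n-1 index
df_new = pd.DataFrame(rows).reset_index(drop=True)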

print(df1)
print(df2)
print(df_new)
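
For reference, here is a minimal loop-free sketch of what I am after. It assumes pandas 1.2 or newer, where `DataFrame.merge` accepts `how="cross"`. The names `df_cross`, `left`, `right` and `df_tiled` are only illustrative and not part of the example above. The second variant, which repeats and tiles the rows, is an alternative I would expect to work on older versions too.

```python
import numpy as np
import pandas as pd

np.random.seed(123)

df1 = pd.DataFrame({'a': list(range(3)), 'b': np.random.randn(3)})
df2 = pd.DataFrame({'c': list(range(100, 104)), 'd': np.random.rand(4)})

# Variant 1 (assumes pandas >= 1.2): a cross join pairs every row of df1
# with every row of df2 and ignores both indexes.
df_cross = df1.merge(df2, how="cross")

# Variant 2: repeat each df1 row len(df2) times, stack len(df1) copies of
# df2, then place the two blocks side by side on a fresh 0..n-1 index.
left = df1.loc[df1.index.repeat(len(df2))].reset_index(drop=True)
right = pd.concat([df2] * len(df1), ignore_index=True)
df_tiled = pd.concat([left, right], axis=1)

print(df_cross)
```

Both variants should give n_1 * n_2 rows in the same order as the loop. One difference from the loop version: the integer columns a and c keep their integer dtype instead of being upcast to float.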

How to expand one DataFrame by another in pandas (ignoring the index)?

0 answers: