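I have a pandas DataFrame, and I want to make many copies of each row in which two of the columns vary.
I have put together something that works, but I would like to know how to do this more efficiently, and with the index updated automatically.
Edit: I need this to work on an existing DataFrame df, rather than building a DataFrame from scratch.
For example, with this input: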
Index Time P1 P2
0 1 2020-06-01 1 1
I want this output (one row for each combination of P1 and P2):
Index Time P1 P2
1 2020-06-01 1 1
2 2020-06-01 2 1
3 2020-06-01 1 2
4 2020-06-01 2 2
My attempt at making this work:
import pandas as pd
import numpy as np

dsample = {'Index': [1],
           'Time': ["2020-06-01"],
           'P1': [1],
           'P2': [1]
           }
p1_range = np.arange(start=1, stop=3, step=1)
p2_range = np.arange(start=1, stop=3, step=1)

def get_variations(df, row):
    i = 2
    for p1 in p1_range:
        for p2 in p2_range:
            newrow = row.copy()
            newrow['P1'] = p1
            newrow['P2'] = p2
            newrow['Index'] = i
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            df = pd.concat([df, newrow.to_frame().T])
            i = i + 1
    return df

df = pd.DataFrame(data=dsample)
for index, row in df.iterrows():
    df = get_variations(df, row)
Edit: based on Rob's answer below, I have settled on this:
import pandas as pd
import numpy as np

dsample = {'Index': [1],
           'Time': [pd.to_datetime("2020-06-01")],
           }
p1_range = np.arange(start=1, stop=4, step=1)
p2_range = np.arange(start=1, stop=4, step=1)
df_orig = pd.DataFrame(data=dsample)

# every (P1, P2) combination: a[0] holds the P1 values, a[1] the P2 values
a = np.array(np.meshgrid(p1_range,
                         p2_range)).reshape(2, -1)
df_combs = pd.DataFrame({"Time": np.full(len(a[0]), df_orig['Time']), "P1": a[0], "P2": a[1]})
df_new = pd.merge(df_orig, df_combs, on='Time', how='left')
print(df_new.to_string())
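Merging on Time only pairs rows whose Time values match, so a cross join may be a more general way to expand an existing frame. The following is a minimal sketch rather than a definitive solution: it assumes pandas 1.2 or later (for how="cross"), and the names base, combos and expanded are made up for illustration. It drops the columns that will be regenerated, pairs every remaining row with every (P1, P2) combination, and rebuilds a 1-based Index column from the fresh row order.

```python
import numpy as np
import pandas as pd

# Existing frame, as in the question.
df = pd.DataFrame({"Index": [1],
                   "Time": pd.to_datetime(["2020-06-01"]),
                   "P1": [1],
                   "P2": [1]})

p1_range = np.arange(start=1, stop=3, step=1)  # 1, 2
p2_range = np.arange(start=1, stop=3, step=1)  # 1, 2

# All (P1, P2) pairs, with P1 varying fastest to match the desired output.
combos = pd.DataFrame([(p1, p2) for p2 in p2_range for p1 in p1_range],
                      columns=["P1", "P2"])

# Keep only the columns that stay constant across the copies.
base = df.drop(columns=["Index", "P1", "P2"])

# Cross join: each row of base is paired with each row of combos.
expanded = base.merge(combos, how="cross")

# Rebuild a 1-based Index column from the new row order.
expanded.insert(0, "Index", np.arange(1, len(expanded) + 1))

print(expanded.to_string(index=False))
```

Because the merge returns a fresh default index, nothing has to be counted by hand; the same code should work unchanged if df has more than one row.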
Answer 0 (score: 1)
I think your combinations are wrong. NumPy itself can generate the combinations with meshgrid:
import numpy as np
import pandas as pd

def perms(n):
    # every (P1, P2) pair with values 1..n-1, built from a meshgrid
    a = np.array(np.meshgrid(np.arange(start=1, stop=n, step=1),
                             np.arange(start=1, stop=n, step=1))).reshape(2, -1)
    dfp = pd.DataFrame({"Time": np.full(len(a[0]), pd.to_datetime("2020-06-01")),
                        "P1": a[0], "P2": a[1]})
    return dfp

df = pd.DataFrame({"col1": ["a", "b"], "col2": [30, 40], "perms": [3, 5]})

# simple case: merge every row with a constant number of permutations
# (drop(columns=...) replaces the positional axis argument removed in pandas 2.0)
dfeasy = df.assign(foo=1).merge(perms(3).assign(foo=1), on="foo").drop(columns="foo")
print(dfeasy.to_string())

# complex case: the number of permutations comes from the existing df
# note: df.loc[idx:,] selects every row from idx onward, not only row idx
dfp = pd.DataFrame()
for idx, row in df.iterrows():
    dfp = pd.concat([dfp, df.loc[idx:,].assign(foo=1)
                     .merge(perms(row["perms"]).assign(foo=1), on="foo")
                     .drop(columns="foo")]).reset_index(drop=True)
print(dfp.to_string())
Output (the simple case first, then the per-row case):
col1 col2 perms Time P1 P2
0 a 30 3 2020-06-01 1 1
1 a 30 3 2020-06-01 2 1
2 a 30 3 2020-06-01 1 2
3 a 30 3 2020-06-01 2 2
4 b 40 5 2020-06-01 1 1
5 b 40 5 2020-06-01 2 1
6 b 40 5 2020-06-01 1 2
7 b 40 5 2020-06-01 2 2
col1 col2 perms Time P1 P2
0 a 30 3 2020-06-01 1 1
1 a 30 3 2020-06-01 2 1
2 a 30 3 2020-06-01 1 2
3 a 30 3 2020-06-01 2 2
4 b 40 5 2020-06-01 1 1
5 b 40 5 2020-06-01 2 1
6 b 40 5 2020-06-01 1 2
7 b 40 5 2020-06-01 2 2
8 b 40 5 2020-06-01 1 1
9 b 40 5 2020-06-01 2 1
10 b 40 5 2020-06-01 3 1
11 b 40 5 2020-06-01 4 1
12 b 40 5 2020-06-01 1 2
13 b 40 5 2020-06-01 2 2
14 b 40 5 2020-06-01 3 2
15 b 40 5 2020-06-01 4 2
16 b 40 5 2020-06-01 1 3
17 b 40 5 2020-06-01 2 3
18 b 40 5 2020-06-01 3 3
19 b 40 5 2020-06-01 4 3
20 b 40 5 2020-06-01 1 4
21 b 40 5 2020-06-01 2 4
22 b 40 5 2020-06-01 3 4
23 b 40 5 2020-06-01 4 4
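In the per-row output, rows 4 to 7 pair row b with the 2x2 grid that belongs to row a, because the slice df.loc[idx:,] in the loop runs from idx to the end of the frame. If each row should only receive its own grid, a variant along the following lines may be closer to the intent. This is a sketch under assumptions, not the answer's original method: it assumes pandas 1.2 or later for how="cross", leaves out the constant Time column for brevity, and the names parts and result are illustrative.

```python
import numpy as np
import pandas as pd

def perms(n):
    """All (P1, P2) pairs with values 1..n-1, P1 varying fastest."""
    values = np.arange(1, n)
    p1, p2 = np.meshgrid(values, values)
    return pd.DataFrame({"P1": p1.ravel(), "P2": p2.ravel()})

df = pd.DataFrame({"col1": ["a", "b"], "col2": [30, 40], "perms": [3, 5]})

# df.loc[[idx]] keeps exactly one row (as a DataFrame), so each row is
# crossed only with the grid sized by its own "perms" value.
parts = [df.loc[[idx]].merge(perms(row["perms"]), how="cross")
         for idx, row in df.iterrows()]

# Concatenate once at the end; ignore_index renumbers the rows from 0.
result = pd.concat(parts, ignore_index=True)
print(result.to_string())
```

Collecting the pieces in a list and concatenating once also avoids growing a DataFrame inside the loop, which copies the accumulated data on every iteration.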