Pandas - Get rows with different combinations of column values

Time: 2020-07-29 14:02:08

Tags: python pandas numpy

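I have a pandas DataFrame, and I want to make many copies of each row, with two of the columns varying across the copies.

I have put together something that works, but I would like to know how to do this more efficiently and have the index updated automatically.

Edit: I need this to work on an existing DataFrame df, rather than building the DataFrame from scratch.

For example, with this input: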

   Index        Time  P1  P2
0      1  2020-06-01   1   1

I want this output (one row for each combination of P1 and P2):

Index   Time    P1  P2
1   2020-06-01  1   1
2   2020-06-01  2   1
3   2020-06-01  1   2
4   2020-06-01  2   2

My attempt at getting this to work:

import pandas as pd
import numpy as np

dsample = {'Index': [1],
           'Time': ["2020-06-01"],
           'P1': [1],
           'P2': [1]
           }

p1_range = np.arange(start=1, stop=3, step=1)
p2_range = np.arange(start=1, stop=3, step=1)

def get_variations(df, row):
    i = 2
    for p1 in p1_range:
        for p2 in p2_range:
            newrow = row.copy()
            newrow['P1'] = p1
            newrow['P2'] = p2
            newrow['Index'] = i
            # DataFrame.append was removed in pandas 2.0; pd.concat is used instead
            df = pd.concat([df, newrow.to_frame().T])
            i = i + 1
    return df
            
df = pd.DataFrame(data=dsample)

for index, row in df.iterrows():
    df = get_variations(df, row)

Edit: based on Rob's answer below, I have settled on:

import pandas as pd
import numpy as np

dsample = {'Index': [1],
           'Time': [pd.to_datetime("2020-06-01")],
           }

p1_range = np.arange(start=1, stop=4, step=1)
p2_range = np.arange(start=1, stop=4, step=1)

df_orig = pd.DataFrame(data=dsample)

a = np.array(np.meshgrid(p1_range,
                         p2_range)).reshape(2, -1)

df_combs = pd.DataFrame({"Time": np.full(len(a[0]), df_orig['Time']), "P1": a[0], "P2": a[1]})

df_new = pd.merge(df_orig,df_combs, on='Time', how='left')

print(df_new.to_string())
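
The merge on 'Time' above relies on every row sharing a single Time value. A minimal sketch of a variant that should work on any existing DataFrame, assuming pandas 1.2 or later (for how="cross"). The helper name expand_rows and its parameters are made up for illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd


def expand_rows(df, p1_values, p2_values):
    """Return one copy of each row of df per (P1, P2) combination.

    Hypothetical helper for illustration, not a pandas function.
    """
    # Build every (P1, P2) pair as a two-column frame.
    combos = pd.MultiIndex.from_product(
        [p1_values, p2_values], names=["P1", "P2"]
    ).to_frame(index=False)
    # Drop any existing P1/P2 columns so the cross join does not
    # create suffixed duplicates such as P1_x / P1_y.
    base = df.drop(columns=["P1", "P2"], errors="ignore")
    out = base.merge(combos, how="cross")
    # Renumber the Index column from 1 so it follows the new row count.
    out["Index"] = np.arange(1, len(out) + 1)
    return out


df = pd.DataFrame({"Index": [1], "Time": ["2020-06-01"], "P1": [1], "P2": [1]})
df_new = expand_rows(df, p1_values=[1, 2], p2_values=[1, 2])
print(df_new.to_string(index=False))
```

The rows come out with P1 varying slowest, which differs from the row order in the desired output; sort afterwards if the order matters.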

1 answer:

Answer 0 (score: 1)

I think your combinations are wrong. NumPy can generate the combinations itself with meshgrid:
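
As a small illustration of what meshgrid followed by reshape(2, -1) produces, here is a sketch using the 1..2 ranges from the question. The first row of the result holds the P1 values and the second row the P2 values:

```python
import numpy as np

p1_range = np.arange(1, 3)  # [1, 2]
p2_range = np.arange(1, 3)  # [1, 2]

# meshgrid returns two 2x2 grids; stacking them and flattening each
# grid gives one row of P1 values and one row of P2 values.
a = np.array(np.meshgrid(p1_range, p2_range)).reshape(2, -1)

print(a[0])  # P1 values: [1 2 1 2]
print(a[1])  # P2 values: [1 1 2 2]
```

Reading the two rows column by column gives the pairs (1, 1), (2, 1), (1, 2), (2, 2).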

def perms(n):
    a = np.array(np.meshgrid(np.arange(start=1, stop=n, step=1),
                np.arange(start=1, stop=n, step=1))).reshape(2, -1)
    dfp = pd.DataFrame({"Time":np.full(len(a[0]), pd.to_datetime("2020-06-01")), "P1":a[0], "P2":a[1]})
    return dfp

df = pd.DataFrame({"col1":["a","b"], "col2":[30,40], "perms":[3,5]})

# simple case just want to merge on constant number of permutations
dfeasy = df.assign(foo=1).merge(perms(3).assign(foo=1), on="foo").drop(columns="foo")
print(dfeasy.to_string())

# complex case - perms comes from existing df
dfp = pd.DataFrame()
for idx, row in df.iterrows():
    # df.loc[idx:,] is row idx plus every later row, not just row idx
    dfp = pd.concat([dfp, df.loc[idx:,].assign(foo=1)\
                     .merge(perms(row["perms"]).assign(foo=1), on="foo").drop(columns="foo")]).reset_index(drop=True)
print(dfp.to_string())

Output

  col1  col2  perms       Time  P1  P2
0    a    30      3 2020-06-01   1   1
1    a    30      3 2020-06-01   2   1
2    a    30      3 2020-06-01   1   2
3    a    30      3 2020-06-01   2   2
4    b    40      5 2020-06-01   1   1
5    b    40      5 2020-06-01   2   1
6    b    40      5 2020-06-01   1   2
7    b    40      5 2020-06-01   2   2
   col1  col2  perms       Time  P1  P2
0     a    30      3 2020-06-01   1   1
1     a    30      3 2020-06-01   2   1
2     a    30      3 2020-06-01   1   2
3     a    30      3 2020-06-01   2   2
4     b    40      5 2020-06-01   1   1
5     b    40      5 2020-06-01   2   1
6     b    40      5 2020-06-01   1   2
7     b    40      5 2020-06-01   2   2
8     b    40      5 2020-06-01   1   1
9     b    40      5 2020-06-01   2   1
10    b    40      5 2020-06-01   3   1
11    b    40      5 2020-06-01   4   1
12    b    40      5 2020-06-01   1   2
13    b    40      5 2020-06-01   2   2
14    b    40      5 2020-06-01   3   2
15    b    40      5 2020-06-01   4   2
16    b    40      5 2020-06-01   1   3
17    b    40      5 2020-06-01   2   3
18    b    40      5 2020-06-01   3   3
19    b    40      5 2020-06-01   4   3
20    b    40      5 2020-06-01   1   4
21    b    40      5 2020-06-01   2   4
22    b    40      5 2020-06-01   3   4
23    b    40      5 2020-06-01   4   4
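
In the second table, rows 4-7 pair row b with the 2x2 grid generated for row a, because the loop merges each grid with row idx and all rows after it. If the intent is for each row to get only its own combinations, one possible sketch is below. It assumes pandas 1.2 or later (for how="cross") and unique index labels. grid is a hypothetical stand-in for perms without the constant Time column, and expand_per_row is likewise a made-up name:

```python
import numpy as np
import pandas as pd


def grid(n):
    # P1 and P2 each run from 1 to n - 1, as in perms(n).
    a = np.array(np.meshgrid(np.arange(1, n), np.arange(1, n))).reshape(2, -1)
    return pd.DataFrame({"P1": a[0], "P2": a[1]})


def expand_per_row(df, count_col="perms"):
    # df.loc[[idx]] keeps exactly one row, still as a DataFrame.
    parts = [
        df.loc[[idx]].merge(grid(df.at[idx, count_col]), how="cross")
        for idx in df.index
    ]
    # A single concat avoids re-copying the result on every iteration.
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({"col1": ["a", "b"], "col2": [30, 40], "perms": [3, 5]})
dfp = expand_per_row(df)
print(dfp.to_string())
```

With this sample frame the result has 20 rows: 4 for row a (a 2x2 grid) and 16 for row b (a 4x4 grid).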