Pandas / Numpy:创造阶梯的最快方式?

时间:2017-04-14 16:25:22

标签: python pandas numpy dataframe vectorization

我有一个pandas数据框,如:

for($k = 1; $k < $nb_nodes; $k++) $res[] = [$c["($bits,$k)"][0], $k];

我想创建一个&#34;梯子&#34;或者&#34;范围&#34;每行的成本为50美分增量,从当前成本以下0.50美元增加到当前成本之上0.50美元。我目前的代码类似于以下内容:

    color     cost    temp
0   blue      12.0    80.4   
1    red       8.1    81.2 
2   pink      24.5    83.5

此代码将生成一个DataFrame,如:

incremented_prices = []

df['original_idx'] = df.index # To know it's original label

for row in df.iterrows():
    current_price = row['cost']
    more_costs    = numpy.arange(current_price-1, current_price+1, step=0.5)

    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)

df_incremented = pandas.concat(incremented_prices)

在真正的问题中,我会将范围从 - $ 50.00到$ 50.00,我觉得这很慢,是否有更快的矢量化方式?

2 个答案:

答案 0 :(得分:2)

您可以尝试使用Project Month 1 1/1/2015 1 2/1/2015 1 3/1/2015 2 2/1/2015 2 3/1/2015 2 4/1/2015

重新创建数据框
numpy.repeat

enter image description here

更新以获取更多列:

cost_steps = pd.np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size   

pd.DataFrame(dict(
    color = pd.np.repeat(df.color.values, repeats),
    # here is a vectorized method to calculate the costs with all steps added with broadcasting
    cost = (df.cost.values[:, None] + cost_steps).ravel(),
    temp = pd.np.repeat(df.temp.values, repeats),
    original_idx = pd.np.repeat(df.index.values, repeats)
    ))

enter image description here

答案 1 :(得分:1)

这是一种基于NumPy初始化的方法 -

increments = 0.5*np.arange(-1,2) # Edit the increments here

names = np.append(df.columns, 'original_idx')

M,N = df.shape
vals = df.values

cost_col_idx = (names == 'cost').argmax()

n = len(increments)
shp = (M,n,N+1)
b = np.empty(shp,dtype=object)
b[...,:-1] = vals[:,None]
b[...,-1] = np.arange(M)[:,None]
b[...,cost_col_idx] = vals[:,cost_col_idx].astype(float)[:,None] + increments
b.shape = (-1,N+1)
df_out = pd.DataFrame(b, columns=names)

要使增量从-50增加到+50,增量为{​​{1}},请使用:

0.5

示例运行 -

increments = 0.5*np.arange(-100,101)