Question

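I have a pandas DataFrame like:

    color     cost    temp
0   blue      12.0    80.4
1    red       8.1    81.2
2   pink      24.5    83.5

I want to create a "ladder" or "range" of costs for every row in 50-cent increments, from $0.50 below the current cost to $0.50 above it. My current code is similar to the following:

import numpy as np
import pandas as pd

incremented_prices = []

df['original_idx'] = df.index  # remember each row's original label

# iterrows() yields (index, row) pairs, so unpack both
for _, row in df.iterrows():
    current_price = row['cost']
    # -0.5, 0.0, +0.5 around the current cost; arange excludes the stop,
    # so the stop is nudged just past +0.5
    more_costs = np.arange(current_price - 0.5, current_price + 0.51, step=0.5)

    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)

# each element is one row (a Series), so build the frame from the list of rows
df_incremented = pd.DataFrame(incremented_prices).reset_index(drop=True)

This code is meant to produce a DataFrame like:

    color     cost    temp    original_idx
0   blue      11.5    80.4    0
1   blue      12.0    80.4    0
2   blue      12.5    80.4    0
3    red       7.6    81.2    1
4    red       8.1    81.2    1
5    red       8.6    81.2    1
6   pink      24.0    83.5    2
7   pink      24.5    83.5    2
8   pink      25.0    83.5    2

In the real problem the range will run from -$50.00 to +$50.00, and I find this very slow. Is there a faster, vectorized way?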

Answer 1

You can try recreating the DataFrame with numpy.repeat.

Updated to carry over more columns:

import numpy as np
import pandas as pd

# pd.np was removed from pandas; use numpy directly
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size

pd.DataFrame(dict(
    color = np.repeat(df.color.values, repeats),
    # vectorized: broadcasting adds every step to every cost in one operation
    cost = (df.cost.values[:, None] + cost_steps).ravel(),
    temp = np.repeat(df.temp.values, repeats),
    original_idx = np.repeat(df.index.values, repeats)
    ))
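
The same idea can be written without naming every column by hand. The following is only a sketch under stated assumptions: `make_ladder` is a hypothetical helper name (not part of pandas or of the answer above), the sample data is copied from the question, and the cost column is assumed to be numeric.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})


def make_ladder(frame, steps, cost_col="cost"):
    """Hypothetical helper: repeat every row once per step and offset its cost."""
    steps = np.asarray(steps, dtype=float)
    n = steps.size
    # Repeat each row n times by position, so any index type works.
    out = frame.iloc[np.repeat(np.arange(len(frame)), n)].copy()
    # Keep the original label in a column, then renumber the rows.
    out["original_idx"] = out.index
    out = out.reset_index(drop=True)
    # Broadcasting: (rows, 1) + (n,) -> (rows, n), flattened row by row.
    out[cost_col] = (frame[cost_col].to_numpy()[:, None] + steps).ravel()
    return out


ladder = make_ladder(df, 0.5 * np.arange(-1, 2))
print(ladder)
```

For the real problem, passing `0.5 * np.arange(-100, 101)` as `steps` would expand every row into 201 rows.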

Answer 2

Here's an approach based on NumPy array initialization:

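import numpy as np
import pandas as pd

increments = 0.5*np.arange(-1,2) # Edit the increments here

# Output column names: the original columns plus 'original_idx'
# (assumes df does not already contain an 'original_idx' column)
names = np.append(df.columns, 'original_idx')

M,N = df.shape
vals = df.values

# Position of the 'cost' column
cost_col_idx = (names == 'cost').argmax()

n = len(increments)
shp = (M,n,N+1)
b = np.empty(shp,dtype=object)
b[...,:-1] = vals[:,None]            # copy each row n times
b[...,-1] = np.arange(M)[:,None]     # position of the source row
# broadcasting: (M,1) costs + (n,) increments -> (M,n) ladder
b[...,cost_col_idx] = vals[:,cost_col_idx].astype(float)[:,None] + increments
b.shape = (-1,N+1)                   # flatten to (M*n, N+1)
df_out = pd.DataFrame(b, columns=names)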
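
One thing to be aware of: because `b` is an object-dtype array, the columns of the resulting frame come out as `object` rather than numeric. A minimal sketch of restoring numeric dtypes with `DataFrame.infer_objects()` follows; the small array here is a made-up stand-in for `b`, not the answer's actual data.

```python
import numpy as np
import pandas as pd

# A tiny object-dtype array standing in for `b` (illustrative values only).
b = np.empty((2, 3), dtype=object)
b[:, 0] = ["blue", "red"]
b[:, 1] = [11.5, 7.6]
b[:, 2] = [0, 1]

df_out = pd.DataFrame(b, columns=["color", "cost", "original_idx"])
print(df_out.dtypes)  # the numeric columns are still object dtype here

# infer_objects() asks pandas to pick better dtypes for object columns.
df_typed = df_out.infer_objects()
print(df_typed.dtypes)
```

Doing this once at the end is likely to matter if the ladder is used in further numeric work, since arithmetic on object columns is much slower than on float columns.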

To make the increments run from -50 to +50 in steps of 0.5, use:

increments = 0.5*np.arange(-100,101)
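
As a quick sanity check on that line (a sketch, not part of the original answer): `np.arange` excludes its stop value, which is why the stop is 101 rather than 100, and scaling an integer range by 0.5 gives exact half-dollar steps. `np.linspace` is shown only as an alternative way to state the same grid.

```python
import numpy as np

# Integers scaled by 0.5: 201 exact steps from -50.0 to +50.0.
increments = 0.5 * np.arange(-100, 101)

# The same grid, described by its endpoints and number of points.
alt = np.linspace(-50.0, 50.0, 201)

print(increments.size, increments[0], increments[-1])
print(np.allclose(increments, alt))
```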

Pandas / NumPy: the fastest way to create a ladder?
