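I have a pandas DataFrame like:

  color  cost  temp
0  blue  12.0  80.4
1   red   8.1  81.2
2  pink  24.5  83.5

I want to create a "ladder" or "range" of costs for each row in 50-cent increments, from $0.50 below the current cost to $0.50 above the current cost. My current code is similar to the following:

import numpy
import pandas

incremented_prices = []
df['original_idx'] = df.index  # remember each row's original label
for _, row in df.iterrows():  # iterrows yields (label, row) pairs
    current_price = row['cost']
    # steps at -0.50, 0.00 and +0.50 around the current cost
    more_costs = current_price + numpy.arange(-0.5, 0.51, 0.5)
    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)
# build the frame from the list of row Series
df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)

This code produces a DataFrame with one row per cost step for every original row.

In the real problem, the range would run from -$50.00 to $50.00, and I'm finding this very slow. Is there a faster, vectorized way?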
Answer 0 (score: 2)
You can try using numpy.repeat.
Updated to handle more columns:
import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so NumPy is imported directly
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size
df_incremented = pd.DataFrame(dict(
    color=np.repeat(df.color.values, repeats),
    # broadcasting adds every step to every cost in one vectorized operation
    cost=(df.cost.values[:, None] + cost_steps).ravel(),
    temp=np.repeat(df.temp.values, repeats),
    original_idx=np.repeat(df.index.values, repeats),
))
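As a self-contained sketch of the same repeat-plus-broadcast idea, the snippet below rebuilds the sample frame from the question and expands it. The helper name `cost_ladder` and its `steps` parameter are illustrative assumptions, not part of the original answer. It uses `df.index.repeat` so that the other columns are carried along without listing them by hand, and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

def cost_ladder(df, steps):
    """Hypothetical helper: return one row per (original row, step) pair."""
    steps = np.asarray(steps, dtype=float)
    # Repeat each row len(steps) times; assumes unique index labels.
    out = df.loc[df.index.repeat(steps.size)].copy()
    # Keep the original label so each new row can be traced back.
    out["original_idx"] = out.index
    # Broadcasting: (rows, 1) + (steps,) -> (rows, steps), flattened row by row.
    out["cost"] = (df["cost"].to_numpy()[:, None] + steps).ravel()
    return out.reset_index(drop=True)

# Sample data from the question.
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})
ladder = cost_ladder(df, [-0.5, 0.0, 0.5])
print(ladder)
```

For the real problem, passing `0.5 * np.arange(-100, 101)` as `steps` should give the full -$50.00 to $50.00 ladder.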
Answer 1 (score: 1)
Here's an approach based on NumPy initialization:
import numpy as np
import pandas as pd

increments = 0.5*np.arange(-1,2)  # Edit the increments here
names = np.append(df.columns, 'original_idx')
M,N = df.shape
vals = df.values
cost_col_idx = (names == 'cost').argmax()  # position of the 'cost' column
n = len(increments)
shp = (M,n,N+1)
b = np.empty(shp,dtype=object)  # object dtype holds the mixed column types
b[...,:-1] = vals[:,None]  # copy every original row n times
b[...,-1] = np.arange(M)[:,None]  # positional index of the source row
b[...,cost_col_idx] = vals[:,cost_col_idx].astype(float)[:,None] + increments
b = b.reshape(-1,N+1)  # flatten to (M*n, N+1)
df_out = pd.DataFrame(b, columns=names)
To make the increments go from -50 to +50 in steps of 0.5, use:

increments = 0.5*np.arange(-100,101)
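To give a feel for the size of the full ladder, here is a minimal sketch that applies the same broadcasting step with the wider increments to the three sample costs from the question. The variable names `costs` and `ladder` are illustrative only.

```python
import numpy as np

costs = np.array([12.0, 8.1, 24.5])      # sample costs from the question
increments = 0.5 * np.arange(-100, 101)  # -50.0, -49.5, ..., +50.0

# (3, 1) + (201,) broadcasts to (3, 201):
# one row of candidate costs per original cost.
ladder = costs[:, None] + increments

print(increments.size)  # number of steps per original row
print(ladder.shape)     # (number of original rows, number of steps)
```

Each original row expands to 201 rows, so the output grows linearly with both the number of rows and the number of steps.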