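I am running a for loop to compute some statistics. For each column, I store the computed values in separate variables (D10, D50 and D90).
Then I store them in a list called result and append it to result_array: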
import numpy as np
import pandas as pd

# Start with an empty NumPy array
result_array = np.array([])
for column in df:
    # Just computations, you can ignore them
    df = df.sort_values('Size')
    cumul = df[column].cumsum()
    suma_de_frecuencias = df[column].sum()
    D10 = sum(cumul < 0.1 * float(suma_de_frecuencias))
    D50 = sum(cumul < 0.5 * float(suma_de_frecuencias))
    D90 = sum(cumul < 0.9 * float(suma_de_frecuencias))
    # The statistics I am trying to get
    D10 = df['Size'].iloc[D10]
    D50 = df['Size'].iloc[D50]
    D90 = df['Size'].iloc[D90]
    # Storing the values in a list
    result = [D10, D50, D90]
    # Appending each "result" to "result_array"
    result_array = np.append(result_array, result)
But when I try to create a DataFrame from it, the code fails with an error:
dataset = pd.DataFrame(data=result_array[1:,1:],index=result_array[1:,65],column=result_array[0,1:])
It raises an IndexError traceback at line 40 of my script, which is the dataset line.
It says:
IndexError: too many indices for array
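As far as I can tell, the cause may be that np.append, when called without an axis argument, flattens its inputs, so result_array ends up one-dimensional and cannot take a two-dimensional index such as [1:, 1:]. Here is a minimal sketch that seems to reproduce the behaviour; the numbers are made-up placeholders, not my real data:

```python
import numpy as np

# Hypothetical values standing in for [D10, D50, D90] of two columns.
result_array = np.array([])
for result in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    # Without an axis argument, np.append flattens both inputs first.
    result_array = np.append(result_array, result)

print(result_array.shape)  # (6,)
print(result_array.ndim)   # 1

# Indexing a 1-D array with two indices raises the same kind of error.
try:
    result_array[1:, 1:]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```

So the array is a flat run of six numbers rather than two rows of three, which would explain the traceback.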
What I would like to get is something like this:
"""
  | 'D10' | 'D50' | 'D90'
0 | value | value | value <--- the first computed array "result"
1 | value | value | value <--- the second computed array "result"
2 | value | value | value <--- the third computed array "result"
3 | value | value | value
.
.
.
"""
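To make the target layout concrete, here is a minimal sketch of the table built by hand. It assumes each "result" is kept as its own row in a plain Python list (the rows name and the numbers are hypothetical, not my real output) and that the column labels are passed through the columns parameter of pd.DataFrame:

```python
import pandas as pd

# Hypothetical per-column results; each inner list is one [D10, D50, D90].
rows = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

# One row per computed "result", one column per statistic.
dataset = pd.DataFrame(rows, columns=["D10", "D50", "D90"])
print(dataset)
```

In the real loop this would presumably mean calling rows.append(result) instead of np.append, but I have not confirmed that this is the best approach.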