I am trying to compare predicted values against actual values.
import pandas as pd
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(df[['Op1', 'Op2', 'S2', 'S3', 'S4', 'S7', 'S8', 'S9', 'S11', 'S12','S13', 'S14', 'S15', 'S17', 'S20', 'S21']], df.unit)
predicted = []
actual = []
for i in range(1,len(df.unit.unique())):
    xp = df[(df.unit == i) & (df.cycles == len(df[df.unit == i].cycles))]
    xa = xp.cycles.values
    xp = xp.values[0,2:].reshape(1,-2)
    predicted.append(reg.predict(xp))
    actual.append(xa)
Then I display the DataFrame:
data = {'Actual cycles': actual, 'Predicted cycles': predicted }
df_2 = pd.DataFrame(data)
df_2.head()
This gives me the following output:
Actual cycles Predicted cycles
0 [192] [56.7530579842869]
1 [287] [50.76877712361329]
2 [179] [42.72575900074571]
3 [189] [42.876506912637524]
4 [269] [47.40087182743173]
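Ignoring the fact that the values are far apart, how do I remove the square brackets from the DataFrame? And is there a neater way to write my code? Thank you!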
Answer 0 (score: 1)
print(df_2)
Actualcycles Predictedcycles
0 [192] [56.7530579842869]
1 [287] [50.76877712361329]
2 [179] [42.72575900074571]
3 [189] [42.876506912637524]
4 [269] [47.40087182743173]
# Works when the cells hold strings such as '[192]'; non-string cells become NaN
df = df_2.apply(lambda x: x.str.strip('[]'))
print(df)
Actualcycles Predictedcycles
0 192 56.7530579842869
1 287 50.76877712361329
2 179 42.72575900074571
3 189 42.876506912637524
4 269 47.40087182743173
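Stripping the brackets this way leaves a string in every cell. If numeric columns are wanted, one possible follow-up is to pass each stripped column through pd.to_numeric. This is a minimal sketch, assuming the cells are bracketed strings as in the printout above; the name cleaned and the two sample rows are only illustrative.

```python
import pandas as pd

# Illustrative frame whose cells are strings with surrounding brackets
df_2 = pd.DataFrame({
    'Actualcycles': ['[192]', '[287]'],
    'Predictedcycles': ['[56.7530579842869]', '[50.76877712361329]'],
})

# Strip the brackets column by column, then parse the text as numbers
cleaned = df_2.apply(lambda col: pd.to_numeric(col.str.strip('[]')))

print(cleaned.dtypes)
```

After this step the columns can be used directly in arithmetic, for example cleaned['Actualcycles'] - cleaned['Predictedcycles'].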
Answer 1 (score: 0)
Here is a minimal example of a "cycles" column whose values are wrapped in brackets:
import pandas as pd
df = pd.DataFrame({
'cycles' : [[192], [287], [179], [189], [269]]
})
This code gives you the column without the brackets:
df['cycles'] = df['cycles'].str[0]
The output looks like this:
print(df)
cycles
0 192
1 287
2 179
3 189
4 269
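In the question's code, the brackets most likely appear because each element appended to actual and predicted is a one-element NumPy array rather than a plain number. One possible way to avoid them from the start is to flatten both lists before building the DataFrame. This is a minimal sketch, assuming every list element is a 1-D array; the sample values are copied from the question's output and stand in for the real loop results.

```python
import numpy as np
import pandas as pd

# Stand-ins for the lists built in the question's loop (assumed shapes)
actual = [np.array([192]), np.array([287]), np.array([179])]
predicted = [
    np.array([56.7530579842869]),
    np.array([50.76877712361329]),
    np.array([42.72575900074571]),
]

# np.concatenate joins the one-element arrays into a single flat array,
# so each DataFrame cell holds a scalar instead of an array
df_2 = pd.DataFrame({
    'Actual cycles': np.concatenate(actual),
    'Predicted cycles': np.concatenate(predicted),
})

print(df_2.head())
```

Because the columns are numeric from the beginning, no string stripping or element extraction is needed afterwards.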