Question

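I'm trying to write some very simple code. Essentially, I read in two columns of time-series data (corresponding to slightly different time periods) and loop through different % weightings of consecutive time bins of data in the second column. However, when I run this loop, for some reason the original dataframe (specifically, df['EST']) is somehow being changed by this line: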

X_new[j-1]=Wt*X_temp[j-1]+(1-Wt)*X_temp[j]

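I narrowed it down to this line of code because, when I remove it, the initial dataframe is no longer changed. I don't understand how this line can alter the original dataframe.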

My full code:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

import csv
import os
os.chdir("C://Users/XXX")
raw_data = open('correlation test.csv')
%matplotlib inline
import matplotlib.pyplot as plt
df=pd.read_csv(raw_data)

Y=df['%<30'][1:].reshape(-1,1)
X_new=df['EST'][1:].reshape(-1,1)

X_temp=df['EST'][1:].reshape(-1,1)

Wt=0
Best_Wt=Wt
Best_Score=1
for i in range(1,100):
    for j in range(1,df.shape[0]-1):
        X_new[j-1]=Wt*X_temp[j-1]+(1-Wt)*X_temp[j]
        asdf=0
    RR=LinearRegression()
    RR.fit(X_new,Y)
    New_Score=np.mean(np.abs((RR.predict(X_new)-Y)))
    if New_Score<Best_Score:
        Best_Score=New_Score
        Best_Wt=Wt
        print('New Best Score:',Best_Score)
        print('New Best Weight:',Best_Wt)
    Wt=Wt+0.01

The file it reads from contains two columns of percentages, the first labeled '%<30' and the second labeled 'EST'. Thanks in advance for your help!
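
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if `reshape` hands back a NumPy view of the column's underlying data (an assumption about the pandas version in use), then X_new, X_temp and df['EST'] all share one buffer, and assigning into X_new[j-1] writes straight into the dataframe. The example below reproduces that aliasing with plain NumPy only. The array `est` and its values are made up and merely stand in for the 'EST' column.

```python
import numpy as np

# Hypothetical stand-in for the values of the 'EST' column (made-up numbers).
est = np.array([0.10, 0.20, 0.30, 0.40, 0.50])

# Slicing and reshaping a contiguous array return views, not copies.
x_new = est[1:].reshape(-1, 1)
x_temp = est[1:].reshape(-1, 1)
print(np.shares_memory(x_new, est))     # True
print(np.shares_memory(x_new, x_temp))  # True

# Writing through one view changes the base array and the other view too.
x_new[0] = 99.0
print(est[1], x_temp[0, 0])  # 99.0 99.0

# An explicit copy owns its own memory, so the source stays untouched.
est_safe = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
x_new_safe = est_safe[1:].reshape(-1, 1).copy()
x_new_safe[0] = -1.0
print(np.shares_memory(x_new_safe, est_safe))  # False
print(est_safe[1])                             # 0.2
```

If this is indeed the cause, building X_new and X_temp from explicit copies, for example `df['EST'].to_numpy(copy=True)[1:].reshape(-1,1)`, should keep the loop from touching df. Newer pandas releases no longer provide `Series.reshape`, and whether a column's array is a view or a copy can differ between versions, so checking with `np.shares_memory` is a reasonable first step.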

Stumped by how my code is modifying something it shouldn't modify

0 answers: