Best-fit line / simple linear regression for a Python dataset

Posted: 2020-02-21 17:41:08

Tags: python pandas numpy linear-regression finance

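I have 1000+ .txt files, each with two columns: one is the date and the other is the price (of a stock). Each file is named after its stock ticker. I want to fit a best-fit line to the data so I can tell whether the price trend is positive, negative, or flat. I think I can do this by finding the slope of the best-fit line. Does anyone know how I could code this?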
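The slope idea can be sketched with `numpy.polyfit`: a degree-1 fit returns the least-squares slope and intercept, and the sign of the slope gives the trend. This is a minimal sketch assuming the x values are already numeric (for example, days since the first date). The helper name `classify_trend` and the `flat_tol` threshold are hypothetical; a fitted slope is almost never exactly zero, so some tolerance has to decide what counts as "flat".

```python
import numpy as np

def classify_trend(x, y, flat_tol=1e-3):
    """Return (label, slope) where label is 'positive', 'negative' or 'flat'.

    flat_tol is a hypothetical threshold: slopes whose absolute value is
    below it are treated as flat.
    """
    # np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
    slope, _intercept = np.polyfit(x, y, 1)
    if abs(slope) < flat_tol:
        return "flat", slope
    return ("positive" if slope > 0 else "negative"), slope

# Synthetic example: prices rising by 0.5 per day
days = np.arange(10)
prices = 100 + 0.5 * days
label, slope = classify_trend(days, prices)
print(label, round(slope, 3))
```

Because the slope is measured in price units per day, a single `flat_tol` treats a $5 stock and a $500 stock very differently. Dividing each price series by its mean before fitting is one way to make the slopes comparable across tickers.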

So far I have:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import seaborn as sns
from statistics import mean
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
%matplotlib inline

filepath = r'InsertFilePath'
filelist = os.listdir(filepath)

dic1 = {}
# Load each file into a dictionary: the filename (ticker) is the key, the DataFrame is the value
for file in filelist:
    # os.path.join inserts the path separator that plain string concatenation would miss
    df = pd.read_csv(os.path.join(filepath, file), sep='\t')
    dic1[file] = df

# Rename the two columns to Dates and Prices
for key, value in dic1.items():
    value.rename(columns={value.columns[0]: 'Dates', value.columns[1]: 'Prices'}, inplace=True)

x = df['Dates']
y = df['Prices']
LinearRegression().fit(x, y)

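But the last three lines raise an error because the dates come in as strings. I tried int(df['Dates']), but that didn't work either. I'm new to Python, so please bear with me.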
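`int(df['Dates'])` fails because `int()` converts a single value, not a whole pandas Series, and a date string is not an integer in any case. A common approach is to parse the strings with `pd.to_datetime`, turn them into elapsed days, and reshape them into the 2-D array scikit-learn's `LinearRegression.fit` expects (shape `(n_samples, n_features)`). Note also that `df` after the loading loop is only the last file read, so each DataFrame in `dic1` needs its own fit. The sketch below assumes the 'Dates' strings are parseable by `pd.to_datetime` and 'Prices' is numeric; the helper name `fit_slope` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_slope(df):
    """Fit Prices against time and return the slope in price units per day.

    Assumes a 'Dates' column of date strings that pd.to_datetime can parse
    and a numeric 'Prices' column, as produced by the renaming loop.
    """
    dates = pd.to_datetime(df["Dates"])
    # Elapsed days since the earliest date, so X is numeric
    days = (dates - dates.min()).dt.days.to_numpy()
    # scikit-learn expects X with shape (n_samples, n_features)
    X = days.reshape(-1, 1)
    y = df["Prices"].to_numpy(dtype=float)
    model = LinearRegression().fit(X, y)
    # coef_ holds one coefficient per feature; there is only one here
    return model.coef_[0]

# Small in-memory frame standing in for one of the .txt files
sample = pd.DataFrame({
    "Dates": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"],
    "Prices": [10.0, 10.5, 11.0, 12.5],
})
print(fit_slope(sample))
```

Applied to every file, this could look like `slopes = {ticker: fit_slope(frame) for ticker, frame in dic1.items()}`, after which the sign of each slope (with some tolerance for "flat") gives the trend per ticker. Using elapsed days rather than row numbers keeps weekend and holiday gaps in the time axis.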
