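I have over 1000 .txt files, each with two columns: one is the date and the other is the price (for a stock). Each file is named after its stock ticker. I want to find a line of best fit for the data to determine whether the trend is positive, negative, or flat. I think I can do this by finding the slope of the best-fit line. Does anyone know how I would code this?
So far, I have: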
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os as os
import seaborn as sns
from statistics import mean
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
%matplotlib inline
filelist = os.listdir(r'InsertFilePath')
filepath = r'InsertFilePath'
dic1 = {}
#Uploads files to a Dictionary with filename(ticker) being the keys and the Data being the values
for file in filelist:
    df = pd.read_csv(filepath + file,sep='\t')
    dic1[file]= df
#renames Columns to Dates and Prices
for key,value in dic1.items():
    value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)
x = df['Dates']
y = df['Prices']
LinearRegression().fit(x,y)
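But the last three lines raise an error because the dates come in as strings. I tried int(df['Dates']), but that did not work either. I am new to Python, so please bear with me.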
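
For reference, here is a minimal sketch of the slope idea described above, not a definitive solution. It assumes the 'Dates' strings are in a format that pd.to_datetime can parse (the actual format in the files is not shown), converts them to days elapsed since the first row, and fits a straight line with np.polyfit. np.polyfit is used here instead of LinearRegression, whose fit method expects a 2-D numeric X. The names price_slope and classify_trend and the tolerance tol are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def price_slope(df):
    """Least-squares slope of Prices against Dates, in price units per day."""
    # Assumption: the date strings are parseable by pd.to_datetime
    dates = pd.to_datetime(df['Dates'])
    # Turn dates into plain numbers: days elapsed since the earliest date
    days = (dates - dates.min()).dt.total_seconds() / 86400.0
    prices = df['Prices'].to_numpy(dtype=float)
    # A degree-1 polynomial fit returns [slope, intercept]
    slope, intercept = np.polyfit(days.to_numpy(), prices, 1)
    return slope

def classify_trend(slope, tol=1e-3):
    """Label a slope; tol is an arbitrary cutoff for 'flat' (an assumption)."""
    if slope > tol:
        return 'positive'
    if slope < -tol:
        return 'negative'
    return 'flat'

# Small synthetic example (made-up numbers, not real stock data)
example = pd.DataFrame({
    'Dates': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'],
    'Prices': [10.0, 12.0, 14.0, 16.0],
})
print(classify_trend(price_slope(example)))
```

With the dictionary built above, the same function could be applied per ticker, for example {ticker: price_slope(frame) for ticker, frame in dic1.items()}. What counts as "flat" depends on the price scale, so the tolerance would probably need tuning, or the slope could be divided by the mean price to make it comparable across stocks.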