I have a pandas DataFrame that contains a date and some values, like the ones below.
Raw data:
list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
('2018-10-30', 2.19), ('2018-10-30', 1.88)]
After loading it into pandas:
df = pd.DataFrame(list)
0 1
0 2018-10-29 6.192500
1 2018-10-29 6.195000
2 2018-10-29 1.958333
3 2018-10-29 1.785000
4 2018-10-29 3.050000
5 2018-10-29 1.306667
6 2018-10-29 1.632500
7 2018-10-30 1.765000
8 2018-10-30 1.265000
9 2018-10-30 2.112500
10 2018-10-30 2.167143
11 2018-10-30 1.485000
12 2018-10-30 1.720000
13 2018-10-30 2.754000
14 2018-10-30 1.796667
15 2018-10-30 1.278333
16 2018-10-30 3.480000
17 2018-10-30 6.190000
18 2018-10-30 6.235000
19 2018-10-30 6.118571
20 2018-10-30 6.088000
21 2018-10-30 4.300000
22 2018-10-30 7.806667
23 2018-10-30 7.783333
24 2018-10-30 10.976667
25 2018-10-30 2.190000
26 2018-10-30 1.880000
This is how I load the DataFrame:
df = pd.DataFrame(list)
df[0] = pd.to_datetime(df[0], errors='coerce')
df.set_index(0, inplace=True)
Now I want to find the slope. After some research on the internet, I found that this is what I need to do to get the slope:
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
results = sm.OLS(np.asarray(sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear', axis=0).fillna(0).values), sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
slope = results.params[1]
print(slope)
But I get the following error:
Traceback (most recent call last):
File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
File "/home/souvik/django_test/webdev/lib/python3.5/site-packages/statsmodels/tsa/seasonal.py", line 127, in seasonal_decompose
raise ValueError("You must specify a freq or x must be a "
ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
Now, if I add a freq parameter to the seasonal_decompose method, for example
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
then I get an error like
Traceback (most recent call last):
File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
AttributeError: 'numpy.ndarray' object has no attribute 'interpolate'
However, if I drop the extra processing steps such as interpolate and do the following:
trend_coord = sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1, model='additive').trend
results = sm.OLS(np.asarray(trend_coord),
sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
slope = results.params[1]
print(">>>>>>>>>>>>>>>> slope", slope)
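then I get a slope value of 0.13668559218559242.
But I am not sure whether this is the right way to find the slope, or even whether the value is correct.
Is there a better way to find the slope?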
Answer 0 (score: 0)
You can use datetime.toordinal to map each date to an integer, and sklearn.linear_model to fit a linear regression model to the data to obtain the slope.
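The idea above can be sketched as follows. This is only a minimal sketch under stated assumptions, not necessarily the code this answer had in mind: it assumes scikit-learn and NumPy are installed, picks LinearRegression from sklearn.linear_model as the model, and stores the question's data under the illustrative name `data` so that the built-in `list` is not shadowed.

```python
from datetime import date

import numpy as np
from sklearn.linear_model import LinearRegression

# The question's (date, value) pairs; `data` is an illustrative name.
data = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
        ('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
        ('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
        ('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
        ('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
        ('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
        ('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
        ('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
        ('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
        ('2018-10-30', 2.19), ('2018-10-30', 1.88)]

# Map each ISO date string to an integer day number.
x = np.array([date.fromisoformat(d).toordinal() for d, _ in data])
y = np.array([v for _, v in data])

# scikit-learn expects a 2-D feature matrix, hence the reshape.
model = LinearRegression().fit(x.reshape(-1, 1), y)
slope = model.coef_[0]
intercept = model.intercept_
print("slope per day:", slope)
```

Because the x axis is measured in days, the fitted slope is the change in value per calendar day.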
Answer 1 (score: 0)
I'll build on Franco's answer, but you don't need sklearn. You can do this easily with scipy.
import datetime as dt
import pandas as pd
from scipy import stats

# `list` is the list of (date, value) tuples from the question
df = pd.DataFrame(list, columns=['date', 'value'])
df['date'] = pd.to_datetime(df['date'])
# Map each timestamp to its ordinal, i.e. an integer day count
df['date_ordinal'] = df['date'].map(dt.datetime.toordinal)
slope, intercept, r_value, p_value, std_err = stats.linregress(df['date_ordinal'], df['value'])
slope
Out[114]: 0.80959404761905
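One way to sanity-check this number: the data contains only two distinct dates, so the least-squares line passes through the two daily means, and the slope per day is simply the difference between them. Below is a minimal standard-library sketch of that check; `data` is an illustrative name for the question's list, and the closed-form slope is written out by hand rather than taken from scipy.

```python
from datetime import date
from statistics import mean

# The question's (date, value) pairs; `data` is an illustrative name.
data = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
        ('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
        ('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
        ('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
        ('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
        ('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
        ('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
        ('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
        ('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
        ('2018-10-30', 2.19), ('2018-10-30', 1.88)]

xs = [date.fromisoformat(d).toordinal() for d, _ in data]
ys = [v for _, v in data]

# Closed-form ordinary least squares slope: cov(x, y) / var(x).
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))

# Difference between the two daily means.
mean_29 = mean(v for d, v in data if d == '2018-10-29')
mean_30 = mean(v for d, v in data if d == '2018-10-30')

print(slope, mean_30 - mean_29)
```

Both printed numbers should agree with the linregress result up to floating-point rounding, which also means a two-date sample says little about a longer-term trend.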
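Answer 2 (score: 0)
For a simple case like this, to get the slope and intercept of the linear regression line (y = intercept + slope * x), you can use numpy's polyfit() function. My explanations are inline with the code below.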
# You should only need numpy and pandas
import numpy as np
import pandas as pd
# Now your list
list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
('2018-10-30', 2.19), ('2018-10-30', 1.88)]
# Build a DataFrame with 'date' and 'value' columns
ts = pd.DataFrame(list, columns=['date', 'value'])
# Print the first rows to check it
print(ts.head(10))
# Now separate it into x and y series
x = ts['date']
y = ts['value'].astype(float)
# Create a sequence of integers from 0 to x.size - 1 to use in the np.polyfit() call
x_seq = np.arange(x.size) # should give you [ 0 1 2 3 4 ... 26]
# call numpy polyfit() method with x_seq, y
fit = np.polyfit(x_seq, y, 1)
fit_fn = np.poly1d(fit)
print('Slope = ', fit[0], ", ","Intercept = ", fit[1])
print(fit_fn)
Slope =  0.1366855921855925 ,  Intercept =  1.9827865961199274
0.1367 x + 1.983
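Note that this slope is measured per row, because x_seq is the row index rather than the date. That is why it matches the value from the question but differs from the per-day slope in Answer 1. The sketch below illustrates the difference with np.polyfit. It assumes NumPy only; `data`, `days`, `slope_per_row` and `slope_per_day` are illustrative names, and the day count is taken relative to the first date to keep the x values small.

```python
from datetime import date

import numpy as np

# The question's (date, value) pairs; `data` is an illustrative name.
data = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333),
        ('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667),
        ('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265),
        ('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485),
        ('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667),
        ('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19),
        ('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088),
        ('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667),
        ('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667),
        ('2018-10-30', 2.19), ('2018-10-30', 1.88)]

y = np.array([v for _, v in data])

# x axis 1: row index 0..n-1, so the slope is the change per observation.
slope_per_row = np.polyfit(np.arange(y.size), y, 1)[0]

# x axis 2: days elapsed since the first date, so the slope is per calendar day.
ordinals = np.array([date.fromisoformat(d).toordinal() for d, _ in data])
days = ordinals - ordinals.min()
slope_per_day = np.polyfit(days, y, 1)[0]

print(slope_per_row, slope_per_day)
```

Which x axis is appropriate depends on the question being asked: the row index treats every observation as one step, while the day count treats all observations on the same date as simultaneous.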