我被困在以下几行
import quandl,math
import pandas as pd
import numpy as np
from sklearn import preprocessing ,cross_validation , svm
from sklearn.linear_model import LinearRegression
df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]
df['HL_PCT'] = (df["Adj. High"] - df['Adj. Close'])/df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df["Adj. Close"] - df['Adj. Open'])/df['Adj. Open'] * 100
df = df[['Adj. Close','HL_PCT','PCT_CHANGE','Adj. Open']]
forecast_col = 'Adj. Close'
df.fillna(-99999,inplace = True)
forecast_out = int(math.ceil(.1*len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
print df.head()
我无法理解df [forecast_col] .shift(-forecast_out)
的含义请解释一下该命令及其作用??
答案 0 :(得分:15)
pandas.Dataframe的移位函数使用可选的时间频率将索引移动所需的句点数。有关班次功能的更多信息,请参阅此link。
以下是要移位的列值的小例子:
function CloseModal(){
$("#myModal").modal('Hide');
}
下面是移位前的列值
import pandas as pd
import numpy as np
df = pd.DataFrame({"date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
"2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
"variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
"no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
"value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
0.119209, -1.044236, -0.861849, None]})
输出
df['value']
使用移位函数值会根据给定的时间段移动
例如使用带正整数的shift将行值向下移动:
0 0.469112
1 -0.282863
2 -1.509059
3 -1.135632
4 1.212112
5 -0.173215
6 0.119209
7 -1.044236
8 -0.861849
9 NaN
输出
df['value'].shift(1)
使用带负整数的shift将行值向上移动:
0 NaN
1 0.469112
2 -0.282863
3 -1.509059
4 -1.135632
5 1.212112
6 -0.173215
7 0.119209
8 -1.044236
9 -0.861849
Name: value, dtype: float64
输出
df['value'].shift(-1)
答案 1 :(得分:0)
此处的代码希望输入将来的值,对'Adj。做一个预测。收盘价 通过将每一行的数据帧长度值的下10%放入df ['label']。
forecast_out = int(math.ceil(.1*len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)
如果打印df.tail(),您将获得NaN值。