What does shift mean on a DataFrame?

Time: 2017-06-21 12:06:10

Tags: python pandas quandl

I am stuck on the following lines:

import math

import quandl
import pandas as pd
import numpy as np
# cross_validation was renamed to model_selection in scikit-learn 0.18 and removed in 0.20
from sklearn import preprocessing, model_selection, svm
from sklearn.linear_model import LinearRegression

df = quandl.get('WIKI/GOOGL')

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

# Percentage features derived from the adjusted prices
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100

df = df[['Adj. Close', 'HL_PCT', 'PCT_CHANGE', 'Adj. Open']]

forecast_col = 'Adj. Close'

# Replace missing values with an outlier sentinel
df.fillna(-99999, inplace=True)

# Number of rows to look ahead: 10% of the data set, rounded up
forecast_out = int(math.ceil(.1 * len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)
print(df.head())

I cannot understand what df[forecast_col].shift(-forecast_out) means.

Could someone explain this command and what it does?

2 answers:

Answer 0: (score: 15)

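The shift function of pandas.DataFrame shifts the data by the desired number of periods. If the optional time frequency (freq) is given, the index is shifted instead and the data stays aligned with it. For more information on the shift function, see this link.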

Here is a small example of column values being shifted:

import pandas as pd 
import numpy as np
df = pd.DataFrame({"date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
                        "2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
               "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
               "no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
               "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                         0.119209, -1.044236, -0.861849, None]})

Below are the column values before shifting:

df['value']

Output

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
6    0.119209
7   -1.044236
8   -0.861849
9         NaN
Name: value, dtype: float64

With shift, the values move by the number of periods given.

For example, shift with a positive integer moves the row values down:

df['value'].shift(1)

Output

0         NaN
1    0.469112
2   -0.282863
3   -1.509059
4   -1.135632
5    1.212112
6   -0.173215
7    0.119209
8   -1.044236
9   -0.861849
Name: value, dtype: float64

shift with a negative integer moves the row values up:

df['value'].shift(-1)

Output

0   -0.282863
1   -1.509059
2   -1.135632
3    1.212112
4   -0.173215
5    0.119209
6   -1.044236
7   -0.861849
8         NaN
9         NaN
Name: value, dtype: float64
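
The optional time frequency mentioned at the start of this answer can be illustrated with a short sketch. The three-day series below is made up for illustration and is not part of the question's data. It contrasts a plain shift, which moves the values and leaves the index alone, with a shift that passes freq, which moves the index and leaves the values alone.

```python
import pandas as pd

# Hypothetical three-day series, made up for illustration.
s = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2000-01-03", periods=3, freq="D"),
)

# Without freq: values move down one row, the index is unchanged,
# and the first row becomes NaN.
moved_values = s.shift(1)

# With freq: the index moves forward one day, the values are unchanged,
# and no NaN is introduced.
moved_index = s.shift(1, freq="D")

print(moved_values)
print(moved_index)
```

The question's code uses the first form (no freq), so the labels it produces are row-based: the offset counts rows, not calendar days.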

Answer 1: (score: 0)

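The code here wants future values to serve as the prediction target for 'Adj. Close'. For each row, it puts into df['label'] the 'Adj. Close' value found 10% of the DataFrame's length further down.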

forecast_out = int(math.ceil(.1*len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

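If you print df.tail(), you will see NaN values in df['label']: the last forecast_out rows have no future value to shift in.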
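
As a rough sketch of this behaviour, the snippet below repeats the two lines on a made-up 20-row frame standing in for the Quandl data; the prices are invented for illustration. It counts the NaN labels at the tail and shows one common way to handle them, dropping those rows before training, which is an assumption about the next step and not something the question's code does.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Adj. Close' column: 20 made-up prices.
df = pd.DataFrame({"Adj. Close": np.arange(100.0, 120.0)})
forecast_col = "Adj. Close"

# 10% of 20 rows, rounded up, gives a look-ahead of 2 rows.
forecast_out = int(math.ceil(0.1 * len(df)))

# Each row's label is the 'Adj. Close' value forecast_out rows later.
df["label"] = df[forecast_col].shift(-forecast_out)

# The last forecast_out rows have no future value, so their label is NaN.
nan_rows = int(df["label"].isna().sum())

# Assumed follow-up step: keep only rows that have a label.
train = df.dropna(subset=["label"])

print(df.tail())
print(nan_rows, len(train))
```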