Sklearn error: found array with dim 4, estimator expected <= 2

Time: 2016-05-21 09:26:27

Tags: python-3.x pandas dataframe scikit-learn

I have been trying to import data from Yahoo Finance via pandas, convert it to an array with .as_matrix(), and feed it to a classifier for training. When I call fit, it raises this error:

ValueError: Found array with dim 4. Estimator expected <= 2.

Here is my code:

from sklearn import tree
import pandas as pd
import pandas_datareader.data as web

df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')

close_price = df[['Close']]

ma_50 = (pd.rolling_mean(close_price, window=50))
ma_100 = (pd.rolling_mean(close_price, window=100))
ma_200 = (pd.rolling_mean(close_price, window=200))

#adding buys and sell based on the values
df['B/S']= (df['Close'].diff() < 0).astype(int)
close_buy = df[['Close']+['B/S']]
closing = df[['Close']].as_matrix()
buy_sell = df[['B/S']]


close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')

close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix

print(ma_100)
clf = tree.DecisionTreeClassifier()
x = [[close_buy,ma_50,ma_100,ma_200]]
y = [buy_sell]

clf.fit(x,y)

1 answer:

Answer 0: (score: 1)

I found a couple of bugs/things that needed fixing.

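  1. Missing parentheses: buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix. Without the trailing (), buy_sell is bound to the method itself rather than to the array it returns.
  2. [[close_buy,ma_50,ma_100,ma_200]] is what gives you the 4 dimensions. Instead, I would use np.concatenate, which takes a list of arrays and appends them to each other either lengthwise or widthwise. The parameter axis=1 specifies widthwise. This makes x an 822 x 28 matrix, that is, 822 observations of 28 features. If that is not what you were going for, then obviously I have missed the mark. But these dimensions do line up with your y.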

    Instead:

    from sklearn import tree
    import numpy as np  # needed for np.concatenate below
    import pandas as pd
    import pandas_datareader.data as web
    
    df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')
    
    close_price = df[['Close']]
    
    ma_50 = (pd.rolling_mean(close_price, window=50))
    ma_100 = (pd.rolling_mean(close_price, window=100))
    ma_200 = (pd.rolling_mean(close_price, window=200))
    
    #adding buys and sell based on the values
    df['B/S']= (df['Close'].diff() < 0).astype(int)
    close_buy = df[['Close']+['B/S']]
    closing = df[['Close']].as_matrix()
    buy_sell = df[['B/S']]
    
    
    close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
    ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
    ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
    ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')
    
    close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix()  # Fixed
    
    print(ma_100)
    clf = tree.DecisionTreeClassifier()
    x = np.concatenate([close_buy,ma_50,ma_100,ma_200], axis=1)  # Fixed
    y = buy_sell  # Brackets not necessary... I don't think
    
    clf.fit(x,y)
    

    This works for me:

    DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
                min_samples_split=2, min_weight_fraction_leaf=0.0,
                random_state=None, splitter='best')
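
To see where the four dimensions in point 2 come from, here is a minimal sketch using small synthetic arrays in place of the Yahoo Finance data. The array sizes and fill values are made up for illustration; only the shapes matter.

```python
import numpy as np

# Hypothetical stand-ins for the four arrays in the question:
# 5 observations with 3 columns each (the real data is larger).
n_rows, n_cols = 5, 3
close_buy = np.full((n_rows, n_cols), 1.0)
ma_50 = np.full((n_rows, n_cols), 2.0)
ma_100 = np.full((n_rows, n_cols), 3.0)
ma_200 = np.full((n_rows, n_cols), 4.0)

# Wrapping the 2-D arrays in a double list adds two leading axes,
# which is the shape the estimator rejects.
nested = np.asarray([[close_buy, ma_50, ma_100, ma_200]])
print(nested.ndim, nested.shape)    # 4 (1, 4, 5, 3)

# Joining along axis=1 keeps one row per observation and
# places the columns of each array side by side.
stacked = np.concatenate([close_buy, ma_50, ma_100, ma_200], axis=1)
print(stacked.ndim, stacked.shape)  # 2 (5, 12)
```

The same reasoning gives 822 x 28 for the real data: four arrays of 822 rows and 7 columns each, joined widthwise.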