ValueError: setting an array element with a sequence (LogisticRegression with Array based feature)

Date: 2016-12-15 21:09:03

Tags: numpy scikit-learn logistic-regression

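Thanks in advance for any guidance. I'm trying to do classification with logistic regression using scikit-learn, where X is an intercept plus one field of heart-rate data called heartrate. Based on what I found researching others who hit this error, I made sure the heartrate arrays all have the same shape/size.

The ValueError is raised at line 382 of sklearn/utils/validation.py, in check_array, on the line that copies the data frame via array = np.array(array, dtype=dtype, order=order, copy=copy). I suspect my arrays are not contiguous in memory and that this is what causes the problem, but I'm not sure...

Here are some code snippets that may help with this: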
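For what it's worth, the error can likely be reproduced without any Strava data, which suggests memory contiguity is not the issue. The sketch below uses made-up toy records shaped like the ones get_training_set() builds (the values and the five-sample length are assumptions, not real data): each heartrate cell holds a whole array, so the heartrate column has object dtype and X ends up with only two columns. Converting that to a float array, roughly what check_array attempts in the traceback, raises the same ValueError.

```python
import numpy as np
import pandas as pd

# Made-up toy records shaped like the ones get_training_set() builds:
# every 'heartrate' value is a whole (padded) array, not a single number.
records = [
    {"activity_type": 0, "heartrate": np.array([74, 74, 77, 0, 0])},
    {"activity_type": 11, "heartrate": np.array([66, 67, 69, 0, 0])},
]
df = pd.DataFrame(records)
df["Intercept"] = np.ones((len(df),))
X = df[["Intercept", "heartrate"]]

print(X.shape)                # (2, 2): two columns, not 1 + 5 features
print(df["heartrate"].dtype)  # object: each cell holds an ndarray

# Asking NumPy for a float64 array (roughly what check_array does in the
# traceback) fails because one float slot cannot hold a whole array.
raised = False
try:
    np.asarray(X.values, dtype=np.float64)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```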

    def get_training_set(self):
        training_set = []
        after_date = datetime.utcnow() - timedelta(weeks=8)
        before_date = datetime.utcnow() - timedelta(hours=12)
        activities = self.strava_client.get_activities(after=after_date, before=before_date)
        for act in activities:
            if act.has_heartrate:
                streams = self.strava_client.get_activity_streams(activity_id=act.id, types=['heartrate'])
                heartrate = np.array(list(filter(lambda x: x is not None, streams['heartrate'].data)))
                fixed_heartrate = np.pad(heartrate, (0, 15000 - len(heartrate)), 'constant')
                item = {'activity_type': self.classes.index(act.type),'heartrate': fixed_heartrate}
                training_set.append(item)
        return pd.DataFrame(training_set)

    def train(self):
        df = self.get_training_set()
        df['Intercept'] = np.ones((len(df),))
        y = df[['activity_type']]
        X = df[['Intercept', 'heartrate']]
        y = np.ravel(y)
        #
        model = LogisticRegression()
        self.debug('y={}'.format(y))
        model = model.fit(X,y)
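One more thing that may bite later: np.pad raises a ValueError when a pad width is negative, so the padding line in get_training_set() would fail for any activity with more than 15000 heart-rate samples. A minimal sketch of a pad-or-truncate step, assuming that cutting overlong streams is acceptable here (to_fixed_length is a hypothetical helper name, not part of NumPy):

```python
import numpy as np


def to_fixed_length(samples, length, fill_value=0):
    """Hypothetical helper: pad with fill_value or truncate to exactly `length` items."""
    arr = np.asarray(samples, dtype=np.float64)
    if len(arr) >= length:
        # Too long (or exact): keep only the first `length` samples.
        return arr[:length]
    # Too short: append fill_value at the end, as the original np.pad call does.
    return np.pad(arr, (0, length - len(arr)), mode="constant", constant_values=fill_value)


short = to_fixed_length([74, 74, 77], 5)
long_ = to_fixed_length(range(8), 5)
print(short.tolist())  # [74.0, 74.0, 77.0, 0.0, 0.0]
print(long_.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

In get_training_set() this would stand in for the np.pad call, e.g. to_fixed_length(heartrate, 15000).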

The exception happens in fit...

Thanks in advance for any guidance.

Regards,

Mike

Copied from the comments to improve formatting:

    /python3.5/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit
        order="C")
    File "/python3.5/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
        ensure_min_features, warn_on_dtype, estimator)
    File "/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
        array = np.array(array, dtype=dtype, order=order, copy=copy)
    ValueError: setting an array element with a sequence

And another comment:

X and y look like this:

    X.shape=(29, 2)
    y.shape=(29,)
    X=[[1 array([74, 74, 77, ..., 0, 0, 0])]
       [1 array([66, 67, 69, ..., 0, 0, 0])]
       ...
       [1 array([92, 92, 91, ..., 0, 0, 0])]
       [1 array([79, 79, 79, ..., 0, 0, 0])]]
    y=[ 0 11 11 0 1 0 11 0 11 1 0 11 0 0 11 0 0 0 0 0 11 0 11 0 0 0 11 0 0]
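That printout shows the problem directly: X.shape is (29, 2), and the second column holds whole arrays. For logistic regression each heart-rate sample has to be its own feature column, so the padded arrays need to be stacked into one 2-D numeric matrix, which with the real data would be (29, 15000). A small sketch with made-up stand-in values (three activities, six samples each; none of this is the real data):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's DataFrame: three activities,
# each with a padded heart-rate array of six samples.
rows = [[74, 74, 77, 0, 0, 0], [66, 67, 69, 0, 0, 0], [92, 92, 91, 0, 0, 0]]
labels = [0, 11, 1]
df = pd.DataFrame(
    [{"activity_type": t, "heartrate": np.array(r)} for t, r in zip(labels, rows)]
)

# Stack the per-row arrays: one row per activity, one column per sample.
X = np.vstack(df["heartrate"].tolist()).astype(np.float64)
y = df["activity_type"].to_numpy()

print(X.shape, X.dtype)  # (3, 6) float64
print(y.shape)           # (3,)
```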

1 Answer:

Answer 0 (score: 0)

Do things work better if you change train() so that it:

(a) generates sequences of the correct length,
(b) returns numpy arrays of the values rather than another data frame, and
(c) does the fit in place?
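A rough sketch of a train() along those lines, written as a stand-alone function so it runs without the Strava client. It assumes the heart-rate arrays are already fixed-length, takes plain lists instead of a DataFrame, and uses randomly generated values in place of real streams; the function signature and the data are assumptions, not the answerer's lost code. It also drops the manual Intercept column, since LogisticRegression fits an intercept by default (fit_intercept=True).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def train(heartrate_arrays, activity_types):
    """Hypothetical stand-alone train(): equal-length arrays plus class labels."""
    X = np.vstack(heartrate_arrays).astype(np.float64)  # (n_activities, n_samples)
    y = np.asarray(activity_types)                       # (n_activities,)
    # fit_intercept=True is the default, so no 'Intercept' column is added.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)  # trains the estimator in place (and also returns it)
    return model


# Made-up, randomly generated data standing in for padded heart-rate streams.
rng = np.random.default_rng(0)
arrays = [rng.integers(60, 180, size=50) for _ in range(29)]
labels = [0, 11, 1] * 9 + [0, 11]
model = train(arrays, labels)
print(model.coef_.shape)  # one row of weights per class, one column per sample
```

Because fit() returns the fitted estimator itself, the question's model = model.fit(X, y) is harmless but not required.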