Question

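Thanks in advance for any guidance. I'm trying to do classification with scikit-learn's LogisticRegression, where X consists of an intercept and one field, called heartrate, that holds an array of heart rate data. After reading about others who hit this error, I made sure that the heartrate arrays all have the same shape/size.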

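The ValueError is raised at line 382 of sklearn/utils/validation.py, in check_array, on the line that copies the data frame with array = np.array(array, dtype=dtype, order=order, copy=copy). I suspect that my array is not contiguous in memory and that this is what causes the problem, but I'm not sure...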

Here are some code snippets that may help in figuring this out:

    def get_training_set(self):
        training_set = []
        after_date = datetime.utcnow() - timedelta(weeks=8)
        before_date = datetime.utcnow() - timedelta(hours=12)
        activities = self.strava_client.get_activities(after=after_date, before=before_date)
        for act in activities:
            if act.has_heartrate:
                streams = self.strava_client.get_activity_streams(activity_id=act.id, types=['heartrate'])
                heartrate = np.array(list(filter(lambda x: x is not None, streams['heartrate'].data)))
                fixed_heartrate = np.pad(heartrate, (0, 15000 - len(heartrate)), 'constant')
                item = {'activity_type': self.classes.index(act.type),'heartrate': fixed_heartrate}
                training_set.append(item)
        return pd.DataFrame(training_set)

    def train(self):
        df = self.get_training_set()
        df['Intercept'] = np.ones((len(df),))
        y = df[['activity_type']]
        X = df[['Intercept', 'heartrate']]
        y = np.ravel(y)
        #
        model = LogisticRegression()
        self.debug('y={}'.format(y))
        model = model.fit(X,y)

The exception occurs at fit...

Thanks in advance for any guidance.

Regards,

Mike

Copied from the comments to improve formatting:

File "/python3.5/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
File "/python3.5/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
File "/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence

And another comment:

X and y look like this:

X.shape=(29, 2) 
y.shape=(29,) 
X=[[1 array([74, 74, 77, ..., 0, 0, 0])] 
   [1 array([66, 67, 69, ..., 0, 0, 0])] 
   ...          
   [1 array([92, 92, 91, ..., 0, 0, 0])] 
   [1 array([79, 79, 79, ..., 0, 0, 0])]] 
y=[ 0 11 11 0 1 0 11 0 11 1 0 11 0 0 11 0 0 0 0 0 11 0 11 0 0 0 11 0 0]
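
The printed X suggests the cause: each cell of the heartrate column holds a whole NumPy array, so the frame has object dtype and cannot be converted into a 2-D float array. A minimal reproduction, assuming toy arrays of length 5 in place of the 15000-sample padded streams (the toy values and the name conversion_error are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Two toy "activities", built the same way as in get_training_set():
# a list of dicts whose 'heartrate' value is a padded NumPy array.
rows = [
    {"activity_type": 0, "heartrate": np.array([74, 74, 77, 0, 0])},
    {"activity_type": 11, "heartrate": np.array([66, 67, 69, 0, 0])},
]
df = pd.DataFrame(rows)
df["Intercept"] = np.ones(len(df))

X = df[["Intercept", "heartrate"]]
X_values = X.to_numpy()

print(X_values.shape)  # (2, 2): one cell per activity, not one column per sample
print(X_values.dtype)  # object, because a cell contains an array

# scikit-learn needs a numeric 2-D array; the same conversion fails here.
try:
    X_values.astype(np.float64)
    conversion_error = None
except ValueError as exc:
    conversion_error = exc

print(type(conversion_error).__name__, conversion_error)
```

So the failure appears to come from the nested arrays rather than from memory contiguity.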

Answer 1

Would things work better if train() were changed to look something like this?

(a) generates a sequence of the correct length
(b) using .values returns a NumPy array rather than another DataFrame
(c) fit happens in place
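
One possible shape for such a train(), as a minimal sketch and not necessarily the code originally proposed here: stack the per-activity arrays into a single (n_samples, n_features) matrix before calling fit. The helper name build_features, the toy data, and max_iter=1000 are assumptions made for illustration. The explicit Intercept column is left out because LogisticRegression fits an intercept by default (fit_intercept=True).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def build_features(df):
    """Turn a frame with array-valued 'heartrate' cells into plain NumPy inputs."""
    # Each cell is a 1-D array of equal length; vstack makes one row per activity.
    X = np.vstack(df["heartrate"].tolist()).astype(np.float64)
    # A flat label array, so no separate np.ravel() call is needed.
    y = df["activity_type"].to_numpy()
    return X, y


# Toy stand-in for get_training_set(): six activities padded to length 50
# (the original code pads to 15000).
rng = np.random.default_rng(0)
rows = []
for label in [0, 11, 0, 11, 1, 1]:
    heartrate = rng.integers(60, 180, size=20)
    padded = np.pad(heartrate, (0, 50 - len(heartrate)), "constant")
    rows.append({"activity_type": label, "heartrate": padded})
df = pd.DataFrame(rows)

X, y = build_features(df)
print(X.shape)  # (6, 50)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)  # fit() trains the estimator in place
predictions = model.predict(X)
```

With this layout every heart rate sample becomes its own feature column, which is the 2-D numeric input that check_array expects.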

ValueError: setting an array element with a sequence (LogisticRegression with Array based feature)
