Getting ValueError: The indices for endog and exog are not aligned

Time: 2016-05-10 17:12:33

Tags: python-3.x pandas statsmodels

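I hit the error above while running a for loop that builds several models. The first two models, built on similar datasets, are constructed without problems. The third one raises this error. The failing code is the call to sm.Logit() from Python's statsmodels package: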

y = y_mort.convert_objects(convert_numeric=True)

#Building Logistic model_LSVC
print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
print("y values:",y.head())
logit = sm.Logit(y,X,missing='drop') 

The error:

Shape of y: (9018,)  &&Shape of X_selected_lsvc: (9018, 59)
y values: 0    0
1    1
2    0
3    0
4    0
Name: mort, dtype: int64
ValueError                                Traceback (most recent call last)
<ipython-input-8-fec746e2ee99> in <module>()
    160     print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
    161     print("y values:",y.head())
--> 162     logit = sm.Logit(y,X,missing='drop')
    163     # fit the model
    164     est = logit.fit(method='cg')

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    399 
    400     def __init__(self, endog, exog, **kwargs):
--> 401         super(BinaryModel, self).__init__(endog, exog, **kwargs)
    402         if (self.__class__.__name__ != 'MNLogit' and
    403                 not np.all((self.endog >= 0) & (self.endog <= 1))):

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    152     """
    153     def __init__(self, endog, exog, **kwargs):
--> 154         super(DiscreteModel, self).__init__(endog, exog, **kwargs)
    155         self.raise_on_perfect_prediction = True
    156 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
    184 
    185     def __init__(self, endog, exog=None, **kwargs):
--> 186         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
    187         self.initialize()
    188 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
     58         hasconst = kwargs.pop('hasconst', None)
     59         self.data = self._handle_data(endog, exog, missing, hasconst,
---> 60                                       **kwargs)
     61         self.k_constant = self.data.k_constant
     62         self.exog = self.data.exog

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
     82 
     83     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
---> 84         data = handle_data(endog, exog, missing, hasconst, **kwargs)
     85         # kwargs arrays could have changed, easier to just attach here
     86         for key in kwargs:

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
    564     klass = handle_data_class_factory(endog, exog)
    565     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
--> 566                  **kwargs)

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
     74         # this has side-effects, attaches k_constant and const_idx
     75         self._handle_constant(hasconst)
---> 76         self._check_integrity()
     77         self._cache = resettable_cache()
     78 

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in _check_integrity(self)
    450                 (hasattr(endog, 'index') and hasattr(exog, 'index')) and
    451                 not self.orig_endog.index.equals(self.orig_exog.index)):
--> 452             raise ValueError("The indices for endog and exog are not aligned")
    453         super(PandasData, self)._check_integrity()
    454 

ValueError: The indices for endog and exog are not aligned

y and X have shapes (9018,) and (9018, 59), so there should be no mismatch between the dependent and independent variables. Any ideas?

6 answers:

Answer 0: (score: 6)

Try converting y to a list before the sm.Logit() line.

y = list(y)

Answer 1: (score: 3)

The error message suggests that endog and exog have different shapes. This is a common error in Python, and it is easily fixed by calling reshape on the dependent variable so that its shape lines up with that of the independent variables.

y_train.values.reshape(-1,1)

The line above asks for 1 column and leaves the number of rows unspecified, i.e. a single column with the same number of rows as X.

Let's take an example:

z = np.array([[1, 2], [ 3, 4]])
print(z.shape)    # (2, 2)

Now we apply reshape(-1,1) to this array. The new array has 4 rows and 1 column.
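Continuing the `z` example from this answer, a short sketch of the reshape step (the name `new_z` is introduced here for illustration):

```python
import numpy as np

z = np.array([[1, 2], [3, 4]])
print(z.shape)        # (2, 2)

# -1 lets NumPy infer the number of rows; 1 fixes the number of columns.
new_z = z.reshape(-1, 1)
print(new_z.shape)    # (4, 1)
```

Note that `.values` on a pandas object returns a plain NumPy array, so the result of `y_train.values.reshape(-1,1)` no longer carries a pandas index.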

Answer 2: (score: 1)

This error can also be caused by incorrect use of the API.

Correct:

Incorrect:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100
) 

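Answer 3: (score: 0)

Have you checked whether your data contains NaN? You can use np.isnan(X) and np.isnan(y). I see you have the drop option turned on, so I suspect there may be NaNs in your data, which would change the shape of the inputs.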
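A minimal sketch of such a check on made-up data (the frame contents and the names `mask`, `X_clean` and `y_clean` are assumptions for illustration). It uses the pandas `isna`/`notna` methods, which also work on non-float columns:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in X and one in y.
X = pd.DataFrame({"x1": [0.5, np.nan, 1.5], "x2": [1.0, 2.0, 3.0]})
y = pd.Series([0, 1, np.nan], name="mort")

# Count missing values in each input.
x_missing = int(X.isna().sum().sum())
y_missing = int(y.isna().sum())
print(x_missing, y_missing)

# Keep only rows that are complete in both X and y, using one shared mask
# so both objects end up with the same index.
mask = X.notna().all(axis=1) & y.notna()
X_clean, y_clean = X[mask], y[mask]
print(X_clean.index.equals(y_clean.index))
```

Filtering both objects with the same boolean mask keeps their indices identical, which is what the statsmodels check requires.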

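Answer 4: (score: 0)

It may be because the indices of x and y differ. This can happen when we first drop some rows from the dataframe and then, after separating x and y, perform some operation on x. The index of y keeps the gaps left by the rows removed from the original dataframe, while x ends up with a continuous index. It is better to call dataframe.reset_index(drop = True) before separating x and y.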
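A small sketch of that scenario on made-up data. The column names are assumptions, and the `to_numpy()` round trip stands in for any step that rebuilds X with a fresh index (for example a feature-selection step that returns an array):

```python
import pandas as pd

df = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "mort": [0, 1, 0, 1]})
df = df.drop(index=[1])  # index is now 0, 2, 3

# y keeps the gapped index; X is rebuilt from an array and gets 0, 1, 2.
y = df["mort"]
X = pd.DataFrame(df[["x1"]].to_numpy(), columns=["x1"])
aligned_before = y.index.equals(X.index)
print(aligned_before)  # False

# Resetting the index before the split gives both the same 0..n-1 index.
df = df.reset_index(drop=True)
y = df["mort"]
X = pd.DataFrame(df[["x1"]].to_numpy(), columns=["x1"])
aligned_after = y.index.equals(X.index)
print(aligned_after)  # True
```

`y.index.equals(X.index)` is the same comparison that statsmodels performs in `_check_integrity`, so it is a quick way to test alignment before building the model.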

Answer 5: (score: 0)

Use y_train.values.ravel(). y_train is actually a two-dimensional array, so you need to convert it to a one-dimensional array. Hope this works for you.
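A minimal sketch of this conversion, assuming `y_train` is a single-column DataFrame (the data and the name `y_flat` are made up for illustration):

```python
import pandas as pd

# Illustrative single-column target stored as a DataFrame.
y_train = pd.DataFrame({"mort": [0, 1, 0, 0]})
print(y_train.shape)   # (4, 1)

# .values gives a 2-D NumPy array; ravel() flattens it to 1-D.
y_flat = y_train.values.ravel()
print(y_flat.shape)    # (4,)
```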