Question

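I am using survey data to compute some statistics. As far as I know, there is no Python library that handles this kind of dataset, so I am trying to write my own functions. I am using pandas. There is a variable called "fexp", which equals the inverse of the probability of being selected. My target variable is called "target", and there is a time variable called "quarter". I am interested in computing the total and the mean of "target" for each quarter. I tried the following code: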

First, I define functions to compute the (weighted) sum and mean statistics.

import numpy as np

def sum_w(x, w):
    # Weighted total: rows with missing x give NaN products, which .sum() skips
    return (x*w).sum()

def mean_w(x, w):
    w_copy = w.copy()
    w_copy[x.isna()] = np.nan

    ### I have to replace some fexp values by NaN because it is theoretically
    ### correct; other software does this. Note that this step is not needed
    ### for the sum_w function.

    return (x*w).sum()/w_copy.sum()
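Written as formulas, and assuming w_i is the fexp value of row i, x_i its target value, and S_q the set of rows in quarter q, the two functions appear to be meant to compute:

```latex
\hat{T}_q = \sum_{i \in S_q,\ x_i\ \text{observed}} w_i x_i,
\qquad
\bar{x}_q = \frac{\sum_{i \in S_q,\ x_i\ \text{observed}} w_i x_i}{\sum_{i \in S_q,\ x_i\ \text{observed}} w_i}
```

Rows with a missing target drop out of the total on their own (pandas' .sum() skips the NaN products), which is why sum_w needs no masking. The mean's denominator, however, must leave out the weights of those rows, which is what the w_copy step in mean_w is for.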

Then I compute the statistics:

answer1 = dataset.groupby(['quarter'])['target'].apply(sum_w, dataset['fexp'])
answer2 = dataset.groupby(['quarter'])['target'].apply(mean_w, dataset['fexp'])

This works for the total (sum_w): I get the expected answer. However, it does not work for the mean (mean_w). I get the following error:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
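A likely cause, offered as a reading of the error rather than a confirmed diagnosis: inside groupby(...).apply, x is only one quarter's slice of target, but w is still the entire fexp column. So x.isna() is a boolean Series indexed by that quarter's rows only, while w_copy is indexed by all 2,717,600 rows, and pandas refuses a boolean mask whose index does not cover the Series being assigned to. sum_w escapes this because x*w aligns on the index (rows outside the group become NaN, which .sum() then skips). Avoiding the error alone would not be enough, though: w_copy.sum() over the full column would also add up the weights of every other quarter. The sketch below reproduces the error on a small made-up DataFrame and shows one possible fix, a hypothetical mean_w_aligned that first restricts w to x.index.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like the dataset head in the question
dataset = pd.DataFrame({
    "quarter": [200712, 200712, 200712, 200803, 200803],
    "target": [np.nan, 46.0, 55.0, 10.0, np.nan],
    "fexp": [139.297853] * 3 + [120.0] * 2,
})

# Reproduce the problem: a mask indexed like one group, applied to the full column
group = dataset.loc[dataset["quarter"] == 200712, "target"]
w_full = dataset["fexp"].copy()
try:
    w_full[group.isna()] = np.nan
    failed = False
except Exception as exc:  # pandas raises IndexingError("Unalignable boolean Series ...")
    failed = "Unalignable" in str(exc)

def mean_w_aligned(x, w):
    # Hypothetical fix: keep only the weights of this group's rows, so that
    # x.isna() and w_sub share the same index (and the denominator is per group)
    w_sub = w.loc[x.index].copy()
    w_sub[x.isna()] = np.nan
    return (x * w_sub).sum() / w_sub.sum()

means = dataset.groupby("quarter")["target"].apply(mean_w_aligned, dataset["fexp"])
```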

Note: this is the head of the dataset:

    quarter target  fexp
0   200712  NaN 139.297853
1   200712  46.0    139.297853
2   200712  NaN 139.297853
3   200712  55.0    139.297853
4   200712  NaN 139.297853

And this is the info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2717600 entries, 0 to 2717599
Data columns (total 3 columns):
quarter    int64
target     float64
fexp       float64
dtypes: float64(2), int64(1)
memory usage: 62.2 MB
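A vectorized alternative, sketched under the assumption that the data look like the head shown above (the toy values are made up): rather than passing the full fexp column into apply, build the per-row products and the "observed" weights as columns first, then let a single groupby(...).sum() do the aggregation. The helper name weighted_stats_by_quarter is hypothetical. Since it avoids calling a Python function once per group, it is typically faster than apply on frames with millions of rows.

```python
import numpy as np
import pandas as pd

def weighted_stats_by_quarter(df, value="target", weight="fexp", by="quarter"):
    """Hypothetical helper: weighted total and weighted mean of `value` per `by`.

    Rows where `value` is NaN add nothing to the total, and their weights
    are left out of the mean's denominator (the rule mean_w intends).
    """
    observed = df[value].notna()
    parts = pd.DataFrame({
        by: df[by],
        # x * w per row; NaN where x is missing (skipped by the sum below)
        "xw": df[value] * df[weight],
        # w only where x is observed, 0 elsewhere
        "w_obs": df[weight].where(observed, 0.0),
    })
    sums = parts.groupby(by)[["xw", "w_obs"]].sum()
    return pd.DataFrame({
        "total": sums["xw"],
        "mean": sums["xw"] / sums["w_obs"],
    })

# Made-up rows shaped like the dataset head in the question
dataset = pd.DataFrame({
    "quarter": [200712, 200712, 200712, 200803, 200803],
    "target": [np.nan, 46.0, 55.0, 10.0, np.nan],
    "fexp": [139.297853] * 3 + [120.0] * 2,
})

stats = weighted_stats_by_quarter(dataset)
```

The "total" column matches what sum_w returns per quarter; the "mean" column applies the denominator rule from mean_w.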

Apply method in pandas not working: Unalignable boolean Series provided as indexer
