Question

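I am trying to write a function that reduces the amount of code I have to write. The function uses exec() on a pandas DataFrame to perform some calculations, but when I put the code inside exec(), it returns NaNs. Here is the function: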

def FeatureMaker(data, criteria_col, f_data):
    source = data.copy()
    for key, value in f_data.items():
        for i in source[criteria_col].unique():
            source.set_value(source[source[criteria_col] == i].index.tolist(), key, exec('source[source[criteria_col] == i]["' + value[0] + '"].' + value[1] + '()'))
    return source

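The function should take a DataFrame, find the unique values of the column passed as the criteria column (the "id" column in this case), and perform the calculation passed to it. The parameter data holds the DataFrame, criteria_col holds the name of the column containing the criteria values, and f_data is a dictionary whose keys are the names of the new columns and whose values are the column to compute on and the calculation itself. Here is an example of how I run the function: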

FeatureMaker(spend_alpha, 'id', {'total_sum': ["spend", "sum"]})

In this example, spend_alpha is the name of my DataFrame, id is my criteria column, and I want to create a total_sum column that holds the sum of the spend column.
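
For reference, the intended behavior described above (one new column per entry in f_data, holding a per-group aggregate) can be sketched without exec() by using groupby with transform. This is a minimal sketch, not the original function: the name feature_maker and the unpacking of each f_data value into (value_col, agg_name) are assumptions made for illustration.

```python
import pandas as pd


def feature_maker(data, criteria_col, f_data):
    """Return a copy of data with one per-group aggregate column per f_data entry.

    Hypothetical rewrite for illustration; f_data maps
    new column name -> [column to aggregate, aggregation name].
    """
    source = data.copy()
    for new_col, (value_col, agg_name) in f_data.items():
        # transform broadcasts each group's aggregate back onto its rows
        source[new_col] = source.groupby(criteria_col)[value_col].transform(agg_name)
    return source


spend_alpha = pd.DataFrame(
    [[100, 250], [101, 50], [102, 60], [100, 50], [102, 30], [101, 50]],
    columns=["id", "spend"],
)

result = feature_maker(spend_alpha, "id", {"total_sum": ["spend", "sum"]})
print(result)
```

Because the function works on a copy, the original spend_alpha is left unchanged.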

The function runs without errors. However, it returns only NaNs. I tried the following code:

for i in spend_alpha['id'].unique():
    spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', spend_alpha[spend_alpha['id'] == i]['spend'].sum())

It works fine: I get the sum for each id. However, this code also returns NaNs:

for i in spend_alpha['id'].unique():
    spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', exec("spend_alpha[spend_alpha['id'] == i]['spend'].sum()"))

My question is: how can I use exec() on a pandas DataFrame without getting NaNs? Thanks in advance.
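
One detail that bears on this question: Python's built-in exec() runs statements and always returns None, whereas eval() evaluates a single expression and returns its value. The sketch below shows the difference on the same expression string; the explicit namespace dictionary is an assumption added so that the snippet does not depend on where it is run.

```python
import pandas as pd

spend_alpha = pd.DataFrame(
    [[100, 250], [101, 50], [102, 60], [100, 50], [102, 30], [101, 50]],
    columns=["id", "spend"],
)

expr = "spend_alpha[spend_alpha['id'] == i]['spend'].sum()"

# Names the expression string is allowed to see (illustrative choice)
namespace = {"spend_alpha": spend_alpha, "i": 100}

exec_result = exec(expr, namespace)  # runs the code, discards the value
eval_result = eval(expr, namespace)  # evaluates the expression, returns the value

print(exec_result)  # None
print(eval_result)  # 300
```

Passing None as the value to store would explain the missing values seen in the new column.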

Edit

Reproducible example:

Suppose I create the spend_alpha DataFrame as follows...

In [1]: import pandas as pd

In [2]: spend_alpha = pd.DataFrame([[100, 250],
   ...:                             [101, 50],
   ...:                             [102, 60],
   ...:                             [100, 50],
   ...:                             [102, 30],
   ...:                             [101, 50]], columns=['id', 'spend'])

In [3]: spend_alpha
Out[3]:
    id  spend
0  100    250
1  101     50
2  102     60
3  100     50
4  102     30
5  101     50

I can add a new column named total_sum by running the following code...

In [4]: for i in spend_alpha['id'].unique():
   ...:     spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', spend_alpha[spend_alpha['id'] == i]['spend'].sum())
   ...:

In [5]: spend_alpha
Out[5]:
    id  spend  total_sum
0  100    250      300.0
1  101     50      100.0
2  102     60       90.0
3  100     50      300.0
4  102     30       90.0
5  101     50      100.0

However, if I put the third argument of set_value() inside exec(), it returns NaNs, as shown below:

In [6]: for i in spend_alpha['id'].unique():
   ...:     spend_alpha.set_value(spend_alpha[spend_alpha['id'] == i].index.tolist(), 'total_sum', exec("spend_alpha[spend_alpha['id'] == i]['spend'].sum()"))
   ...:

In [7]: spend_alpha
Out[7]:
    id  spend  total_sum
0  100    250        NaN
1  101     50        NaN
2  102     60        NaN
3  100     50        NaN
4  102     30        NaN
5  101     50        NaN

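I want to be able to put the third argument inside exec() so that I can use that line in a function. That would let me compute aggregations other than sum() by passing them to the function. But since exec() returns NaNs, this is not possible.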
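
The goal stated above, choosing the aggregation by name, can also be sketched without building code strings by looking the method up with getattr. This is a hedged sketch under assumptions: the helper aggregate_by_name and the output column names spend_sum, spend_mean, and spend_max are hypothetical, and .loc is used for assignment because set_value() was removed in pandas 1.0.

```python
import pandas as pd


def aggregate_by_name(series, agg_name):
    """Look up a Series method such as 'sum' or 'mean' by name and call it."""
    return getattr(series, agg_name)()


spend_alpha = pd.DataFrame(
    [[100, 250], [101, 50], [102, 60], [100, 50], [102, 30], [101, 50]],
    columns=["id", "spend"],
)

for agg_name in ["sum", "mean", "max"]:
    for i in spend_alpha["id"].unique():
        mask = spend_alpha["id"] == i  # rows that belong to this id
        value = aggregate_by_name(spend_alpha.loc[mask, "spend"], agg_name)
        spend_alpha.loc[mask, "spend_" + agg_name] = value

print(spend_alpha)
```

The loop keeps the same per-id structure as the original code; only the way the aggregate is obtained changes.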

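Likewise, the function below should compute the sum of the spend column for each id in the spend_alpha DataFrame and store it in a column named total_sum. But it returns NaNs.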

In [8]: def FeatureMaker(data, criteria_col, f_data):
   ...:     source = data.copy()
   ...:     for key, value in f_data.items():
   ...:         for i in source[criteria_col].unique():
   ...:             source.set_value(source[source[criteria_col] == i].index.tolist(), key, exec('source[source[criteria_col] == i]["' + value[0] + '"].' + value[1] + '()'))
   ...:     return source

In [9]: FeatureMaker(spend_alpha, 'id', {'total_sum': ["spend", "sum"]})
Out[9]:
    id  spend  total_sum
0  100    250        NaN
1  101     50        NaN
2  102     60        NaN
3  100     50        NaN
4  102     30        NaN
5  101     50        NaN

I would be very grateful if anyone could help me solve this problem.

exec() in pandas returns NaNs
