Question

I have a dataset that looks like this:

country | year      | supporting_nation | eco_sup  | mil_sup
------------------------------------------------------------------
  Fake       1984        US                 1          1
  Fake       1984        SU                 0          1

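In this fake example, a country plays both sides during the Cold War and receives support from both.

I am reshaping the dataset in two ways:

I removed all non-US/SU instances of support; I am only interested in these two countries.
I want to reduce it to one line per year per country, which means adding US/SU-specific dummy variables for each variable.

Like so: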

country | year | US_SUP | US_eco_sup | US_mil_sup | SU_SUP | SU_eco_sup | SU_mil_sup
------------------------------------------------------------------------------------
Fake      1984   1        1            1            1        1            1
Fake      1985   1        1            1            1        1            1
florp     1984   0        0            0            1        1            1
florp     1985   0        0            0            1        1            1

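I have added all the dummy columns, and the US_SUP and SU_SUP columns are populated with the correct values.

However, I am having trouble getting the correct values into the other variables.

To do this, I wrote the following function: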

def get_values(x):
    cols = ['eco_sup', 'mil_sup']
    nation = ''
    if x['SU_SUP'] == 1:
        nation = 'SU_'
    if x['US_SUP'] == 1:
        nation = 'US_'

    support_vars = x[['eco_sup', 'mil_sup']]
    # Since each line contains only one measure of support I can
    # automatically assume that the support_vars are from
    # the correct nation
    support_cols = [nation + x for x in cols]
    x[support_cols] = support_vars

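The plan was then to just use a df.groupby.agg('max') operation, but I never got to that step, because the function above returns 0 for every new dummy column, regardless of the values of the columns in the dataframe.

So in the final table, all of the US/SU_mil/eco_sup variables are 0.

Does anyone know what I am doing wrong, and why the columns get the wrong values?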

Answer 1

I solved my problem by abandoning the .apply function and using this instead (where old is a list of the old variable names):

for index, row in df.iterrows():
    # Each row records support from a single nation, so at most one flag is set.
    for nation in ('SU_', 'US_'):
        if row[nation + 'SUP'] == 1:
            for col in old:
                # .loc writes into df itself; chained indexing such as
                # df[index: index + 1][...] = ... may assign to a temporary copy.
                df.loc[index, nation + col] = int(row[col])

That did the trick!
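
For comparison, here is a minimal sketch of how the .apply approach from the question could be kept. One thing that stands out in the posted function is that it has no return statement, so df.apply(get_values, axis=1) would produce None for every row, and modifying the row inside the function is not a reliable way to change df itself. The sample data, the new_cols, value_cols, filled and wide names, and the zero-initialised columns are assumptions made for illustration; only the column names and the old list come from the posts above.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's first table.
df = pd.DataFrame({
    "country": ["Fake", "Fake", "florp"],
    "year": [1984, 1984, 1984],
    "supporting_nation": ["US", "SU", "SU"],
    "eco_sup": [1, 0, 1],
    "mil_sup": [1, 1, 1],
})
df["US_SUP"] = (df["supporting_nation"] == "US").astype(int)
df["SU_SUP"] = (df["supporting_nation"] == "SU").astype(int)

old = ["eco_sup", "mil_sup"]
new_cols = [prefix + col for prefix in ("US_", "SU_") for col in old]
for col in new_cols:
    df[col] = 0  # assumed starting value for the nation-specific columns


def get_values(row):
    """Copy eco_sup / mil_sup into the columns of the supporting nation."""
    row = row.copy()  # work on a copy so the input row is left alone
    for prefix in ("US_", "SU_"):
        if row[prefix + "SUP"] == 1:
            for col in old:
                row[prefix + col] = row[col]
    return row  # without this, apply would collect None for every row


# apply returns a new frame; df is not modified in place.
filled = df.apply(get_values, axis=1)

# Collapse to one line per country and year, as planned in the question.
value_cols = ["US_SUP", "SU_SUP"] + new_cols
filled[value_cols] = filled[value_cols].astype(int)
wide = filled.groupby(["country", "year"], as_index=False)[value_cols].max()
print(wide)
```

The result of apply has to be assigned to a name (filled here) and used for the groupby step. Taking the maximum per group works because each nation-specific column is 0 on rows that belong to the other supporter.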

Pandas: column value assignment inside a function does not work
