Question

I'm getting a ValueError when trying to create a new column in a DataFrame. The DataFrame looks like this:

      state  veteran_pop  pct_gulf  pct_vietnam
0    Alaska        70458      20.0         31.2
1   Arizona       532634       8.8         15.8
2  Colorado       395350      10.1         20.8
3   Georgia       693809      10.8         21.8
4      Iowa       234659       7.1         13.7

I have a function that looks like this:

def addProportions(table, col1, col2, new_col):

    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

Here table is the DataFrame above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", like so:

addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")

But when I run this function, I get the following error message:

ValueError: Wrong number of items passed 2, placement implies 1

--- OR ---

I have also tried setting up the addProportions function as follows:

def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)

I get this output, which seems like a step in the right direction:

      state veteran_pop pct_gulf pct_vietnam total_pct
0    Alaska       70458     20.0        31.2       NaN
1   Arizona      532634      8.8        15.8       NaN
2  Colorado      395350     10.1        20.8       NaN
3   Georgia      693809     10.8        21.8       NaN
4      Iowa      234659      7.1        13.7       NaN

But the problem is that when I call type() on the two columns I'm trying to add, each one is a DataFrame, which I think is why I'm getting NaN.

---- Table info ----

t.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,)          55 non-null object
(veteran_pop,)    55 non-null int64
(pct_gulf,)       55 non-null float64
(pct_vietnam,)    55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB

t.index

RangeIndex(start=0, stop=55, step=1)

t.columns

MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']], codes=[[2, 3, 0, 1]])

Answer 1

You don't need a loop. All you need is this (table is the name of your DataFrame):

table.columns=table.columns.droplevel()
table['total_pct']=(table['pct_gulf']+table['pct_vietnam'])/100
print(table)
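
One caveat worth checking: the columns here form a MultiIndex with only a single level, and in recent pandas versions `droplevel()` may refuse to remove the last remaining level. A minimal sketch of an alternative, assuming the same column names as in the question, takes the level-0 labels with `get_level_values(0)` instead. The small frame below is sample data built from the first two rows of the question, and the `from_arrays` line is only there to reproduce the single-level MultiIndex.

```python
import pandas as pd

# Sample frame using the first two rows from the question.
table = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "veteran_pop": [70458, 532634],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})
# Reproduce the single-level MultiIndex columns reported by t.columns.
table.columns = pd.MultiIndex.from_arrays([table.columns])

# Replace the MultiIndex with a flat Index of the level-0 labels.
table.columns = table.columns.get_level_values(0)

# Vectorized column arithmetic; no row loop is needed.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100
print(table)
```

Once the columns are flat, `table["pct_gulf"]` selects a single column by its plain string label, so the sum can be assigned to one new column.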

Answer 2

I think the problem is that your columns are a MultiIndex.

When I build a DataFrame from your information, it looks like this:

    import pandas as pd

    table = pd.DataFrame(data={
        "state": ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        "veteran_pop": [70458, 532634, 395350, 693809, 234659],
        "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
        "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7],
    })

Then table.info(), run after the total_pct column has been added, shows the following:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 5 columns):
    state          5 non-null object
    veteran_pop    5 non-null int64
    pct_gulf       5 non-null float64
    pct_vietnam    5 non-null float64
    total_pct      5 non-null float64
    dtypes: float64(3), int64(1), object(1)
    memory usage: 280.0+ bytes

If I build the columns as a MultiIndex instead, I get an error:

    multi = pd.DataFrame(data={
        ("state",): ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
        ("veteran_pop",): [70458, 532634, 395350, 693809, 234659],
        ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
        ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7],
    })

If I run addProportions on the regular DataFrame (table), I get the correct answer:

          state  veteran_pop  pct_gulf  pct_vietnam  total_pct
    0    Alaska        70458      20.0         31.2      0.512
    1   Arizona       532634       8.8         15.8      0.246
    2  Colorado       395350      10.1         20.8      0.309
    3   Georgia       693809      10.8         21.8      0.326
    4      Iowa       234659       7.1         13.7      0.208

But running it on the MultiIndex version raises an error:

    TypeError: addProportions() missing 3 required positional arguments: 
    'col1', 'col2', and 'new_col'

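Somehow you ended up with a MultiIndex in your columns, even though you have no hierarchical categories here. You would only need one if, for example, you were breaking the percentages down by year: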

    columns = pd.MultiIndex.from_product([["percentage","veteran_pop"], ["army","navy"], ["2010", "2015"]])
    pd.DataFrame( columns=columns, index=pd.RangeIndex(start=0, stop=5))

      percentage                veteran_pop
            army      navy             army      navy
            2010 2015 2010 2015        2010 2015 2010 2015
    0        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
    1        NaN  NaN  NaN  NaN         NaN  NaN  NaN  NaN
    ...

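You need to reshape the DataFrame before you can use the function you wrote. The function itself works, but the columns have the wrong index type.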
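
To check whether a given frame is affected, it can help to inspect the type of its column index directly. The sketch below is only an illustration: `has_multiindex_columns` is a hypothetical helper name, and the two small frames are made-up sample data that carry the same values under flat and single-level MultiIndex columns.

```python
import pandas as pd

# A frame with ordinary flat column labels.
flat = pd.DataFrame({"pct_gulf": [20.0, 8.8], "pct_vietnam": [31.2, 15.8]})

# Same data, but with the column labels wrapped in a one-level MultiIndex.
multi = flat.copy()
multi.columns = pd.MultiIndex.from_arrays([flat.columns])


def has_multiindex_columns(df):
    """Hypothetical helper: True if the column labels are a MultiIndex."""
    return isinstance(df.columns, pd.MultiIndex)


for name, df in [("flat", flat), ("multi", multi)]:
    # Print the frame name, the MultiIndex check, and the number of levels.
    print(name, has_multiindex_columns(df), df.columns.nlevels)
```

Both frames report a single level, so `nlevels` alone does not distinguish them; the `isinstance` check against `pd.MultiIndex` is what tells them apart.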

Answer 3

If you want to keep the data as a MultiIndex, change the function to:

def addProportions(table, col1, col2, new_col):
    # select each column by its one-element tuple key
    table[new_col] = (table[(col1,)] + table[(col2,)]) / 100
    # uncomment the return line if you need it
    # return table

If you want to reshape the data into a regular DataFrame with flat columns:

def addProportions(table, col1, col2, new_col):
    table[new_col] = (table[col1] + table[col2]) / 100
    # uncomment the return line if you need it
    # return table

# build a new df without the MultiIndex
new_col = [i[0] for i in multi.columns]
new_df = pd.DataFrame(multi.values, columns=new_col)

# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
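
Rebuilding the frame from `multi.values` passes every column through one mixed-type array, which generally leaves the new columns with `object` dtype. If the original dtypes matter, one possible variation is to copy the frame and replace only the column labels. This is a sketch under that assumption, using two sample rows from the question and a version of `addProportions` that returns the table.

```python
import pandas as pd

# Sample frame with single-level MultiIndex columns, as in the question.
multi = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "veteran_pop": [70458, 532634],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})
multi.columns = pd.MultiIndex.from_arrays([multi.columns])


def addProportions(table, col1, col2, new_col):
    # Add the two percentage columns and convert the sum to a proportion.
    table[new_col] = (table[col1] + table[col2]) / 100
    return table


# Copy the frame and swap in flat labels; the column data is left untouched.
new_df = multi.copy()
new_df.columns = [label[0] for label in multi.columns]

addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
print(new_df.dtypes)
```

Because only the labels change, the numeric columns keep their integer and float dtypes, and `multi` itself is not modified.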
