I'm getting a ValueError when I try to create a new column in my DataFrame. The DataFrame looks like this:
state veteran_pop pct_gulf pct_vietnam
0 Alaska 70458 20.0 31.2
1 Arizona 532634 8.8 15.8
2 Colorado 395350 10.1 20.8
3 Georgia 693809 10.8 21.8
4 Iowa 234659 7.1 13.7
So I have a function that looks like this:
def addProportions(table, col1, col2, new_col):
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
Here table is the table above, col1 = "pct_gulf", col2 = "pct_vietnam", and new_col = "total_pct", like this:
addProportions(table, "pct_gulf", "pct_vietnam", "total_pct")
But when I run this function, I get the following error message:
ValueError: Wrong number of items passed 2, placement implies 1
--- OR ---
I have also tried setting up the addProportions function as follows:
def addProportions(table, col1, col2, new_col):
    table[new_col] = 0
    for row, index in table.iterrows():
        table[new_col] = ((table[col1] + table[col2])/100)
    return(table)
With that version I get this output, which seems like a step in the right direction.
state veteran_pop pct_gulf pct_vietnam total_pct
0 Alaska 70458 20.0 31.2 NaN
1 Arizona 532634 8.8 15.8 NaN
2 Colorado 395350 10.1 20.8 NaN
3 Georgia 693809 10.8 21.8 NaN
4 Iowa 234659 7.1 13.7 NaN
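But the problem is that when I call type() on the two columns I'm trying to add, each one comes back as a DataFrame, which is why I think I'm getting NaN.
---- Table info ----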
t.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 4 columns):
(state,) 55 non-null object
(veteran_pop,) 55 non-null int64
(pct_gulf,) 55 non-null float64
(pct_vietnam,) 55 non-null float64
dtypes: float64(2), int64(1), object(1)
memory usage: 1.8+ KB
t.index
RangeIndex(start=0, stop=55, step=1)
t.columns
MultiIndex(levels=[[u'pct_gulf', u'pct_vietnam', u'state', u'veteran_pop']],
codes=[[2, 3, 0, 1]])
Answer 0 (score: 0)
You don't need the loop. All you need is this (table is the name of the DataFrame):
table.columns = table.columns.droplevel()
table['total_pct'] = (table['pct_gulf'] + table['pct_vietnam']) / 100
print(table)
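If droplevel() complains that at least one level must be left (which can happen when the MultiIndex has only a single level, as it does here), one alternative is to replace the columns with the values of level 0. The following is a minimal sketch under that assumption; the two-row table is a hypothetical miniature of the data in the question, and the MultiIndex is recreated by hand only to make the example self-contained.

```python
import pandas as pd

# Hypothetical miniature of the table from the question.
table = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})

# Recreate the situation from the question: single-level MultiIndex columns.
table.columns = pd.MultiIndex.from_arrays([list(table.columns)])

# Flatten the columns by taking the labels of level 0 as a plain Index.
table.columns = table.columns.get_level_values(0)

# With flat columns, the vectorized sum needs no loop.
table["total_pct"] = (table["pct_gulf"] + table["pct_vietnam"]) / 100
print(table)
```

After the flattening step, table["pct_gulf"] is an ordinary Series, so the addition works column-wise.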
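Answer 1 (score: 0)
I think the problem is that you have a MultiIndex.
When I build a DataFrame from your information, it looks like this:
import pandas as pd

table = pd.DataFrame(data={"state": ["Alaska", "Arizona", "Colorado",
                                     "Georgia", "Iowa"],
                           "veteran_pop": [70458, 532634, 395350, 693809, 234659],
                           "pct_gulf": [20.0, 8.8, 10.1, 10.8, 7.1],
                           "pct_vietnam": [31.2, 15.8, 20.8, 21.8, 13.7]})
Then table.info() shows the following (this output already includes total_pct, so it was evidently captured after the function had been run):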
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
state 5 non-null object
veteran_pop 5 non-null int64
pct_gulf 5 non-null float64
pct_vietnam 5 non-null float64
total_pct 5 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 280.0+ bytes
If I build the DataFrame with a MultiIndex on the columns instead, the error appears:
multi = pd.DataFrame(data={("state",): ["Alaska", "Arizona", "Colorado", "Georgia", "Iowa"],
                           ("veteran_pop",): [70458, 532634, 395350, 693809, 234659],
                           ("pct_gulf",): [20.0, 8.8, 10.1, 10.8, 7.1],
                           ("pct_vietnam",): [31.2, 15.8, 20.8, 21.8, 13.7]})
If I run addProportions on the regular DataFrame (table), I get the correct answer:
state veteran_pop pct_gulf pct_vietnam total_pct
0 Alaska 70458 20.0 31.2 0.512
1 Arizona 532634 8.8 15.8 0.246
2 Colorado 395350 10.1 20.8 0.309
3 Georgia 693809 10.8 21.8 0.326
4 Iowa 234659 7.1 13.7 0.208
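But running it on the MultiIndex version raises an error. (The message reproduced below is what Python reports when addProportions is called with only the table argument.)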
TypeError: addProportions() missing 3 required positional arguments:
'col1', 'col2', and 'new_col'
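Somehow you ended up with a MultiIndex on the columns, even though you have no hierarchical categories here. You would only need one if, for example, you were breaking the percentages down by year:
columns = pd.MultiIndex.from_product([["percentage", "veteran_pop"], ["army", "navy"], ["2010", "2015"]])
pd.DataFrame(columns=columns, index=pd.RangeIndex(start=0, stop=5))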
percentage veteran_pop
army navy army navy
2010 2015 2010 2015 2010 2015 2010 2015
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
...
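You need to reshape the DataFrame before you can use the function you wrote. The function itself works, but the columns carry the wrong type of index.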
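To confirm which kind of column index a DataFrame has before deciding whether to reshape it, a small check like the one below can help. This is only a sketch: describe_columns is a hypothetical helper name, and the one-row frames are made-up stand-ins for the real data.

```python
import pandas as pd

def describe_columns(df):
    """Return (is_multiindex, number_of_levels) for the columns of df."""
    cols = df.columns
    return isinstance(cols, pd.MultiIndex), cols.nlevels

# A frame with ordinary flat columns.
flat = pd.DataFrame({"pct_gulf": [20.0], "pct_vietnam": [31.2]})

# The same data with single-level MultiIndex columns, as in the question.
multi = flat.copy()
multi.columns = pd.MultiIndex.from_arrays([list(flat.columns)])

print(describe_columns(flat))   # (False, 1)
print(describe_columns(multi))  # (True, 1)
```

Both frames report one level, so nlevels alone does not distinguish them; the isinstance check is what reveals the MultiIndex.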
Answer 2 (score: 0)
If you want to keep the data as a MultiIndex, change the function to:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[(col1,)] + table[(col2,)])/100)
    # uncomment the return line if you need the table returned
    # return table
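As a quick illustration of why the tuple keys help, here is a minimal sketch on a hypothetical two-row sample. The function name add_proportions_multi and the sample values are assumptions made for the example; the body mirrors the function above, with the return enabled.

```python
import pandas as pd

def add_proportions_multi(table, col1, col2, new_col):
    # A full tuple key such as ("pct_gulf",) selects one column as a Series,
    # so the addition is element-wise rather than frame-to-frame.
    table[new_col] = (table[(col1,)] + table[(col2,)]) / 100
    return table

# Hypothetical two-row sample with single-level MultiIndex columns.
multi = pd.DataFrame({"pct_gulf": [20.0, 8.8], "pct_vietnam": [31.2, 15.8]})
multi.columns = pd.MultiIndex.from_arrays([list(multi.columns)])

result = add_proportions_multi(multi, "pct_gulf", "pct_vietnam", "total_pct")
print(result)
```

The columns stay a MultiIndex afterwards, so later code must keep using tuple keys to get a Series back.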
If you would rather reshape the data into an ordinary single-level DataFrame:
def addProportions(table, col1, col2, new_col):
    table[new_col] = ((table[col1] + table[col2])/100)
    # uncomment the return line if you need the table returned
    # return table

# build a new DataFrame without the MultiIndex
new_col = [i[0] for i in multi.columns]
new_df = pd.DataFrame(multi.values, columns=new_col)
# call the function
addProportions(new_df, "pct_gulf", "pct_vietnam", "total_pct")
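One thing to watch when rebuilding a frame from .values: if the columns have mixed types (strings alongside numbers), the array is of object dtype, so the numeric columns in the new frame lose their float64/int64 types. The sketch below, on a hypothetical two-row sample, compares that approach with relabeling a copy, which keeps the original dtypes. The variable names rebuilt and relabeled are made up for the example.

```python
import pandas as pd

# Hypothetical sample with single-level MultiIndex columns and mixed types.
multi = pd.DataFrame({
    "state": ["Alaska", "Arizona"],
    "pct_gulf": [20.0, 8.8],
    "pct_vietnam": [31.2, 15.8],
})
multi.columns = pd.MultiIndex.from_arrays([list(multi.columns)])

flat_names = [name[0] for name in multi.columns]

# Rebuilding from .values goes through a single object-dtype array.
rebuilt = pd.DataFrame(multi.values, columns=flat_names)

# Relabeling a copy keeps each column's own dtype.
relabeled = multi.copy()
relabeled.columns = flat_names

print(rebuilt.dtypes)
print(relabeled.dtypes)
```

The arithmetic still works on the object-dtype columns, but the result is also object dtype, which can be slower and can surprise later numeric code.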