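I have a census dataset indexed by state name and county name. For each row, I want to find the maximum and the minimum across all of the columns labeled as yearly population estimates, then subtract the minimum from the maximum. I want the function to return a pandas Series that keeps the index alongside the values.
Here is my code so far: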
columns_to_keep = [
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015'
]
df = census_df[columns_to_keep]

def answer_seven(lst):
    lst = [df['POPESTIMATE2010'], df['POPESTIMATE2011'], df['POPESTIMATE2012'],
           df['POPESTIMATE2013'], df['POPESTIMATE2014'], df['POPESTIMATE2015']]
    return max(lst) - min(lst)

answer_seven(lst)
Error message:
ValueError Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
18 return max(lst)-min(lst)
19
---> 20 answer_seven(lst)
21
<ipython-input-110-845350b0b5f7> in answer_seven(lst)
16 df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
17
---> 18 return max(lst)-min(lst)
19
20 answer_seven(lst)
/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 3)
pandas can do this directly:
cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014' , 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)
The result is a Series indexed by the DataFrame's original index, holding each row's maximum minus its minimum.
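For context on the error: Python's built-in max() and min() compare the list elements with `>` and `<`. Comparing two Series gives a boolean Series rather than a single True/False, which is why pandas raises the "truth value of a Series is ambiguous" error. Below is a minimal, self-contained sketch of the row-wise approach. The toy census_df and its numbers are made up for illustration, and selecting columns by the POPESTIMATE prefix is my assumption about how the columns are named, not something taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for census_df; the values are invented for illustration only.
census_df = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama", "Texas"],
        "CTYNAME": ["Autauga County", "Baldwin County", "Travis County"],
        "POPESTIMATE2010": [100, 200, 300],
        "POPESTIMATE2011": [110, 190, 330],
        "POPESTIMATE2012": [105, 210, 360],
    }
).set_index(["STNAME", "CTYNAME"])


def answer_seven(df):
    """Return, for each row, the largest estimate minus the smallest one."""
    # Assumption: every population-estimate column starts with 'POPESTIMATE'.
    cols = [c for c in df.columns if c.startswith("POPESTIMATE")]
    # axis=1 reduces across the columns, giving one value per row.
    return df[cols].max(axis=1) - df[cols].min(axis=1)


result = answer_seven(census_df)
print(result)
```

The returned Series keeps the (STNAME, CTYNAME) index, so something like result.idxmax() would give the state and county with the largest spread.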
Answer 1 (score: 1)
Answer 2 (score: 0)
I ran into trouble with NaN values that I needed to preserve, and used the following:
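x = {}
# Iterating over a DataFrame yields its column labels,
# so this computes max - min within each column.
for col in df_count:
    x[col] = df_count[col].max() - df_count[col].min()
pd.Series(x)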
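On the NaN point: by default, pandas reductions such as max() and min() skip NaN values (skipna=True), so a row only yields NaN when every value in it is missing. If a single missing estimate should make the whole row's result NaN, passing skipna=False is one option. Here is a small sketch with made-up data; the frame and its column names are placeholders, not the df_count used above.

```python
import numpy as np
import pandas as pd

# Made-up data: row 'b' has one missing value, row 'c' is entirely missing.
df = pd.DataFrame(
    {
        "POPESTIMATE2010": [100.0, np.nan, np.nan],
        "POPESTIMATE2011": [130.0, 50.0, np.nan],
    },
    index=["a", "b", "c"],
)

# Default behaviour: NaN is ignored within each row.
spread_skip = df.max(axis=1) - df.min(axis=1)

# skipna=False: any NaN in a row propagates to that row's result.
spread_keep = df.max(axis=1, skipna=False) - df.min(axis=1, skipna=False)

print(spread_skip)
print(spread_keep)
```

Which of the two is right depends on whether a partially missing row should still report a spread.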