我正在尝试创建一个用户放入年份的功能,并且使用此Lynda class作为模型,输出是支出的前十个国家/地区。
这是数据框
df.dtypes
Country Name object
Country Code object
Year int32
CountryYear object
Population int32
GDP float64
MilExpend float64
Percent float64
dtype: object
Country Name Country Code Year CountryYear Pop GDP Expend Percent
0 Aruba ABW 1960 ABW-1960 54208 0.0 0.0 0.0
我尝试过这段代码并遇到错误:
代码:
def topten(Year):
simple = df_details_merged.loc[Year].sort('MilExpend',ascending=False).reset_index()
simple = simple.drop(['Country Code', 'CountryYear'],axis=1).head(10)
simple.index = simple.index + 1
return simple
topten(1990)
这是我收到的相当大的错误: 我可以得到一些帮助吗?我甚至无法弄清楚错误是什么。 : - (
C:\Users\mycomputer\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: FutureWarning: sort is deprecated, use sort_values(inplace=True) for INPLACE sorting
from ipykernel import kernelapp as app
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in _try_kind_sort(arr)
1738 # if kind==mergesort, it can fail for object dtype
-> 1739 return arr.argsort(kind=kind)
1740 except TypeError:
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-105-0c974c6a1b44> in <module>()
----> 1 topten(1990)
<ipython-input-104-b8c336014d5b> in topten(Year)
1 def topten(Year):
----> 2 simple = df_details_merged.loc[Year].sort('MilExpend',ascending=False).reset_index()
3 simple = simple.drop(['Country Code', 'CountryYear'],axis=1).head(10)
4 simple.index = simple.index + 1
5
C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in sort(self, axis, ascending, kind, na_position, inplace)
1831
1832 return self.sort_values(ascending=ascending, kind=kind,
-> 1833 na_position=na_position, inplace=inplace)
1834
1835 def order(self, na_last=None, ascending=True, kind='quicksort',
C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in sort_values(self, axis, ascending, inplace, kind, na_position)
1751 idx = _default_index(len(self))
1752
-> 1753 argsorted = _try_kind_sort(arr[good])
1754
1755 if not ascending:
C:\Users\mycomputer\Anaconda3\lib\site-packages\pandas\core\series.py in _try_kind_sort(arr)
1741 # stable sort not available for object dtype
1742 # uses the argsort default quicksort
-> 1743 return arr.argsort(kind='quicksort')
1744
1745 arr = self._values
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'
答案 0 :(得分:1)
.loc
的第一个参数是行标签。
当您致电df_details_merged.loc[1960]
时,pandas会找到标有1960
的行,并将该行作为系列返回。所以你得到一个索引为Country Name, Country Code, ...
的系列,其值是该行的值。然后,您的代码会尝试按MilExpend
对其进行排序,这就是失败的原因。
您需要的不是loc
,而是一个简单的条件:df[df.Year == Year]
。这是“给我整个数据框,但只有'年'列包含我在”年“变量中指定的内容(在您的示例中为1960)。
sort
目前仍有效,但已被弃用,因此请改用sort_values
。将它们放在一起:
simple = df_details_merged[df_details_merged.Year == Year].sort_values(by='MilExpend', ascending=False).reset_index()
然后你可以继续删除列,然后像现在一样获取前10行。