Question

I have a dataset whose columns represent years, and the set of columns can change over time. The dataset looks like this:

Unnamed: 0  2000    2001    2002    2003    2004    2005    2006    2007    2008    2009    2010
0   North America   109.24  119.60946   144.29389   187.86691   227.29032   265.21215   340.15054   472.83005   666.47907   768.71809   914.4242
1   Bermuda 0   0   0   0   0   0   0   0   0   0   0
2   Canada  3.7 3.9 4   4   4   4.6 5.2 15.4    16.7    22.1    26.4
3   Greenland   0   0   0   0   0   0   0   0   0   0   0
4   Mexico  0   0   0   0   0   0   0   0.1 0.1 0.103   0.4

I want to go through all the elements, check whether any cell is greater than 50, and print the corresponding country/region name.

Answer 1

First, make the first column the index, either with set_index or with the index_col parameter of read_csv:

df = df.set_index('Unnamed: 0')
#alternative if possible
#df = pd.read_csv(file, index_col=0)

print (df)
                 2000       2001       2002       2003       2004       2005  \
Unnamed: 0                                                                     
North America  109.24  119.60946  144.29389  187.86691  227.29032  265.21215   
Bermuda          0.00    0.00000    0.00000    0.00000    0.00000    0.00000   
Canada           3.70    3.90000    4.00000    4.00000    4.00000    4.60000   
Greenland        0.00    0.00000    0.00000    0.00000    0.00000    0.00000   
Mexico           0.00    0.00000    0.00000    0.00000    0.00000    0.00000   

                    2006       2007       2008       2009      2010  
Unnamed: 0                                                           
North America  340.15054  472.83005  666.47907  768.71809  914.4242  
Bermuda          0.00000    0.00000    0.00000    0.00000    0.0000  
Canada           5.20000   15.40000   16.70000   22.10000   26.4000  
Greenland        0.00000    0.00000    0.00000    0.00000    0.0000  
Mexico           0.00000    0.10000    0.10000    0.10300    0.4000
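
As a rough sketch of the read_csv alternative mentioned above: the CSV text below is a hypothetical stand-in for the real file (its exact layout is an assumption) and reproduces only the 2008-2010 columns from the question.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file; only three of the
# year columns from the question are reproduced here.
csv_text = """,2008,2009,2010
North America,666.47907,768.71809,914.4242
Bermuda,0,0,0
Canada,16.7,22.1,26.4
Greenland,0,0,0
Mexico,0.1,0.103,0.4
"""

# index_col=0 turns the first column into the index while reading,
# so no separate set_index call is needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
countries = df.index.tolist()
print(countries)
```

Note that year labels read from a CSV header are strings ('2008'), not integers.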

out = df.index[df.gt(50).any(axis=1)].tolist()
print (out)
['North America']
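
Since the year columns can change, it may help to wrap the one-liner in a small function that never names a column. This is only a sketch: the function name rows_with_value_above is made up for illustration, and the frame is rebuilt by hand from the 2008-2010 values in the question.

```python
import pandas as pd


def rows_with_value_above(df, threshold):
    """Return index labels of rows that have at least one value > threshold."""
    # gt() compares every cell; any(axis=1) collapses each row to one bool.
    return df.index[df.gt(threshold).any(axis=1)].tolist()


# Hand-built subset of the question's data (2008-2010 only).
df = pd.DataFrame(
    {
        2008: [666.47907, 0.0, 16.7, 0.0, 0.1],
        2009: [768.71809, 0.0, 22.1, 0.0, 0.103],
        2010: [914.4242, 0.0, 26.4, 0.0, 0.4],
    },
    index=["North America", "Bermuda", "Canada", "Greenland", "Mexico"],
)

print(rows_with_value_above(df, 50))
```

Because the comparison runs over whatever columns are present, adding or removing year columns needs no code change, as long as all remaining columns are numeric.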

Explanation:

Compare every value with DataFrame.gt (the method form of >):

print (df.gt(50))
                2000   2001   2002   2003   2004   2005   2006   2007   2008  \
Unnamed: 0                                                                     
North America   True   True   True   True   True   True   True   True   True   
Bermuda        False  False  False  False  False  False  False  False  False   
Canada         False  False  False  False  False  False  False  False  False   
Greenland      False  False  False  False  False  False  False  False  False   
Mexico         False  False  False  False  False  False  False  False  False   

                2009   2010  
Unnamed: 0                   
North America   True   True  
Bermuda        False  False  
Canada         False  False  
Greenland      False  False  
Mexico         False  False

Then use DataFrame.any to check whether each row has at least one True:

print (df.gt(50).any(axis=1))
Unnamed: 0
North America     True
Bermuda          False
Canada           False
Greenland        False
Mexico           False
dtype: bool

Finally, filter df.index with boolean indexing:

print (df.index[df.gt(50).any(axis=1)])

Index(['North America'], dtype='object', name='Unnamed: 0')
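
With a threshold of 50, every row of the mask is either all True or all False, so the role of any(axis=1) is easy to miss. The sketch below uses a threshold of 20 (an assumption chosen only for illustration) on the 2008-2010 values, where the Canada row is mixed, and contrasts any with all.

```python
import pandas as pd

# Hand-built subset of the question's data (2008-2010 only).
df = pd.DataFrame(
    {
        2008: [666.47907, 0.0, 16.7, 0.0, 0.1],
        2009: [768.71809, 0.0, 22.1, 0.0, 0.103],
        2010: [914.4242, 0.0, 26.4, 0.0, 0.4],
    },
    index=["North America", "Bermuda", "Canada", "Greenland", "Mexico"],
)

mask = df.gt(20)  # Canada's row is False, True, True

# any(axis=1): at least one value in the row exceeds the threshold.
at_least_one = df.index[mask.any(axis=1)].tolist()

# all(axis=1): every value in the row exceeds the threshold.
every_value = df.index[mask.all(axis=1)].tolist()

print(at_least_one)
print(every_value)
```

Canada shows up only in the first list, because just two of its three values are above 20.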
