Question

Why does the first DataFrame raise a KeyError for 0?

import pandas as pd

t1 = pd.DataFrame({'c1':[1,2,2,1]})  
t1.c1.value_counts()[0]  # key error: 0

t2 = pd.DataFrame({'c1':['a','b','b','a']})  
t2.c1.value_counts()[0]  # prints 2

Thanks to @anky_91 for the link to this post. Here is the answer:

#t1's output
index  value
2      2
1      2

#t2's output
index  value
b      2
a      2

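When t1[0] is executed (here t1 stands for the value_counts() result shown above), pandas looks for the index label 0, which is equivalent to t1.loc[0]. This raises a KeyError because 0 is not in t1's index.

When t2[0] is executed, pandas also looks for the index label 0. Since there is no such label in t2's index, it should raise a KeyError as well. However, pandas tries to be smart here: t2's index is entirely character based, but an integer was passed in, so it assumes you are looking for the value stored at position 0 rather than at a label named `0`.

Basically, pandas translates t2[0] -> t2.loc[0], whoops, t2's index contains characters only, so try position based -> t2.iloc[0]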
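
The fallback described above can be sidestepped by stating the intent explicitly. The following is a minimal sketch, assuming the same t1 and t2 as in the question; the names counts1, counts2 and missing_label_raises are introduced here for illustration only. As far as I know, recent pandas releases deprecate the positional fallback for plain [] indexing, so .loc and .iloc are the safer choice in either case.

```python
import pandas as pd

t1 = pd.DataFrame({'c1': [1, 2, 2, 1]})
t2 = pd.DataFrame({'c1': ['a', 'b', 'b', 'a']})

counts1 = t1.c1.value_counts()  # integer index: labels 1 and 2
counts2 = t2.c1.value_counts()  # string index: labels 'a' and 'b'

# Position-based lookup behaves the same for both, whatever the index holds.
first1 = counts1.iloc[0]
first2 = counts2.iloc[0]

# Label-based lookup only succeeds for labels that exist in the index.
by_label1 = counts1.loc[1]
by_label2 = counts2.loc['a']

# Label 0 is absent from counts1, so .loc[0] raises KeyError.
try:
    counts1.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(first1, first2, by_label1, by_label2, missing_label_raises)
```

Using .iloc[0] gives "the first row of the counts" regardless of the index type, which is probably what the original [0] was meant to do.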

Answer 1

In your code:

test.c1.value_counts()

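gives you a Series. You can use:

type(test.c1.value_counts())

to see its type. If the Series has int values in its index, then Series[int_value] first checks whether int_value is a label in the index, and raises an error if it is not. You can try: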

test = pd.DataFrame({'c1':['a',1.2,1.2]})
test.c1.value_counts()[0]

This code does not raise an error, simply because there is no int in the index of test.c1.value_counts().
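
The check described above can be made visible by inspecting the index directly. This is a rough sketch, not a description of pandas internals; the helper name describe_index is hypothetical and only reports the index dtype and whether a given key is one of its labels.

```python
import pandas as pd

def describe_index(series, key):
    """Report the index dtype and whether `key` is one of its labels."""
    return {
        'index_dtype': str(series.index.dtype),
        'key_is_label': key in series.index,
    }

# Integer values in the column give an integer index after value_counts().
int_counts = pd.DataFrame({'c1': [1, 2, 2, 1]}).c1.value_counts()
# Mixed str/float values give an object index with no int labels.
mixed_counts = pd.DataFrame({'c1': ['a', 1.2, 1.2]}).c1.value_counts()

int_info = describe_index(int_counts, 0)
mixed_info = describe_index(mixed_counts, 0)
print(int_info)
print(mixed_info)
```

In both cases 0 is not a label. The difference is only the index type: with an integer index the key is always treated as a label, so the lookup fails.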

Answer 2

If you print the result without the [0], you will see what is happening.

First,

test = pd.DataFrame({'c1':[1,2,2,1]})  
test.c1.value_counts()

2    2
1    2

And in the second instance:

test = pd.DataFrame({'c1':['a','b','b','a']})  
test.c1.value_counts() 

a    2
b    2

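So in the first instance, when you use [0], there is no index label 0, and you get an error. If you use [2] or [1] in the first instance, you get 2.

In the second instance, since the index holds letters, [0] and [1] both return 2. You can also use ['a'] and ['b'], which each return 2 as well.

I am not sure whether you are looking for the total number of items in the index. If you are, consider replacing [0] with .count()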

import pandas as pd

test = pd.DataFrame({'c1':[1,2,2,1]})  
test.c1.value_counts().count()  # prints 2

test = pd.DataFrame({'c1':['a','b','b','a']})  
test.c1.value_counts().count()  # prints 2
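
For comparison, here is a short sketch of two other ways to get the same number, assuming the goal really is the number of distinct values. len() and Series.nunique() are standard Python and pandas calls; the variable names are illustrative.

```python
import pandas as pd

test = pd.DataFrame({'c1': [1, 2, 2, 1]})
counts = test.c1.value_counts()

n_via_count = counts.count()        # non-null entries in the counts Series
n_via_len = len(counts)             # number of rows in the counts Series
n_via_nunique = test.c1.nunique()   # distinct non-null values in the column

print(n_via_count, n_via_len, n_via_nunique)
```

nunique() skips building the intermediate counts Series, which may read more directly when only the number of distinct values is needed.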

Pandas value_counts returns an error depending on the Series' value type
