How do I iterate over a column in a DataFrame to search for a specific value?

Asked: 2019-05-13 21:42:12

Tags: python pandas dataframe

I have a dataset of pets and their owners:

Pets    Owners
dog     James
dog     Katelyn
rat     Shelly
cat     Bob

I want to be able to search the "Owners" column for the name Katelyn and then print out a vector of the pet names for that owner. So far I have this:

def pet_name():
    owner = input("What is the Owner name? ")

    # check to see if owner exist in pets dataset
    # if ownderID exist then print corresponding pet names
    if owner in pets['Owners']: 
        print( pets[['Pets','Owners']][pets.Owners == owner])

    # if ownerID doesnt' exist
    elif not age: 
        print("Sorry, this Owner doesn't exist. Try again! ")


    # if no ownerID has been entered at all 
    else: 
        print("You didn't enter any Owner. Try again! ")

When I enter a name to search for, it automatically goes to the else part of the code. How can I fix this? Should I use iterrows()?

2 Answers:

Answer 0: (score: 0)

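When you check owner in pets['Owners'], the Series is used in a dictionary-like context, so the test checks whether owner is in the index of pets['Owners'], not in its values. Instead, check whether owner in pets['Owners'].values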
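A minimal sketch of that difference, assuming the dataset from the question is rebuilt by hand with the default 0 to 3 integer index (the variable names below are only for illustration):

```python
import pandas as pd

# Rebuild the question's dataset by hand (assumed default integer index).
pets = pd.DataFrame({
    "Pets": ["dog", "dog", "rat", "cat"],
    "Owners": ["James", "Katelyn", "Shelly", "Bob"],
})
owners = pets["Owners"]

# `in` on a Series looks at the index labels, like keys of a dict.
label_found = 1 in owners                    # 1 is an index label
name_in_series = "Katelyn" in owners         # "Katelyn" is a value, not a label
# `in` on the underlying array looks at the values.
name_in_values = "Katelyn" in owners.values

print(label_found, name_in_series, name_in_values)
```

With this data the name is found only by the last test, which is why the original if branch was never taken.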

That said, I'd rather see pet_name written like this:

def pet_name():
    owner = input("What is the Owner name? ")

    # boolean mask: True for every row whose owner matches the input
    mask = pets['Owners'] == owner

    # if the owner exists, print the corresponding pet names
    if mask.any():
        print(pets.loc[mask, ['Pets', 'Owners']])

    # if a name was entered but no row matches it
    # (the original tested an undefined variable, age, here)
    elif owner:
        print("Sorry, this Owner doesn't exist. Try again! ")

    # if no owner has been entered at all
    else:
        print("You didn't enter any Owner. Try again! ")
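On the iterrows() part of the question: an explicit loop is not needed for this lookup. The sketch below, which assumes the question's dataset rebuilt by hand, compares a row-by-row loop with the boolean-mask selection; both collect the same pet names, and the mask version is the one usually preferred in pandas.

```python
import pandas as pd

# Rebuild the question's dataset by hand (an assumption for this example).
pets = pd.DataFrame({
    "Pets": ["dog", "dog", "rat", "cat"],
    "Owners": ["James", "Katelyn", "Shelly", "Bob"],
})
owner = "Katelyn"

# Row-by-row loop with iterrows(): works, but is slow on large frames.
pets_via_loop = [row["Pets"] for _, row in pets.iterrows()
                 if row["Owners"] == owner]

# Vectorized selection with a boolean mask.
pets_via_mask = pets.loc[pets["Owners"] == owner, "Pets"].tolist()

print(pets_via_loop, pets_via_mask)
```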

Answer 1: (score: 0)

First, let's see where the problem is, and then find a way to solve it.

In [1]: import pandas as pd

In [2]: pets = pd.read_csv('pets.csv')

In [3]: pets
Out[3]:
  Pets   Owners
0  dog    James
1  dog  Katelyn
2  rat   Shelly
3  cat      Bob

In [4]: type(pets["Owners"])
Out[4]: pandas.core.series.Series

We can see that pets["Owners"] is a pandas.Series object. The problem is clearly in the following line of code:

if owner in pets['Owners']:

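Here is why you can't use the in operator with a pandas.Series this way: the membership test on a Series checks its index labels, not its values. The index here is 0 to 3, so, as you noticed yourself, the test returns False for every owner name: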

In [5]: owner in pets["Owners"]
Out[5]: False

Now, if you still want to use the in operator with pets["Owners"], you can do this (as @piRSquared suggested):

In [6]: owner in pets["Owners"].values
Out[6]: True

However, if we look at the documentation for pandas.Series.values:

  

Warning: We recommend using Series.array or Series.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array.

So we can do this instead:

In [7]: owner in pets["Owners"].array
Out[7]: True
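The other option named in the warning, Series.to_numpy(), can be used the same way. A small sketch, assuming the Owners column is rebuilt by hand as a standalone Series:

```python
import numpy as np
import pandas as pd

# The Owners column from the question, rebuilt by hand (an assumption).
owners = pd.Series(["James", "Katelyn", "Shelly", "Bob"], name="Owners")

# to_numpy() returns a NumPy array; .array returns the pandas-backed array.
as_numpy = owners.to_numpy()
found_numpy = "Katelyn" in as_numpy
found_array = "Katelyn" in owners.array

# An elementwise comparison avoids the `in` operator altogether.
found_compare = bool((owners == "Katelyn").any())

print(type(as_numpy).__name__, found_numpy, found_array, found_compare)
```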

There is an even better way. You want to find the pets for a given owner, right? If so, you can do this:

In [8]: pet = pets.loc[pets["Owners"] == owner, "Pets"]
   ...: if pet.any():
   ...:     print(pet)
   ...: else:
   ...:     print("You didn't enter any Owner. Try again! ")
   ...:
1    dog
Name: Pets, dtype: object

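As you can see, this prints a pandas.Series object. You mentioned a "vector/list/array" format. It isn't entirely clear, but I assume that an owner can have multiple pets, and that you want to check whether the owner has any pets and then print all of them in a list-like form. If so, you can use pet.array. For example, if we modify your dataset so that Katelyn owns more than one pet: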

Pets    Owners
dog     James
dog     Katelyn
rat     Katelyn <-----
rat     Shelly
cat     Bob

Then we can see that it gives us a list-like array of her pets:

In [9]: if pet.any():
   ...:     print(pet.array)
   ...: else:
   ...:     print("You didn't enter any Owner. Try again! ")
   ...:
<PandasArray>
['dog', 'rat']
Length: 2, dtype: object
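If a plain Python list is wanted rather than a pandas array, Series.tolist() is another option. The sketch below assumes the modified dataset rebuilt by hand; pet_names is a hypothetical helper introduced only for this example, and it takes the owner as an argument instead of calling input() so it is easy to try out.

```python
import pandas as pd

# The modified dataset, rebuilt by hand (Katelyn owns two pets).
pets = pd.DataFrame({
    "Pets": ["dog", "dog", "rat", "rat", "cat"],
    "Owners": ["James", "Katelyn", "Katelyn", "Shelly", "Bob"],
})

def pet_names(pets, owner):
    """Hypothetical helper: pets owned by `owner` as a list (empty if none)."""
    return pets.loc[pets["Owners"] == owner, "Pets"].tolist()

katelyn_pets = pet_names(pets, "Katelyn")
unknown_pets = pet_names(pets, "Nobody")

print(katelyn_pets)   # ['dog', 'rat']
print(unknown_pets)   # []
```

An empty list doubles as the "owner not found" signal, so the caller can test the result directly with if katelyn_pets: before printing.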