Conditionally selecting values from a pandas DataFrame

Time: 2018-10-18 01:28:33

Tags: python pandas dataframe pandas-groupby

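I have a DataFrame from which I want to determine how many unique bird species each person saw while taking part in my "big year".

I tried using a list comprehension and for loops to iterate over each row and check whether it was unique with .is_unique(), but that seems to be the source of a lot of my trouble. I can get a list of all the unique species just fine with .unique(), but I want to somehow get the people associated with those birds.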


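Edit: I don't think I was clear. I want a count, for each person, of the birds that nobody else saw. So the output, in any format, would be something like (Steve, 0), (Ben, 1), (Greg, 1).

Thanks!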

4 answers:

Answer 0 (score: 1)

This can be done quite easily with a list comprehension.

import pandas as pd

df = pd.DataFrame({'Species':['woodpecker', 'woodpecker', 'dove', 'mockingbird'], 'Birder':['Steve', 'Ben','Ben','Greg']})

# Collect each (Birder, Species) pair once, in order of first appearance
matches = list(dict.fromkeys((row.Birder, row.Species) for row in df.itertuples(index=False)))

This gives a list of tuples as output:

[('Steve', 'woodpecker'), ('Ben', 'woodpecker'), ('Ben', 'dove'), ('Greg', 'mockingbird')]
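
The pairs by themselves do not yet give the per-person counts asked for in the edit. A minimal sketch of one way to get there with only the standard library, assuming the pairs are already distinct; the names `seen_by` and `exclusive_counts` are made up for illustration:

```python
from collections import Counter

# Distinct (birder, species) pairs, as in the list above.
matches = [('Steve', 'woodpecker'), ('Ben', 'woodpecker'),
           ('Ben', 'dove'), ('Greg', 'mockingbird')]

# Number of different birders who saw each species.
seen_by = Counter(species for _, species in matches)

# Start every birder at zero so people with no exclusive species still appear.
exclusive_counts = {birder: 0 for birder, _ in matches}
for birder, species in matches:
    if seen_by[species] == 1:
        exclusive_counts[birder] += 1

print(exclusive_counts)  # {'Steve': 0, 'Ben': 1, 'Greg': 1}
```

If the pairs could contain repeats, they would need to be deduplicated first, otherwise a species logged twice by one person would look shared.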

Answer 1 (score: 1)

Names of the distinct species one person (here Ben) saw:

ben_unique_bird = df[df['Birder'] == 'Ben']['Species'].unique()

Number of distinct species that person saw:

len(df[df['Birder'] == 'Ben']['Species'].unique())

Recommended approach: get a table for all birders at once

df.groupby(['Birder']).agg({"Species": lambda x: x.nunique()})

The same approach, broken down step by step:

for i in df['Birder'].unique():
    print (" Name ",i," Distinct count ",len(df[df['Birder'] == i]['Species'].unique())," distinct bird names ",df[df['Birder'] == i]['Species'].unique())
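
For reference, a small self-contained sketch of the grouped count on the sample data from the other answers, using the built-in `nunique` instead of a lambda. Note that this counts distinct species per birder, including species other people also saw, so it is not the same number the edit to the question asks for. The variable name `distinct_per_birder` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Species': ['woodpecker', 'woodpecker', 'dove', 'mockingbird'],
                   'Birder': ['Steve', 'Ben', 'Ben', 'Greg']})

# Distinct species per birder; groupby sorts the birder names by default.
distinct_per_birder = df.groupby('Birder')['Species'].nunique()

print(distinct_per_birder.to_dict())  # {'Ben': 2, 'Greg': 1, 'Steve': 1}
```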

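Answer 2 (score: 0)

I came up with a terrible way of doing what I want, but it works. If you have a more efficient way, please let me know, because I know there must be one.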

import pandas as pd

data = pd.DataFrame({'Species':['woodpecker', 'woodpecker', 'dove', 'mockingbird'], 'Birder':['Steve', 'Ben','Ben','Greg']})

ben_birds = []
steve_birds = []
greg_birds = []

#get all the names of the birds that people saw and put them in a list
for index, row in data.iterrows():
    if row['Birder'] == 'Ben':
        ben_birds.append(row['Species'])
    elif row['Birder'] == 'Steve':
        steve_birds.append(row['Species'])
    else:
        greg_birds.append(row['Species'])

duplicates=[]
#compare each of the lists to look for duplicates, and make a new list with those
for bird in ben_birds:
    if (bird in steve_birds) or (bird in greg_birds):
        duplicates.append(bird)

for bird in steve_birds:
    if (bird in greg_birds):
        duplicates.append(bird)

#drop the duplicated birds from each list; rebuilding the lists avoids
#skipping items, which can happen when removing from a list while looping over it
ben_birds = [bird for bird in ben_birds if bird not in duplicates]
steve_birds = [bird for bird in steve_birds if bird not in duplicates]
greg_birds = [bird for bird in greg_birds if bird not in duplicates]

print(f'Ben saw {len(ben_birds)} Birds that no one else saw')
print(f'Steve saw {len(steve_birds)} Birds that no one else saw')
print(f'Greg saw {len(greg_birds)} Birds that no one else saw')
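
The same idea can be written without one hard-coded list per person. A rough sketch, assuming the same two-column layout as above; `birders_by_species` and `only_seen_by` are names invented here, not part of pandas:

```python
from collections import defaultdict

import pandas as pd

data = pd.DataFrame({'Species': ['woodpecker', 'woodpecker', 'dove', 'mockingbird'],
                     'Birder': ['Steve', 'Ben', 'Ben', 'Greg']})

# Map each species to the set of birders who saw it.
birders_by_species = defaultdict(set)
for species, birder in zip(data['Species'], data['Birder']):
    birders_by_species[species].add(birder)

# Start everyone at zero, then credit species that have exactly one observer.
only_seen_by = {birder: 0 for birder in data['Birder'].unique()}
for species, birders in birders_by_species.items():
    if len(birders) == 1:
        (sole_birder,) = birders
        only_seen_by[sole_birder] += 1

for birder, count in only_seen_by.items():
    print(f'{birder} saw {count} bird(s) that no one else saw')
```

Because observers are stored in sets, this works for any number of birders and is not thrown off if one person logs the same species more than once.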

Answer 3 (score: 0)

You can create a helper Series via pd.Series.duplicated and then use GroupBy + sum. With keep=False every occurrence of a repeated species is flagged, so negating the result marks the species that appear only once:

counts = data.assign(unique_flag=~data['Species'].duplicated(keep=False))\
             .groupby('Birder')['unique_flag'].sum().astype(int)

for name, count in counts.items():
    print(f'{name} saw {count} bird(s) that no one else saw')

Result:

Ben saw 1 bird(s) that no one else saw
Greg saw 1 bird(s) that no one else saw
Steve saw 0 bird(s) that no one else saw
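
One caveat with flagging duplicates on the Species column alone: if the same person logs a species twice, those rows look like duplicates even though nobody else saw the bird. A hedged sketch of one way around that, assuming repeat sightings can occur in the real data (the extra mockingbird row is added here purely for illustration):

```python
import pandas as pd

# Sample data plus a repeated sighting: Greg logs the mockingbird twice.
data = pd.DataFrame({'Species': ['woodpecker', 'woodpecker', 'dove',
                                 'mockingbird', 'mockingbird'],
                     'Birder': ['Steve', 'Ben', 'Ben', 'Greg', 'Greg']})

# Keep one row per (Birder, Species) so repeats by one person
# are not mistaken for sightings by someone else.
pairs = data.drop_duplicates(['Birder', 'Species'])

# True where the species now appears for exactly one birder.
exclusive = ~pairs['Species'].duplicated(keep=False)

counts = exclusive.groupby(pairs['Birder']).sum().astype(int)
print(counts.to_dict())  # {'Ben': 1, 'Greg': 1, 'Steve': 0}
```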