Question

I have a dataframe like this:

import pandas as pd
data = {
    'Index Title': ["Company1", "Company1", "Company2", "Company3"],
    'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
    'ID1': ['123', '456', '789', '012'],
}
df = pd.DataFrame(data)
df.index = df["Index Title"]
del df["Index Title"]
print(df)


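The index title is the company name. For Company1 I have two types: Type 1 and Type 2.

For Company2 I have only Type 1, and for Company3 I have only Type 2.

I want to drop the rows of companies that have only one type, whether that is Type 1 or Type 2.

So in this case, the rows for Company2 and Company3 should be dropped.

What is the best way to do this?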

Answer 1

For problems like this, we usually reach for filtering based on groupby and transform, because it is very fast.

df[df.groupby(level=0)['BusinessType'].transform('nunique') > 1]

            BusinessType  ID1
Index Title                  
Company1          Type 1  123
Company1          Type 2  456

The first step is to identify the groups/rows that are associated with more than one type:

df.groupby(level=0)['BusinessType'].transform('nunique')

Index Title
Company1    2
Company1    2
Company2    1
Company3    1
Name: BusinessType, dtype: int64

From here, we drop every company whose number of unique types is == 1.
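The one-liner above can be split into its steps. The following is a minimal sketch that rebuilds the frame from the question; the intermediate names type_counts, keep and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Rebuild the frame from the question, with the company name as the index.
df = pd.DataFrame(
    {
        "Index Title": ["Company1", "Company1", "Company2", "Company3"],
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 2"],
        "ID1": ["123", "456", "789", "012"],
    }
).set_index("Index Title")

# Step 1: for every row, count the distinct types within that row's company.
type_counts = df.groupby(level=0)["BusinessType"].transform("nunique")

# Step 2: turn the counts into a boolean mask aligned with df's index.
keep = type_counts > 1

# Step 3: boolean indexing keeps only the rows whose mask value is True.
result = df[keep]
print(result)
```

Because transform returns one value per original row rather than one per group, the mask lines up with df directly and no merge back onto the frame is needed.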

Answer 2

Here is one way:

- group by Index Title
- keep only the groups that contain at least one Type 1 and one Type 2

df = (
    df.groupby('Index Title')
    .filter(lambda x: (x['BusinessType']=='Type 1').any() &
                      (x['BusinessType']=='Type 2').any())
    .reset_index()
)

Update: if you are looking for companies with two or more types (regardless of whether they are Type 1 or Type 2):

df = (
    df.groupby('Index Title')
    .filter(lambda x: x['BusinessType'].nunique() > 1)
    .reset_index()
)

In that case, the other answer (groupby with transform) is the cleaner one, and you should use it.
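The two conditions in this answer (both specific types present, versus more than one distinct type) give the same result on the question's data, but they can differ. Below is a small sketch of that difference; Company4 and 'Type 3' are hypothetical additions for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical data: Company4 and 'Type 3' are not part of the question.
df = pd.DataFrame(
    {
        "Index Title": ["Company1", "Company1", "Company2", "Company4", "Company4"],
        "BusinessType": ["Type 1", "Type 2", "Type 1", "Type 1", "Type 3"],
    }
).set_index("Index Title")

grouped = df.groupby("Index Title")

# Condition A: the company has both a 'Type 1' row and a 'Type 2' row.
both_types = grouped.filter(
    lambda g: (g["BusinessType"] == "Type 1").any()
    and (g["BusinessType"] == "Type 2").any()
)

# Condition B: the company has more than one distinct type, whatever they are.
several_types = grouped.filter(lambda g: g["BusinessType"].nunique() > 1)

print(sorted(set(both_types.index)))
print(sorted(set(several_types.index)))
```

With this data, condition A keeps only Company1, while condition B keeps Company1 and Company4. Which one is right depends on whether the specific type names matter.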

Drop rows from a pandas dataframe based on a condition
