Question

Imagine the following Python pandas DataFrame:

import pandas as pd
from tabulate import tabulate

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})
print(tabulate(df, headers='keys', tablefmt='psql'))

+----+-----------+------+------+
|    | A         | B    | id   |
|----+-----------+------+------|
|  0 | property1 | test | foo  |
|  1 | property1 | test | bar  |
|  2 | property2 | test | foo  |
+----+-----------+------+------+

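Here you can see that for id "foo", column B has only one unique (distinct) value, namely test. Column A, however, has two distinct values, property1 and property2. For id "bar", both columns have only one distinct value.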
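The per-id distinct counts described above can be checked with `nunique` on a grouped frame. A minimal sketch, assuming the same `df`; the columns are selected with a list so that only A and B are counted:

```python
import pandas as pd

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

# Number of distinct values of A and B within each id
counts = df.groupby('id')[['A', 'B']].nunique()
print(counts)
```

For "foo" this reports 2 distinct values in A and 1 in B; for "bar" it reports 1 in both.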

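What I am looking for is code that, after grouping by id, gives me the names of the columns whose number of distinct values is greater than 1. So the result should be the name of column A, since it is the column that holds more than one distinct value within a group.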

df.groupby(['id'])

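I only know how to get the ids whose count (number of occurrences) is greater than 1. But that is not what I am ultimately looking for.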

df['id'].value_counts().reset_index(name="count").query("count > 1")["id"]
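The difference between the two counts can be seen side by side. `value_counts` counts rows per id, while `groupby(...).nunique()` counts distinct values per column inside each id. A minimal sketch, assuming the same `df`:

```python
import pandas as pd

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

# Rows per id: "foo" occurs twice, "bar" once
rows_per_id = df['id'].value_counts()

# Distinct values per column within each id
distinct_per_id = df.groupby('id')[['A', 'B']].nunique()

print(rows_per_id)
print(distinct_per_id)
```

"foo" occurs in two rows, yet column B still has only one distinct value for it, so a row count above 1 does not by itself identify the varying column.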

Thanks for any hints.

Answer 1

Use:

# filter columns of interest (select them with a list; a bare tuple like ['A','B'] is rejected by pandas 2.x)
a = (df.groupby(['id'])[['A', 'B']].nunique() > 1).any()

print (a)
A     True
B    False
dtype: bool

# to test all columns except id
a = (df.set_index('id').groupby('id').nunique() > 1).any()
print (a)
A     True
B    False
dtype: bool

Finally, filter:

b = a.index[a]
print (b)
Index(['A'], dtype='object')
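The steps of this answer can be wrapped in a reusable function. A minimal sketch; `columns_varying_within_groups` is a hypothetical helper name, not a pandas API. It groups on the index level named `key` after `set_index`, so the key column itself is never counted. Note that `nunique` ignores NaN by default.

```python
import pandas as pd


def columns_varying_within_groups(frame, key):
    """Hypothetical helper: names of columns that take more than one
    distinct value inside at least one group of `key`."""
    # Move the key into the index so only the remaining columns are counted
    per_group = frame.set_index(key).groupby(level=key).nunique()
    # True for a column if any group has more than one distinct value
    mask = (per_group > 1).any()
    return mask.index[mask].tolist()


df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

print(columns_varying_within_groups(df, 'id'))  # ['A']
```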

Answer 2

Maybe you are looking for:

g = df.groupby('id')[['A', 'B']].nunique()
g

     A  B
id       
bar  1  1
foo  2  1

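To get the relevant columns, just index into g.columns (the mask has one entry per column of g, so it lines up with g.columns but not with df.columns, which also contains id):

g.columns[(g > 1).any()]
Index(['A'], dtype='object')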
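A self-contained version of this answer, assuming the same `df`. The boolean mask `(g > 1).any()` has one entry per column of `g` (A and B), so it is applied to `g.columns`; `df.columns` has three entries including id, so the lengths would not match.

```python
import pandas as pd

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

g = df.groupby('id')[['A', 'B']].nunique()

# One boolean per column of g: does any id have more than one distinct value?
mask = (g > 1).any()

result = g.columns[mask]
print(result.tolist())  # ['A']
```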

Answer 3

UPDATE:


Explanation:

In [98]: df.columns.drop('id')[(df.groupby('id')[df.columns.drop('id')].nunique() > 1).any()]
Out[98]: Index(['A'], dtype='object')
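The one-liner above can be unpacked into separate steps. A minimal sketch, assuming the same `df`; it relies on the grouped selection keeping the columns in the order of `value_cols`, which is why the resulting mask can be applied to `value_cols` positionally.

```python
import pandas as pd

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

# Every column except the grouping key
value_cols = df.columns.drop('id')

# Distinct values per column within each id, columns in value_cols order
per_group = df.groupby('id')[value_cols].nunique()

# True where at least one id has more than one distinct value
mask = (per_group > 1).any()

print(value_cols[mask])
```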

Answer 4

Here is another way:


Or something like:

pd.crosstab(df.id, [df.A, df.B], margins=True)
Out[206]: 
A   property1 property2 All
B        test      test    
id                         
bar         1         0   1
foo         1         1   2
All         2         1   3
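The crosstab above tabulates combinations of A and B, so it does not name the varying column directly. One way to get there, assuming one crosstab per column is acceptable, is to count the non-zero cells in each row: that number equals the distinct values observed for that id. A minimal sketch; like `nunique`, `crosstab` drops NaN values.

```python
import pandas as pd

df = pd.DataFrame({'id': ['foo', 'bar', 'foo'],
                   'A': ['property1', 'property1', 'property2'],
                   'B': ['test', 'test', 'test']})

varying = []
for col in df.columns.drop('id'):
    # Rows: ids, columns: distinct values of col, cells: co-occurrence counts
    table = pd.crosstab(df['id'], df[col])
    # Non-zero cells per row = number of distinct values of col for that id
    distinct_per_id = (table > 0).sum(axis=1)
    if (distinct_per_id > 1).any():
        varying.append(col)

print(varying)  # ['A']
```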
