Question

Hi, I have a dataset d1.

import pandas as pd
d1 = {
'customers': pd.Series([1, 1, 1, 2, 2, 3, 3, 4, 4]),
'channel': pd.Series(['a', 'a', 'b', 'c', 'a', 'a', 'b', 'b', 'c']),
'freq': pd.Series([3, 3, 3, 2, 2, 2, 2, 2, 2])
}
d1=pd.DataFrame(d1)

I want to get the list of customers who have used exactly two distinct channels, where channel 'a' is mandatory.

For example, the first customer has used two distinct channels, 'a' & 'b'.
The second customer has used 'a' & 'c', and the third customer has used 'a' & 'b'. But customer 4 has not used channel 'a', and so on.

Thanks in advance.

Answer 1

This is a little convoluted, but basically we filter the df in several stages, using groupby, filter and two levels of boolean indexing:

In [140]:
    d1[d1.customers.isin(d1[d1.channel=='a'].customers)].groupby('customers').filter(lambda x: x['channel'].nunique() == 2)
Out[140]:
  channel  customers  freq
0       a          1     3
1       a          1     3
2       b          1     3
3       c          2     2
4       a          2     2
5       a          3     2
6       b          3     2

Breaking this down:

In [141]:
# filter out just those that have channel a
d1[d1.channel=='a']
Out[141]:
  channel  customers  freq
0       a          1     3
1       a          1     3
4       a          2     2
5       a          3     2
In [144]:
# we want these customer ids
d1[d1.channel=='a'].customers
Out[144]:
0    1
1    1
4    2
5    3
Name: customers, dtype: int64
In [146]:
# perform an additional filtering so we only want customers who have channel a
d1[d1.customers.isin(d1[d1.channel=='a'].customers)]
Out[146]:
  channel  customers  freq
0       a          1     3
1       a          1     3
2       b          1     3
3       c          2     2
4       a          2     2
5       a          3     2
6       b          3     2

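The result above is then grouped by customer, and we apply a filter that keeps only the groups where the number of unique (nunique) channels is equal to 2.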
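The steps above can also be written out as a short script that ends with the plain list of customer ids the question asks for. This is only a sketch under the same assumptions as the one-liner; the intermediate names (`uses_a`, `candidates`, `result`, `customer_list`) are illustrative and not part of the original answer.

```python
import pandas as pd

d1 = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 4, 4],
    'channel': ['a', 'a', 'b', 'c', 'a', 'a', 'b', 'b', 'c'],
    'freq': [3, 3, 3, 2, 2, 2, 2, 2, 2],
})

# Step 1: ids of customers that used channel 'a' at least once
uses_a = d1.loc[d1['channel'] == 'a', 'customers']

# Step 2: keep every row that belongs to one of those customers
candidates = d1[d1['customers'].isin(uses_a)]

# Step 3: keep only the customers with exactly two distinct channels
result = candidates.groupby('customers').filter(
    lambda g: g['channel'].nunique() == 2
)

# Collapse the remaining rows to a plain list of customer ids
customer_list = result['customers'].unique().tolist()
print(customer_list)
```

Splitting the expression this way makes it easier to inspect each intermediate frame if the filtering rules change later.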

Answer 2

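To take it a bit slower, for example if you plan to make the filtering logic more complex later, here is another (admittedly less elegant) approach that does not try to be a one-liner: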

def func(x):
    vals = x['channel'].value_counts()
    if 'a' in vals and len(vals) == 2:
        return True
    return False

mask = d1.groupby('customers').apply(func)
print(mask)

Output:

customers
1             True
2             True
3             True
4            False
dtype: bool

Now it depends on how you want the output:

# Get a list of customers, like
# [1, 2, 3]
target_customers = [k for k, v in mask.to_dict().items() if v]

# Get a slice of the original DataFrame
print(d1[d1['customers'].isin(target_customers)])
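If the per-group Python function turns out to be slow on a larger frame, the same per-customer mask can probably be built from two grouped aggregations instead. This is a sketch of a swapped-in technique, not what the answer above does; the names `n_channels` and `has_a` are made up for illustration.

```python
import pandas as pd

d1 = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 4, 4],
    'channel': ['a', 'a', 'b', 'c', 'a', 'a', 'b', 'b', 'c'],
    'freq': [3, 3, 3, 2, 2, 2, 2, 2, 2],
})

# Number of distinct channels per customer
n_channels = d1.groupby('customers')['channel'].nunique()

# Whether each customer has at least one row with channel 'a'
has_a = d1['channel'].eq('a').groupby(d1['customers']).any()

# Both conditions must hold; the two Series share the customer index
mask = (n_channels == 2) & has_a

# Customer ids where the mask is True
target_customers = mask[mask].index.tolist()
print(target_customers)
```

Each condition stays a separate Series, so further rules can be added as extra terms in the `mask` expression.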

How to extract rows based on two conditions
