Question

I have a data frame like this:

SELECT
    name,
    count(CASE WHEN date_part('year', time_stamp) = 2016 THEN answ_count END) AS Year15
FROM
    companies companies
WHERE
    (CASE WHEN no_answer = 'f' THEN value_s IS NOT NULL OR value_n IS NOT NULL END)

Now I need to find every 'cust' that is present in all 3 event types. This should yield:

library(data.table)
#create data
A  <- data.table(id=rep(1:10, each=40000), date=rep(Sys.Date()-99:0,  4000), ret=rnorm(400000))
B  <- data.table(id=rep(1:5,  each=10), date=rep(Sys.Date()-99:0),  ret=rnorm(50))

#find dates to compare against
n <- NROW(B)
B_long <- B[,.(id = rep(id,each=10),date = rep(date,each=10))]
s <- rep(-10:-1,n)
B_long[,date:=date + s]

#information in one column
B_long$com <- as.numeric(paste0(B_long$id,as.numeric(B_long$date)))
A$com <- as.numeric(paste0(A$id,as.numeric(A$date)))

#compare
setkey(A,com)
X <- A[com %in% B_long$com,]

Don't worry about distinct values (duplicates can be removed). For this, my approach is:

event    cust
 et1   satya
 et1    papu
 et1     abc
 et1   satya
 et1     def
 et2    papu
 et2   satya
 et2   panda
 et3  normal
 et3   panda
 et3   satya
 et3     fgh

But this approach is not suitable when the DataFrame is large and I have to find the common cust across around 50-100 events.

Please suggest a pandas / more Pythonic way. Thanks in advance.

Answer 1

You can try:

#first drop duplicates in each group by event
df = df.drop_duplicates(['event','cust'])

#count how many distinct events each cust appears in
counts = df.cust.value_counts()
print(counts)
satya     3
panda     2
papu      2
def       1
normal    1
fgh       1
abc       1
Name: cust, dtype: int64

#get number of unique events
uniqevents = df.event.nunique()
print(uniqevents)
3
#get values with count == uniqevents
counts = counts[counts == uniqevents]
print(counts)
satya    3
Name: cust, dtype: int64

print(counts.index.to_series().reset_index(drop=True))
0    satya
dtype: object
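
As a possible alternative to the steps above, here is a minimal sketch that counts the distinct events per cust with groupby and keeps the ones matching the total number of events. It assumes Python 3 and a reasonably recent pandas. The helper name `custs_in_all_events` is made up for illustration, and the sample data is copied from the event/cust table in the question.

```python
import pandas as pd

# Sample data copied from the event/cust table in the question.
df = pd.DataFrame({
    "event": ["et1"] * 5 + ["et2"] * 3 + ["et3"] * 4,
    "cust": ["satya", "papu", "abc", "satya", "def",
             "papu", "satya", "panda",
             "normal", "panda", "satya", "fgh"],
})


def custs_in_all_events(frame):
    """Return a sorted list of cust values that occur in every distinct event.

    Hypothetical helper, not part of pandas.
    """
    # Total number of distinct events in the frame.
    n_events = frame["event"].nunique()
    # Number of distinct events each cust appears in; duplicates are ignored.
    events_per_cust = frame.groupby("cust")["event"].nunique()
    # Keep only the custs that were seen in all events.
    return sorted(events_per_cust[events_per_cust == n_events].index)


common = custs_in_all_events(df)
print(common)  # ['satya']
```

Because nunique is computed per cust, there is no need to call drop_duplicates first, and the same code should work unchanged for 50-100 events.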

Finding common values in one column against distinct values in another column in pandas
