I have a data frame like this:
SELECT
name,count(CASE WHEN date_part('year',time_stamp) = 2016 THEN answ_count end) AS Year15
FROM
companies companies
where
(CASE when no_answer='f' then value_s IS not NULL or value_n IS not NULL end )
Now I need to find every 'cust' for which all 3 event types are present. This should produce:
library(data.table)
#create data
A <- data.table(id=rep(1:10, each=40000), date=rep(Sys.Date()-99:0, 4000), ret=rnorm(400000))
B <- data.table(id=rep(1:5, each=10), date=rep(Sys.Date()-99:0), ret=rnorm(50))
#find dates to compare against
n <- NROW(B)
B_long <- B[,.(id = rep(id,each=10),date = rep(date,each=10))]
s <- rep(-10:-1,n)
B_long[,date:=date + s]
#information in one column
B_long$com <- as.numeric(paste0(B_long$id,as.numeric(B$date)))
A$com <- as.numeric(paste0(A$id,as.numeric(A$date)))
#compare
setkey(A,com)
X <- A[com %in% B_long$com,]
Don't worry about distinct values (duplicates can be removed). My approach for this is:
event cust
et1 satya
et1 papu
et1 abc
et1 satya
et1 def
et2 papu
et2 satya
et2 panda
et3 normal
et3 panda
et3 satya
et3 fgh
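But this approach is not suitable when the DataFrame is large and I have to find the common cust across roughly 50-100 events.
Please suggest a more pandas-style / Pythonic way. Thanks in advance.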
Answer 0 (score: 1)
You can try:
# first drop duplicate (event, cust) pairs so each cust counts once per event
df = df.drop_duplicates(['event','cust'])
# count how many distinct events each cust appears in
counts = df.cust.value_counts()
print(counts)
satya     3
panda     2
papu      2
def       1
normal    1
fgh       1
abc       1
Name: cust, dtype: int64
# get the number of unique events
uniqevents = df.event.nunique()
print(uniqevents)
3
# keep only the cust values whose count equals the number of unique events
counts = counts[counts == uniqevents]
print(counts)
satya    3
Name: cust, dtype: int64
print(counts.index.to_series().reset_index(drop=True))
0    satya
dtype: object
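For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the event/cust table from the question and uses groupby with nunique instead of drop_duplicates plus value_counts, which should give the same result. The names n_events, events_per_cust and common are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the question's event/cust table.
df = pd.DataFrame({
    "event": ["et1"] * 5 + ["et2"] * 3 + ["et3"] * 4,
    "cust": ["satya", "papu", "abc", "satya", "def",
             "papu", "satya", "panda",
             "normal", "panda", "satya", "fgh"],
})

# Total number of distinct events in the frame.
n_events = df["event"].nunique()

# For each cust, the number of distinct events it appears in.
# nunique ignores repeated (event, cust) pairs, so no explicit dedup is needed.
events_per_cust = df.groupby("cust")["event"].nunique()

# Keep the cust values that appear in every event.
common = events_per_cust[events_per_cust == n_events].index.tolist()
print(common)  # ['satya']
```

Because nothing here hard-codes the number of events, the same lines should work unchanged when there are 50-100 event types rather than 3.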