My question is similar to this one, but the filter condition is different.
> demo(dadmom,package="tidyr")
> library(tidyr)
> library(dplyr)
> dadmom <- foreign::read.dta("http://www.ats.ucla.edu/stat/stata/modules/dadmomw.dta")
> dadmom %>%
+ gather(key, value, named:incm) %>%
+ separate(key, c("variable", "type"), -2) %>%
+ spread(variable, value, convert = TRUE)
famid type inc name
1 1 d 30000 Bill
2 1 m 15000 Bess
3 2 d 22000 Art
4 2 m 18000 Amy
5 3 d 25000 Paul
6 3 m 50000 Pat
Finding the families where the mom's income ("incm") is > 20000 is easy from the original table:
> dadmom
famid named incd namem incm
1 1 Bill 30000 Bess 15000
2 2 Art 22000 Amy 18000
3 3 Paul 25000 Pat 50000
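For reference, the wide-table version of the filter might look like the sketch below. It assumes dplyr is loaded and rebuilds `dadmom` by hand from the printed values above, so it does not depend on the download; the name `wide_hits` is only illustrative.

```r
library(dplyr)

# Rebuild the wide table by hand (values copied from the printed output above)
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE
)

# Keep the families whose mom earns more than 20000
wide_hits <- dadmom %>% filter(incm > 20000)
wide_hits
```

With these values only family 3 (Paul and Pat) is kept.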
The question is: how do you do this from the "tidied" data?
Answer 0 (score: 2)
You can add group_by and filter to the code:
#OP's code
d1 <- dadmom %>%
gather(key, value, named:incm) %>%
separate(key, c("variable", "type"), -2) %>%
spread(variable, value, convert = TRUE)
d1 %>%
group_by(famid) %>%
filter(all(sum(type=='m' & inc > 15000)==sum(type=='m')))
# famid type inc name
# 1 2 d 22000 Art
# 2 2 m 18000 Amy
# 3 3 d 25000 Paul
# 4 3 m 50000 Pat
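Note: the above also works when a family has more than one 'm' row (the more general case).
For the normal case of a single dad/mom ('d', 'm') pair per family: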
d1 %>%
group_by(famid) %>%
filter(any(inc >15000 & type=='m'))
# famid type inc name
#1 2 d 22000 Art
#2 2 m 18000 Amy
#3 3 d 25000 Paul
#4 3 m 50000 Pat
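The two filters give the same result on dadmom and only differ when a family has more than one 'm' row. A minimal sketch of that difference, using a made-up data frame `toy` (hypothetical values, not part of the original data):

```r
library(dplyr)

# Hypothetical long-format data: family 1 has two 'm' rows,
# only one of which is above 15000
toy <- data.frame(
  famid = c(1, 1, 1, 2, 2),
  type  = c("d", "m", "m", "d", "m"),
  inc   = c(30000, 15000, 40000, 22000, 18000),
  stringsAsFactors = FALSE
)

# Strict version: every 'm' row in the family must exceed 15000
strict <- toy %>%
  group_by(famid) %>%
  filter(sum(type == "m" & inc > 15000) == sum(type == "m")) %>%
  ungroup()

# Lenient version: at least one 'm' row in the family exceeds 15000
lenient <- toy %>%
  group_by(famid) %>%
  filter(any(type == "m" & inc > 15000)) %>%
  ungroup()

unique(strict$famid)   # family 2 only
unique(lenient$famid)  # families 1 and 2
```

One thing to keep in mind: a family with no 'm' rows at all passes the strict version (0 == 0) but not the lenient one.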
Alternatively, if you prefer data.table, melt from the devel version, v1.9.5, can take multiple value columns. It can be installed from here.
library(data.table)
melt(setDT(dadmom), measure.vars=list(c(2,4), c(3,5)),
variable.name='type', value.name=c('name', 'inc'))[,
type:=c('d', 'm')[type]][, .SD[any(type=='m' & inc >15000)] ,famid]
# famid type name inc
#1: 2 d Art 22000
#2: 2 m Amy 18000
#3: 3 d Paul 25000
#4: 3 m Pat 50000