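I have a database consisting of blk_h, blk_w, Flow, Med_C and CumFlow (plus the id_h and dist columns shown in the example below). The data are sorted by the distance between blk_h and blk_w (descending) and grouped by id_h. I need to subset the data so that, for each household neighbourhood, I extract the case where CumFlow FIRST equals or exceeds Med_C.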
I have tried various dplyr functions and can't get it to work. Here is an example:
df <- data.frame(
id_h=c("A","A","A","A","B","B","B"),
blk_h=c("A1","A1","A2","A2","B1","B2","B2"),
blk_w=c("W1","W2","W3","W3","W1","W2","W2"),
dist=c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
Flow=c(3,6,3,7,5,4,2),
CumFlow=c(3,9,12,19,5,9,11),
Med_C=c(10,10,10,10,6,6,6)
)
df
I need it to return a table like this:
id_h blk_h blk_w dist Flow CumFlow Med_C
A A2 W3 7.0 3 12 10
B B2 W2 6.5 4 9 6
Here are some of the things I tried to make this happen:
Attempt #1
library(dplyr)
df.g <- group_by(df, id_h)
df.g2 <- filter(df.g, CumFlow == which.min(CumFlow >= Med_C))
Attempt #2
library(data.table)
setDT(df)[, .SD[which.min(CumCount >= Med_C)], by = id_h]
Attempt #3
library(dplyr)
test <- df %>% group_by(id_h) %>% filter(min(CumFlow) >= Med_C)
I think I'm misunderstanding how to use the which.min function. Any input is greatly appreciated.
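A small base-R check (not part of the original post) of why attempts #1 and #3 come back empty, using only the columns their conditions touch. The names first_false and group_min are illustrative. Attempt #2 has an extra problem: it refers to CumCount, a column that does not exist in df.

```r
# Only the columns the conditions use, taken from the question's example data
df <- data.frame(
  id_h    = c("A", "A", "A", "A", "B", "B", "B"),
  CumFlow = c(3, 9, 12, 19, 5, 9, 11),
  Med_C   = c(10, 10, 10, 10, 6, 6, 6)
)

# Attempt #1: which.min() on a logical vector gives the position of the
# first FALSE, which is 1 in both groups, so the filter becomes CumFlow == 1
first_false <- tapply(df$CumFlow >= df$Med_C, df$id_h, which.min)
first_false            # A: 1, B: 1
any(df$CumFlow == 1)   # FALSE: no row can pass

# Attempt #3: min(CumFlow) is each group's smallest value (3 for A, 5 for B),
# which is below that group's Med_C, so filter() keeps no rows at all
group_min <- ave(df$CumFlow, df$id_h, FUN = min)
any(group_min >= df$Med_C)  # FALSE
```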
Answer 0 (score: 3)
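Two things:
- Use slice (which takes an index) instead of filter (which needs a boolean).
- Your use of which.min is odd: it returns the index of the first value equal to the minimum, and you have a vector of 1s and 0s. You actually need which.max, because you want the first value that is 1, i.e. TRUE. So: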
df %>% group_by(id_h) %>%
slice(which.max(CumFlow >= Med_C))
## Source: local data frame [2 x 7]
## Groups: id_h [2]
##
## id_h blk_h blk_w dist Flow CumFlow Med_C
## <fctr> <fctr> <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 A A2 W3 7.0 3 12 10
## 2 B B2 W2 6.5 4 9 6
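To see the which.min / which.max difference on its own, here is a base-R sketch on group "A" of the example data. It also shows an edge case the answer does not mention: when no row in a group qualifies, which.max() still returns 1. match(TRUE, ...) is offered here as an alternative that returns NA in that case instead.

```r
# Group "A" of the example data
CumFlow <- c(3, 9, 12, 19)
Med_C   <- c(10, 10, 10, 10)
hit <- CumFlow >= Med_C  # FALSE FALSE TRUE TRUE

which.min(hit)    # 1: position of the first FALSE
which.max(hit)    # 3: position of the first TRUE, the row slice() should keep
match(TRUE, hit)  # 3: same answer

# Edge case: if nothing qualifies, which.max() returns 1 (the first FALSE),
# so the first row would be picked silently; match() returns NA instead
none <- c(FALSE, FALSE, FALSE)
which.max(none)    # 1
match(TRUE, none)  # NA
```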
Answer 1 (score: 2)
You can use dplyr like this:
df %>% group_by(id_h) %>%
mutate(times_greater = cumsum(CumFlow >= Med_C)) %>%
filter(times_greater == 1)
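The cumsum() trick works because summing a logical vector counts its TRUE values, so times_greater first reaches 1 on the first qualifying row. A base-R sketch of the same running count (using ave() in place of group_by() + mutate()) makes the values visible. The extra `hit &` is an addition, not part of the answer: it stops a later non-qualifying row, whose count would still be 1, from being kept. With CumFlow non-decreasing, as here, that case cannot arise.

```r
df <- data.frame(
  id_h    = c("A", "A", "A", "A", "B", "B", "B"),
  CumFlow = c(3, 9, 12, 19, 5, 9, 11),
  Med_C   = c(10, 10, 10, 10, 6, 6, 6)
)
hit <- df$CumFlow >= df$Med_C

# Running count of qualifying rows within each id_h (what times_greater holds)
times_greater <- ave(as.integer(hit), df$id_h, FUN = cumsum)
times_greater  # 0 0 1 2 0 1 2

# Keep the row where the count first becomes 1 and the row itself qualifies
keep <- hit & times_greater == 1
which(keep)    # 3 6
```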
Answer 2 (score: 2)
# Load package
library(data.table)
# Setup data
df <- data.table(
id_h=c("A","A","A","A","B","B","B"),
blk_h=c("A1","A1","A2","A2","B1","B2","B2"),
blk_w=c("W1","W2","W3","W3","W1","W2","W2"),
dist=c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
Flow=c(3,6,3,7,5,4,2),
CumFlow=c(3,9,12,19,5,9,11),
Med_C=c(10,10,10,10,6,6,6))
# Calculation
df.out <- df[CumFlow >= Med_C, .SD[1], by = id_h]
The result looks like this:
> df.out
id_h blk_h blk_w dist Flow CumFlow Med_C
1: A A2 W3 7.0 3 12 10
2: B B2 W2 6.5 4 9 6
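The data.table call does two things: the `i` part keeps only the rows where CumFlow >= Med_C, and `.SD[1], by = id_h` then takes the first remaining row of each household. A base-R sketch of the same two steps, for readers without data.table (it relies on the rows already being in distance order, as in the example):

```r
df <- data.frame(
  id_h    = c("A", "A", "A", "A", "B", "B", "B"),
  blk_h   = c("A1", "A1", "A2", "A2", "B1", "B2", "B2"),
  blk_w   = c("W1", "W2", "W3", "W3", "W1", "W2", "W2"),
  dist    = c(4.3, 5.6, 7.0, 8.7, 5.2, 6.5, 6.8),
  Flow    = c(3, 6, 3, 7, 5, 4, 2),
  CumFlow = c(3, 9, 12, 19, 5, 9, 11),
  Med_C   = c(10, 10, 10, 10, 6, 6, 6),
  stringsAsFactors = FALSE
)

# Step 1 (the `i` part): keep only the qualifying rows
qual <- df[df$CumFlow >= df$Med_C, ]
# Step 2 (`.SD[1], by = id_h`): keep the first qualifying row of each id_h
out <- qual[!duplicated(qual$id_h), ]
out
```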
Answer 3 (score: 1)
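Two filter calls can solve this. With group_by working within each id_h, the first filter returns the rows where CumFlow is greater than or equal to Med_C. The second filter returns, within each id_h, the row with the lowest CumFlow. This only works because the data is sorted. To make it more robust, you could consider adding a call to arrange after the call to group_by.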
library(dplyr)
df <- data.frame(
id_h = c("A","A","A","A","B","B","B"),
blk_h = c("A1","A1","A2","A2","B1","B2","B2"),
blk_w = c("W1","W2","W3","W3","W1","W2","W2"),
dist = c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
Flow = c(3,6,3,7,5,4,2),
CumFlow = c(3,9,12,19,5,9,11),
Med_C = c(10,10,10,10,6,6,6)
)
df
df %>%
group_by(id_h) %>%
filter(CumFlow >= Med_C) %>%
filter(CumFlow == min(CumFlow))
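The robustness point can be sketched in base R as well. Here order() plays the role of arrange(id_h, dist), and the first qualifying row is picked by position with match(TRUE, ...). This positional variant depends on row order, which is exactly why the explicit sort matters. The rows are deliberately scrambled first to mimic unsorted input, and first_hit is a hypothetical helper name. Groups with no qualifying row contribute zero rows, as with the two filter calls above.

```r
df <- data.frame(
  id_h    = c("A", "A", "A", "A", "B", "B", "B"),
  blk_h   = c("A1", "A1", "A2", "A2", "B1", "B2", "B2"),
  blk_w   = c("W1", "W2", "W3", "W3", "W1", "W2", "W2"),
  dist    = c(4.3, 5.6, 7.0, 8.7, 5.2, 6.5, 6.8),
  Flow    = c(3, 6, 3, 7, 5, 4, 2),
  CumFlow = c(3, 9, 12, 19, 5, 9, 11),
  Med_C   = c(10, 10, 10, 10, 6, 6, 6),
  stringsAsFactors = FALSE
)

# Scramble the rows to mimic input that is not sorted by distance
shuffled <- df[c(7, 2, 5, 4, 1, 6, 3), ]
# Base-R counterpart of arrange(id_h, dist)
sorted <- shuffled[order(shuffled$id_h, shuffled$dist), ]

# Hypothetical helper: first row of a group where CumFlow reaches Med_C
first_hit <- function(g) {
  i <- match(TRUE, g$CumFlow >= g$Med_C)  # NA if no row in the group qualifies
  if (is.na(i)) g[0, ] else g[i, ]
}
out <- do.call(rbind, lapply(split(sorted, sorted$id_h), first_hit))
out
```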