Say I have some data that looks a bit like this:
library(dplyr)
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x = runif(4000,0,100)
y = runif(4000,0,100)
employ.data <- data.frame(employee, quality, x, y)
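I want to set a first criterion (i.e. any row where employee == 'George' and quality == 'good') and then count the rows within some range of those rows (say, five rows) that match a second criterion (i.e. any row where employee == 'Jim' and x >= 50). How can I do this in R?
Hope that's clear. Thanks!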
Answer 0: (score: 1)
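Here is an example based on the criteria you mentioned in your question.
Edit, in response to the request in the comments: I have wrapped it in a function so that it can be applied to every employee in the employee variable.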
criterion_range <- function(n, group) {
  # n: the number of rows before and after each first-criterion row to search
  # group: the employee you want to include in the first criterion
  # index for the first criterion
  # (the question uses quality == "good"; change the value as needed):
  ind1 <- which(employ.data$employee == group & employ.data$quality == "bad")
  if (length(ind1) > 0) {
    # index for the n rows following and the n rows preceding the rows with
    # criterion 1; unique() keeps rows in overlapping ranges from being
    # counted twice:
    ind_n <- unique(c(sapply(-n:n, function(x) ind1 + x)))
    # to make sure that the index does not go beyond the rows in the sample:
    ind_n <- ind_n[ind_n <= nrow(employ.data) & ind_n > 0]
    # count the rows that fall within that range and match the second
    # criterion (the question uses x >= 50):
    return(sum(employ.data[ind_n, "employee"] == "Jim" &
                 employ.data[ind_n, "x"] > 60))
  }
}
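# this runs the function for each employee in your df; you can specify n here.
# unique() is used instead of levels() because data.frame() in R >= 4.0 no
# longer converts strings to factors, so levels() would return NULL
unlist(sapply(unique(as.character(employ.data$employee)), criterion_range, n = 3))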
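As a quick sanity check of the window logic, here is a minimal sketch on a ten-row toy data frame. The values in `toy` are made up for illustration and are not from the question. It builds the window index with `outer()` rather than `sapply()`, which is a swapped-in but equivalent way of adding every offset to every matching row.

```r
# Toy data (hypothetical values), small enough to verify by hand
toy <- data.frame(
  employee = c("Jim", "George", "Jim", "Jim", "Tom",
               "George", "Tom", "Jim", "Tom", "Jim"),
  quality  = c("good", "bad", "good", "good", "good",
               "bad", "bad", "good", "good", "good"),
  x        = c(70, 10, 55, 90, 65, 20, 80, 61, 30, 99)
)

n <- 2

# Rows that meet the first criterion: rows 2 and 6
ind1 <- which(toy$employee == "George" & toy$quality == "bad")

# Every row within n rows of a first-criterion row. The two windows
# (rows 0 to 4 and rows 4 to 8) overlap at row 4, so de-duplicate.
ind_n <- sort(unique(as.vector(outer(ind1, -n:n, "+"))))

# Drop row numbers that fall outside the data frame (row 0 here)
ind_n <- ind_n[ind_n >= 1 & ind_n <= nrow(toy)]

# Count the rows in the window that meet the second criterion
count <- sum(toy$employee[ind_n] == "Jim" & toy$x[ind_n] > 60)
count  # 3 (rows 1, 4 and 8; row 10 also matches but is outside the window)
```

Without the `unique()` call, row 4 would appear in both windows and the count would come out as 4 instead of 3.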
Answer 1: (score: 1)
You can change how many rows before and after each first-criterion row are searched by adjusting the values of lower_bound and upper_bound.
library(dplyr)
# Generate employee data
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x = runif(4000,0,100)
y = runif(4000,0,100)
employ.data <- data.frame(employee, quality, x, y)
# Extract row numbers that satisfy criteria 1
criteria1 <- which(employ.data$employee == "George" & employ.data$quality == "good")
# Set lower bounds for rows that satisfy criteria 1
lower_bound <- 5
lower <- criteria1 - lower_bound
lower <- ifelse(lower <= 0, 1, lower)
# Set upper bounds for rows that satisfy criteria 1
upper_bound <- 5
upper <- criteria1 + upper_bound
upper <- ifelse(upper > nrow(employ.data), nrow(employ.data), upper)
# Build the row range for each row that satisfies criteria 1 and combine all
# ranges into a vector of unique row numbers. Map() always returns a list,
# whereas apply() returns a matrix when every range has the same length
rows <- unique(unlist(Map(seq, lower, upper)))
# Find how many rows in the extended range satisfy criteria 2
criteria2 <- nrow(employ.data[rows,][employ.data[rows,]$employee == "Jim" & employ.data[rows,]$x >= 50,])
print(criteria2)
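If you need this for more than one pair of criteria, the same steps can be wrapped in a small function. The following is only a sketch: `count_in_window()` and its arguments `first`, `second`, `before` and `after` are hypothetical names, and the toy data is made up for illustration. It takes two logical vectors, one per criterion, so it does not depend on the column names in `employ.data`.

```r
# Hypothetical helper: count the rows where `second` is TRUE that lie within
# `before` rows above or `after` rows below any row where `first` is TRUE.
count_in_window <- function(first, second, before = 5, after = 5) {
  stopifnot(length(first) == length(second))
  hits <- which(first)
  if (length(hits) == 0) return(0L)
  # Clamp each range to the valid row numbers
  lower <- pmax(hits - before, 1)
  upper <- pmin(hits + after, length(first))
  # Expand each range and keep every row number only once
  rows <- unique(unlist(Map(seq, lower, upper)))
  sum(second[rows], na.rm = TRUE)
}

# Toy data (hypothetical values)
toy <- data.frame(
  employee = c("Jim", "Tom", "George", "Jim", "Tom", "Jim",
               "Tom", "Jim", "Tom", "George", "Jim", "Jim"),
  quality  = c("good", "good", "good", "bad", "good", "good",
               "bad", "good", "good", "bad", "good", "good"),
  x        = c(80, 10, 5, 50, 99, 49, 70, 90, 20, 60, 75, 55)
)

first  <- toy$employee == "George" & toy$quality == "good"  # row 3 only
second <- toy$employee == "Jim" & toy$x >= 50

count_in_window(first, second, before = 1, after = 2)  # 1 (row 4)
count_in_window(first, second, before = 2, after = 5)  # 3 (rows 1, 4, 8)
```

Passing logical vectors keeps the helper independent of any particular data frame, at the cost of having to build the two conditions yourself before calling it.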