How to count the rows that meet a condition within a defined range of rows around rows that meet an initial condition?

Date: 2018-01-21 18:39:55

Tags: r dplyr

Say I have some data that looks a bit like this:

library(dplyr)

employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x = runif(4000,0,100)
y = runif(4000,0,100)
employ.data <- data.frame(employee, quality, x, y)
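A side note on this generated data, which may matter when checking results: `data.frame()` recycles the length-10 `employee` vector and the length-2 `quality` vector to the 4000 rows of `x` and `y`, so the employee/quality pattern is fixed and only `x` and `y` are random. The base-R sketch below (`tab` and `gaps` are illustrative names) checks that every George row is 'good' and that every Jim row sits exactly two rows after a George row. For this particular data, a window of five rows on either side of each George row therefore contains exactly one Jim row, so the expected count is simply the number of Jim rows with x >= 50.

```r
# Same construction as in the question: data.frame() recycles the
# length-10 employee vector and the length-2 quality vector to 4000 rows
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
employ.data <- data.frame(employee, quality,
                          x = runif(4000, 0, 100), y = runif(4000, 0, 100))

# Cross-tabulate employee against quality: George (position 5) always lands
# on an odd row number, so every George row gets quality 'good'
tab <- table(employ.data$employee, employ.data$quality)
tab["George", ]  # bad = 0, good = 400

# Row gap between each Jim row and the George row before it
gaps <- which(employ.data$employee == "Jim") - which(employ.data$employee == "George")
unique(gaps)  # 2

# Every Jim row lies within 5 rows of a George row, so for a five-row
# window the count equals the total number of Jim rows with x >= 50
sum(employ.data$employee == "Jim" & employ.data$x >= 50)
```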

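I want to set a first criterion (i.e., any row where employee == 'George' and quality == 'good'), then count how many rows within a certain range of the rows meeting that criterion (say, five rows) match a second criterion (i.e., any row where employee == 'Jim' and x >= 50). How can I do this in R?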

Hope this is clear. Thanks!
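To pin down what is being counted, here is a deliberately simple loop-based sketch on a tiny hypothetical data frame (`toy` and `count_near` are illustrative names, not from the question). It assumes "within k rows" means k rows before or after, and that a row meeting the second criterion is counted once even if it falls inside several windows; if you instead want a separate count per George row, the definition changes.

```r
# Hypothetical toy data: small enough to check the count by hand
toy <- data.frame(
  employee = c("George", "Jim", "Tom", "Jim", "George", "Jim", "Jim", "Paul"),
  quality  = c("good", "bad", "good", "good", "bad", "good", "bad", "good"),
  x        = c(10, 70, 20, 40, 90, 55, 80, 65),
  stringsAsFactors = FALSE
)

# Count rows meeting criterion 2 that lie within k rows (before or after)
# of at least one row meeting criterion 1; each such row is counted once
count_near <- function(df, k) {
  c1_rows <- which(df$employee == "George" & df$quality == "good")
  c2_rows <- which(df$employee == "Jim" & df$x >= 50)
  n <- 0L
  for (i in c2_rows) {
    if (any(abs(i - c1_rows) <= k)) n <- n + 1L
  }
  n
}

count_near(toy, k = 5)  # 2: rows 2 and 6 are within 5 rows of row 1; row 7 is not
```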

2 Answers:

Answer 0 (score: 1)

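Here is an example using the criteria you mentioned in the question.

Following up on what was asked in the comments, I would wrap this in a function so that it can be applied to every level of the employee variable: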

criterion_range <- function(n, group) {
  # n: the number of rows to look at before AND after each criterion-1 row
  # group: the employee to use in the first criterion
  # (employ.data is taken from the global environment)

  # index of the rows that meet the first criterion:
  ind1 <- which(employ.data$employee == group & employ.data$quality == "good")

  # no rows meet the first criterion, so there is nothing to count
  if (length(ind1) == 0) return(0L)

  # index of the n rows preceding and the n rows following each row that
  # meets criterion 1; unique() stops overlapping windows double counting
  ind_n <- unique(as.vector(outer(ind1, -n:n, "+")))

  # make sure the index does not go beyond the rows in the sample:
  ind_n <- ind_n[ind_n > 0 & ind_n <= nrow(employ.data)]

  # count the rows in that range that also match the second criterion
  sum(employ.data$employee[ind_n] == "Jim" & employ.data$x[ind_n] >= 50)
}

# run the function for each employee in your data frame; you can specify n
# here (unique() is used instead of levels(), which returns NULL when
# employee is a character column, the default since R 4.0.0)
sapply(unique(as.character(employ.data$employee)), criterion_range, n = 3)
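The core step in this answer is turning each criterion-1 row index into a window of row indices. The sketch below isolates that step as a hypothetical helper, `window_rows`, to show why both the de-duplication and the clipping to valid row numbers matter: neighbouring windows overlap, and windows near the first or last row reach outside the data.

```r
# Hypothetical helper: expand each criterion-1 row index i into the window
# i - n, ..., i + n, drop duplicates where windows overlap, and keep only
# indices between 1 and n_rows
window_rows <- function(ind1, n, n_rows) {
  rows <- unique(as.vector(outer(ind1, -n:n, "+")))
  sort(rows[rows >= 1 & rows <= n_rows])
}

# windows 0..4 and 2..6 overlap on rows 2-4 and are clipped to 1..5
window_rows(c(2, 4), n = 2, n_rows = 5)    # 1 2 3 4 5
# windows 0..2 and 9..11 are clipped at both ends
window_rows(c(1, 10), n = 1, n_rows = 10)  # 1 2 9 10
```

Without `unique()`, rows 2, 3 and 4 in the first call would appear twice, so a matching row there would be counted twice.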

Answer 1 (score: 1)

You can change how many rows before and after each match of the first criterion are examined by adjusting the values of lower_bound and upper_bound.

library(dplyr)

# Generate employee data
employee <- c('John','Dave','Paul','Ringo','George','Tom','Jim','Harry','Jamie','Adrian')
quality <- c('good', 'bad')
x = runif(4000,0,100)
y = runif(4000,0,100)
employ.data <- data.frame(employee, quality, x, y)

# Extract the row numbers that satisfy criterion 1
criteria1 <- which(employ.data$employee == "George" & employ.data$quality == "good")

# Number of rows to look at before each row that satisfies criterion 1
lower_bound <- 5

# Window start for each criterion-1 row, clamped to the first row
lower <- pmax(criteria1 - lower_bound, 1)

# Number of rows to look at after each row that satisfies criterion 1
upper_bound <- 5

# Window end for each criterion-1 row, clamped to the last row
upper <- pmin(criteria1 + upper_bound, nrow(employ.data))

# Build the sequence of row numbers for each window (one list element per
# criterion-1 row) and combine them into a vector of unique row numbers
rows <- unique(unlist(Map(seq, lower, upper)))

# Count how many rows in the combined range satisfy criterion 2
criteria2 <- sum(employ.data$employee[rows] == "Jim" & employ.data$x[rows] >= 50)

print(criteria2)
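If the data is large, the same union-of-windows count can be computed without building the list of row numbers. The sketch below is an alternative technique, not the answer's method: it keeps a running count of criterion-1 rows with `cumsum()` and, for every row r, checks whether any criterion-1 row i satisfies r - upper_bound <= i <= r + lower_bound (the window [i - lower_bound, i + upper_bound] seen from r's side). `near_any`, `count_near_base` and the `toy` data are hypothetical names.

```r
# Hypothetical helper: for every row r, TRUE if some row i with flag[i] TRUE
# has r inside its window i - lower_bound, ..., i + upper_bound
near_any <- function(flag, lower_bound, upper_bound) {
  n  <- length(flag)
  cs <- c(0L, cumsum(flag))       # cs[j + 1] = number of TRUEs in flag[1..j]
  r  <- seq_len(n)
  a  <- pmax(r - upper_bound, 1)  # earliest flagged row whose window reaches r
  b  <- pmin(r + lower_bound, n)  # latest flagged row whose window reaches r
  (cs[b + 1] - cs[a]) > 0         # any flagged row among flag[a..b]?
}

# Count rows meeting criterion 2 inside the union of criterion-1 windows
count_near_base <- function(df, lower_bound, upper_bound) {
  crit1 <- df$employee == "George" & df$quality == "good"
  crit2 <- df$employee == "Jim" & df$x >= 50
  sum(crit2 & near_any(crit1, lower_bound, upper_bound))
}

# Hypothetical toy data: the only criterion-1 row is row 1
toy <- data.frame(
  employee = c("George", "Jim", "Tom", "Jim", "George", "Jim", "Jim", "Paul"),
  quality  = c("good", "bad", "good", "good", "bad", "good", "bad", "good"),
  x        = c(10, 70, 20, 40, 90, 55, 80, 65),
  stringsAsFactors = FALSE
)

count_near_base(toy, lower_bound = 5, upper_bound = 5)  # 2 (rows 2 and 6)
count_near_base(toy, lower_bound = 5, upper_bound = 0)  # 0 (window only reaches backwards)
```

On the question's `employ.data`, `count_near_base(employ.data, 5, 5)` should agree with `criteria2` from the code above, since both count each matching row once.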