Question

I have a large data frame (a tbl_df) that contains roughly the following information:

data <- data.frame(Energy = sample(1:200, 100, replace = T), strip1 = sample(1:12, 100, replace = T), strip2 = sample(1:12, 100, replace = T))

It has 3 columns. The first is the energy; the second and third are strip numbers (where the energy was deposited).

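Each strip has a different threshold. The thresholds are stored in two numeric arrays, where each position in an array corresponds to the strip number: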

threshold_strip1 <- c(4, 6, 3, 7, 7, 1, 2, 5, 8, 10, 2, 2)
threshold_strip2 <- c(5, 3, 5, 7, 6, 2, 7, 7, 10, 2, 2, 2)

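These tell me the minimum energy a strip can receive. What I want to do is remove from the data frame every row where the energy does not exceed the required threshold for BOTH strips.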

For example, if I have a row:

Energy = 4, strip1 = 2, strip2 = 2

then I would remove this row: although the threshold for strip2 is below 4, the threshold for strip1 is 6, so there is not enough energy here.
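The check described in this example can be written out for a single row. This is only a minimal sketch: the name `example_row` is made up for illustration, and it assumes "exceed" means strictly greater than the threshold, which the question does not state explicitly.

```r
threshold_strip1 <- c(4, 6, 3, 7, 7, 1, 2, 5, 8, 10, 2, 2)
threshold_strip2 <- c(5, 3, 5, 7, 6, 2, 7, 7, 10, 2, 2, 2)

# The example row from the question (hypothetical variable name)
example_row <- list(Energy = 4, strip1 = 2, strip2 = 2)

# Use the strip number as an index into the matching threshold vector
need1 <- threshold_strip1[example_row$strip1]
need2 <- threshold_strip2[example_row$strip2]

# Assumption: the row is kept only if Energy is strictly above both thresholds
keep_row <- example_row$Energy > need1 && example_row$Energy > need2
```

Here `need1` is 6 and `need2` is 3, so `keep_row` is `FALSE` and the row would be removed.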

Apologies if this question is poorly worded; I couldn't seem to find anything similar in older questions.

Answer 1

I would probably do...

library(data.table)
setDT(data)

# structure lower-bound rules
threshes = list(threshold_strip1, threshold_strip2)
lbDT = data.table(
  strip_loc = rep(seq_along(threshes), lengths(threshes)),
  strip_num = unlist(lapply(threshes, seq_along)),
  thresh    = unlist(threshes)
)

# loop over strip locations (strip1, strip2, etc)
# marking where threshold is not met
data[, keep := TRUE]
lbDT[, {
  onexpr = c(sprintf("strip%s==s", strip_loc), "Energy<th")
  data[.(s = strip_num, th = thresh), on=onexpr, keep := FALSE]
  NULL
}, by=strip_loc]
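The snippet above only flags rows in the `keep` column; it does not drop them. A minimal sketch of the remaining step follows, assuming the data.table package is installed. The toy rows and the name `result` are made up for illustration, and the flag is computed here by direct vector indexing instead of the non-equi join, using `>=` so that it mirrors the `Energy<th` rule above (a row sitting exactly at its threshold is kept).

```r
library(data.table)

threshold_strip1 <- c(4, 6, 3, 7, 7, 1, 2, 5, 8, 10, 2, 2)
threshold_strip2 <- c(5, 3, 5, 7, 6, 2, 7, 7, 10, 2, 2, 2)

# Toy data (hypothetical values chosen for illustration)
data <- data.table(
  Energy = c(4L, 10L, 1L),
  strip1 = c(2L, 2L, 6L),
  strip2 = c(2L, 2L, 6L)
)

# Flag rows: FALSE wherever Energy is below either strip's threshold
data[, keep := Energy >= threshold_strip1[strip1] &
               Energy >= threshold_strip2[strip2]]

# Keep only the flagged rows, then drop the helper column from the subset
result <- data[(keep)][, keep := NULL]
```

With these toy rows only the `Energy = 10` row remains in `result`. With the original code, the same final step would be `data[(keep)]` once the `lbDT` loop has run.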

Answer 2

How about this? Using dplyr:

require(dplyr)

data2 <- data %>%
  mutate(
    strip1_value = threshold_strip1[strip1],
    strip2_value = threshold_strip2[strip2],
    to_keep = Energy > strip1_value & Energy > strip2_value
  ) %>%
  filter(to_keep == TRUE)
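If the helper columns are not needed afterwards, the same lookup can probably be done directly inside `filter()`. This is a sketch of that variant, not part of the answer above; the toy rows are made up for illustration. Note that the strict `>` drops a row whose energy equals its threshold exactly.

```r
library(dplyr)

threshold_strip1 <- c(4, 6, 3, 7, 7, 1, 2, 5, 8, 10, 2, 2)
threshold_strip2 <- c(5, 3, 5, 7, 6, 2, 7, 7, 10, 2, 2, 2)

# Toy data (hypothetical values): the last row sits exactly at strip1's threshold
data <- data.frame(
  Energy = c(4, 10, 6),
  strip1 = c(2, 2, 2),
  strip2 = c(2, 2, 2)
)

# Conditions separated by commas in filter() are combined with AND
data2 <- data %>%
  filter(
    Energy > threshold_strip1[strip1],
    Energy > threshold_strip2[strip2]
  )
```

Only the `Energy = 10` row passes, and `data2` keeps the original three columns.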


R - Remove rows of data table based on two values