Question

I want to remove about 100 entries from a vector of 500 column names, and then use that vector to set rows of a (prediction) matrix m to zero.

As a very simple example of my data frame:

First, I put the column names into a vector:

x <- colnames(df) # x <- c("A","B","C","D","E","F","G","H","I","J")

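Suppose I want to remove B through D, F, and G through I (in reality these are about 100 variables scattered across the vector, and I don't know their indices). I'd like to do something like: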

remove <- c(B:D, F, G:I) # This does not work, obviously
x[!x %in% remove]
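Because `:` only builds integer sequences, one way to express name ranges is to look up each endpoint's position with `match()` and slice the name vector between them. This is a minimal sketch, assuming the names are unique and in the order you expect. `name_range()` is a hypothetical helper, not part of base R:

```r
x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")

# Hypothetical helper: every name from `from` to `to` (inclusive),
# based on their positions in `nms`
name_range <- function(from, to, nms) {
  pos <- match(c(from, to), nms)
  if (anyNA(pos)) stop("name not found: ", paste(c(from, to)[is.na(pos)], collapse = ", "))
  nms[pos[1]:pos[2]]
}

to_remove <- c(name_range("B", "D", x), "F", name_range("G", "I", x))
kept <- x[!x %in% to_remove]
print(kept)
```

With the toy `x` above, `kept` is `"A" "E" "J"`.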

which would give me a vector x like this:

A
E
J

This vector represents the row names (and column names, since it is a prediction matrix) that need to be set to zero:

m[x,] <- 0
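A character vector indexes matrix rows and columns by their dimnames, so `m[x, ]` works as long as `m` has row names. Here is a small sketch with a made-up 10 x 10 matrix of ones. The data is illustrative, not the poster's:

```r
nms <- LETTERS[1:10]
# Illustrative square matrix whose row and column names are identical
m <- matrix(1L, nrow = 10, ncol = 10, dimnames = list(nms, nms))

zero_rows <- c("B", "C", "D", "G", "H", "I")
m[zero_rows, ] <- 0L   # rows are selected by name, not by position
print(m)
```

`m[, zero_rows] <- 0L` does the same for columns.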

which creates the following output:

  A B C D E F G H
A 1 0 1 0 1 0 1 0
B 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0
E 1 0 1 0 1 0 1 0
F 1 0 1 0 1 0 1 0
G 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0
J 1 0 1 0 1 0 1 0

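How can I remove these 100 variable names from the vector of all variable names, and then use that vector to refer to the column names of the matrix?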

Answer 1

Interesting use case. We can write a function that does this in the general way you want.

Note:

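I used a data frame below because I don't believe a matrix was mentioned originally (or I just missed it), and the various question edits have since muddled the column and row names. So what you should focus on from the code below is: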

# get the terms of the formula
trms <- terms(remove_spec)

# get each element (will be each group separated by `+`)
elements <- attr(trms, "term.labels")

# adding in assertions to validate `col` is in `xdf` and that only
# the restricted syntax is used in the formula and that it's valid 
# is up to the OP

# now, find the positions of all those strings
unlist(lapply(elements, function(y) {
  if (grepl(":", y)) {
    rng <- strsplit(y, ":")[[1]]
    which(x[,col] == rng[1]) : which(x[,col] == rng[2])
  } else {
    which(x[,col] == y)
  }
}), use.names = FALSE) -> to_exclude
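You can inspect the labels directly to see what `elements` contains in the excerpt. In this quick sketch, note that `terms()` reorders labels (main effects before interactions) unless `keep.order = TRUE`. That does not matter here, because the positions are only collected:

```r
spec <- ~B:C + F + G:I
# Default terms(): main effects are listed before interaction-style terms
default_labels <- attr(terms(spec), "term.labels")
# keep.order = TRUE keeps the labels in the order they were written
ordered_labels <- attr(terms(spec, keep.order = TRUE), "term.labels")
print(default_labels)
print(ordered_labels)
# Each label is then split on ":" to get the range endpoints
ends <- strsplit(ordered_labels, ":", fixed = TRUE)
print(ends)
```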

I'm done with this question now (row names are so 1980s :-). Note the caveats at the end of the answer.

Others should feel free to use this in an actual matrix answer for the OP's use case.

We'll make some mock data (so you can make the example bigger if you need a larger one):

library(dplyr) # mostly for saner data frame constructor & printing

set.seed(2018-11-18)

tibble( # data_frame() is deprecated; dplyr re-exports tibble()
  cat = LETTERS,
  val1 = sample(100, length(cat), replace = TRUE),
  val2 = sample(100, length(cat), replace = TRUE),
  val3 = sample(100, length(cat), replace = TRUE)
) -> xdf

xdf
## # A tibble: 26 x 4
##    cat    val1  val2  val3
##    <chr> <int> <int> <int>
##  1 A        87    98     5
##  2 B        30    69    39
##  3 C        87     1    32
##  4 D        65    46    87
##  5 E         4    69     6
##  6 F        53    20    31
##  7 G        43    51    84
##  8 H        27    43    65
##  9 I        27     9    10
## 10 J        10    94    11
## # ... with 16 more rows

(tibble printing is definitely better than base printing IMO, but I digress.)

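Now, you want to use strings to specify individual elements and ranges, with some notation for what to exclude. We need a function for that, and we can take advantage of R's special formula class to get a more compact syntax. I.e., wouldn't it be nice to be able to call a function like this: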

remove_rows(xdf, cat, ~B:C+F+G:I)

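and have it look in the `cat` column of `xdf` for the range "B":"C", find the position of "F", then the range "G":"I", and return the data frame with those rows excluded? Yes. Yes, it would. So, let's build it!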

Now we can call it for real.

#' @param x data frame
#' @param col bare column name to use for the comparison
#' @param remove_spec formula; restricted operators are `:` for ranges and `+` for adding selectors
remove_rows <- function(x, col, remove_spec) {

  # this is pure convenience we could just as easily have forced folks 
  # to pass in a string (and we can modify it to handle both)
  col <- as.character(substitute(col)) 

  # get the terms of the formula
  trms <- terms(remove_spec)

  # get each element (will be each group separated by `+`)
  elements <- attr(trms, "term.labels")

  # adding in assertions to validate `col` is in `xdf` and that only
  # the restricted syntax is used in the formula and that it's valid 
  # is up to the OP

  # now, find the positions of all those strings
  unlist(lapply(elements, function(y) {
    if (grepl(":", y)) {
      rng <- strsplit(y, ":")[[1]]
      which(x[,col] == rng[1]) : which(x[,col] == rng[2])
    } else {
      which(x[,col] == y)
    }
  }), use.names = FALSE) -> to_exclude

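  # and get rid of those puppies (guard the empty case, because
  # x[-integer(0), ] would drop every row instead of none)
  if (length(to_exclude) > 0) x[-to_exclude, ] else x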

}

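The function name isn't great, so you may want to change it, and you really should add some argument checking and validation. Still, I believe this does what you want (assuming you're really sure the data frame is in the order you think it is).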

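Also, this is imperfect because the strings are constrained to what a formula accepts (one such constraint is that they can't start with a number unless backquoted). But you didn't provide a sample of the real strings.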
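The answer invites adapting this approach to the matrix in the question. The following is a minimal sketch of that adaptation, assuming the names in `rownames(m)` are unique and in the expected order. `expand_spec()` is a hypothetical helper: it reuses the same `terms()` trick but resolves names against the matrix dimnames with `match()` instead of against a data frame column.

```r
# Hypothetical helper: turn a spec like ~B:D + F + G:I into the names it covers
expand_spec <- function(spec, nms) {
  labels <- attr(terms(spec, keep.order = TRUE), "term.labels")
  unlist(lapply(labels, function(lbl) {
    ends <- strsplit(lbl, ":", fixed = TRUE)[[1]]   # one name, or two range endpoints
    pos <- match(ends, nms)
    if (anyNA(pos)) stop("not found in names: ", paste(ends[is.na(pos)], collapse = ", "))
    nms[pos[1]:pos[length(pos)]]                    # a single name gives pos[1]:pos[1]
  }), use.names = FALSE)
}

x <- LETTERS[1:10]
# Illustrative prediction matrix of ones (not real data)
m <- matrix(1L, nrow = 10, ncol = 10, dimnames = list(x, x))
to_zero <- expand_spec(~B:D + F + G:I, rownames(m))
m[to_zero, ] <- 0L
print(to_zero)
```

The formula caveat above applies here too: names that are not valid R symbols need backquotes.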

Answer 2

I got it working using hrbrmstr's answer and a very long workaround. If anyone can tell me how to make it less messy, please let me know.

# Copy prediction matrix and turn it into a dataframe for the "remove rows" function
varlist <- m
varlist <- as.data.frame(varlist)

# Create a column called "cat" with the rownames for the "remove rows" function
varlist$cat = rownames(varlist)
# Use the function to remove the rows from the copied df
varlist <- remove_rows(varlist, cat, ~B:C+F+G:I)
# Only keep the "cat" column as a character vector
varlist <- varlist$cat
# Copy prediction matrix and use "varlist" to set the corresponding columns to zero
m_reduced <- m
m_reduced[ ,varlist] <- 0

I'd be very happy if someone could show me how to clean up this monstrosity.
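A shorter route is to skip the data frame round trip and work on `colnames(m)` directly with `match()`. This sketch assumes the matrix's column names are the full variable list in the expected order, and the matrix itself is made up for illustration. It mirrors the code above: `varlist` holds the names that were not removed, and those columns are set to zero.

```r
nms <- LETTERS[1:10]
# Illustrative prediction matrix of ones with matching row and column names
m <- matrix(1L, nrow = 10, ncol = 10, dimnames = list(nms, nms))
cn <- colnames(m)

# Positions of the names to remove: ranges as from:to, single names as-is
removed_pos <- c(match("B", cn):match("C", cn), match("F", cn), match("G", cn):match("I", cn))

# Everything not removed, like `varlist` above
varlist <- cn[-removed_pos]

m_reduced <- m
m_reduced[, varlist] <- 0L
print(varlist)
```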

Answer 3

Here is my approach:

remove <- function(lets_to_be_removed, all_names) {
    positions <- seq_along(all_names)   # each value is the position of a name
    names(positions) <- all_names       # e.g. positions["A"] == 1 is TRUE
    result <- integer()
    for (spec in lets_to_be_removed) {
        # one element is a single name; two elements are an inclusive range
        res <- if (length(spec) == 1) positions[spec] else positions[spec[1]]:positions[spec[2]]
        result <- c(result, res)
    }
    names(result) <- all_names[result]
    result # return the indices of the names to remove
}

You can call it like this:

letters <- list(c("B","D"),"F",c("G","I"))
letters
[[1]]
[1] "B" "D" # B:D sequence
[[2]]
[1] "F" # only one letter
[[3]]
[1] "G" "I" # G:I sequence

indices<-remove(letters,x)
indices # named vector
B C D F G H I 
2 3 4 6 7 8 9

x[ -indices ] # faster than the %in% approach; to keep your method, use x[!x %in% names(indices)]
[1] "A" "E" "J"

In general, indexing by integer position is faster than indexing by name, because no string matching is needed.
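Both forms give the same result here. The integer form has one caveat, shown in this small sketch: if the index vector is empty, `x[-indices]` returns an empty vector instead of `x`. The `safe_drop()` guard is a hypothetical helper:

```r
x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
indices <- c(B = 2L, C = 3L, D = 4L, F = 6L, G = 7L, H = 8L, I = 9L)

by_position <- x[-indices]               # drop by integer position
by_name <- x[!x %in% names(indices)]     # same result via the names
stopifnot(identical(by_position, by_name))

# Pitfall: with nothing to remove, negative indexing drops everything
none <- integer(0)
print(x[-none])                           # character(0), not x

# Hypothetical guard that leaves the vector intact in that case
safe_drop <- function(v, idx) if (length(idx) > 0) v[-idx] else v
print(safe_drop(x, none))
```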

Remove many entries from a string vector by name
