Question

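In R, I'm trying to condense a factor column into smaller subsets so that I can later refer to each subset in one go. In many cases, several differently named factor levels need to be combined into one. I've been able to create my subsets with the following: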

fact1 <- as.vector(grep(x=levels(ds$mycol), value=TRUE, pattern="mystring1"))

However, I need to be able to exclude the levels that also match mystring2. Edit: i.e., levels such as mystring1-mystring2, etc.

Here is an example vector:

ds$mycol <- as.factor(ds$mycol)
levels(ds$mycol)
 [1] "Level1"        "Level2"        "Level3"        "Level1:Level2"
 [5] "Level1:Level3"

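I want to reshape the data so that only 3 levels are needed to cover the existing 5. I'd like to unify Level1 by merging the :Level2 and :Level3 mixed levels into it. The problem is that the function above indexes the mixed levels into the Level1, Level2 and Level3 vectors alike.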
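To make the problem concrete, here is a minimal sketch that rebuilds the five example labels as a factor (the name mycol is reused from the question) and runs the same grep call once per base level. It illustrates how the mixed levels leak into more than one subset:

```r
# Rebuild the five example labels from the question as a factor
mycol <- factor(c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3"))

# Same approach as the question: grep(value = TRUE) over the levels
lv1 <- grep("Level1", levels(mycol), value = TRUE)
lv2 <- grep("Level2", levels(mycol), value = TRUE)
lv3 <- grep("Level3", levels(mycol), value = TRUE)

# "Level1:Level2" matches both "Level1" and "Level2", so it lands in lv1 and lv2;
# likewise "Level1:Level3" lands in lv1 and lv3
print(lv1)
print(lv2)
print(lv3)
```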

How can I add this extra "does not match" condition to the existing grep call?
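One way to express the extra condition without a more complicated regex is to compute the matches and the exclusions separately and subtract them. A minimal sketch on the example labels, assuming ":" is what marks a mixed level; setdiff() and grep(..., invert = TRUE) are both base R:

```r
lv <- c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3")

# Option 1: levels matching the pattern, minus levels containing ":"
wanted   <- grep("Level2", lv, value = TRUE)
unwanted <- grep(":", lv, value = TRUE, fixed = TRUE)
fact2    <- setdiff(wanted, unwanted)

# Option 2: filter the matches again, keeping only those WITHOUT ":"
fact2b <- grep(":", wanted, value = TRUE, fixed = TRUE, invert = TRUE)

print(fact2)
print(fact2b)
```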

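I know there is information on this type of problem (using escape characters, |, etc.), but I haven't been able to apply it correctly. Also, I'm using RStudio on Windows 7, and most answers to my specific question seem to be tailored to Unix.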
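Regarding | and escaping: these patterns are ordinary R strings passed to R's own regex functions (not to a Unix shell), so they should behave the same in RStudio on Windows. A small sketch, with patterns chosen for the example labels as an assumption:

```r
lv <- c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3")

# "|" means "or": flag labels ending in ":Level2" or ":Level3"
mixed <- grepl(":(Level2|Level3)$", lv)

# ":" has no special meaning in a regex, so it needs no escaping.
# A metacharacter such as "." would need a doubled backslash in an R string ("\\.").
# fixed = TRUE turns regex interpretation off and matches the text literally.
mixed_fixed <- grepl(":", lv, fixed = TRUE)

print(mixed)
print(identical(mixed, mixed_fixed))
```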

Answer 1

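If I understand correctly, you can use grepl to select the entries that contain mystring1 and NOT those that contain mystring2 (in the second grepl, whatever sits between ?! and ) is the match that gets ruled out)...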

ds <- data.frame( mycol = as.factor( c( paste0( "not" , 1:5) , paste0( "keep" , 1:5 ) , paste0( "not-keep" , 1:5 ) ) ) )

fact1 <- grepl(ds$mycol, pattern="keep") & grepl( ds$mycol , pattern = "^((?!not).)*$" , perl = TRUE )

#          mycol
#   1       not1
#   2       not2
#   3       not3
#   4       not4
#   5       not5
#   6      keep1
#   7      keep2
#   8      keep3
#   9      keep4
#   10     keep5
#   11 not-keep1
#   12 not-keep2
#   13 not-keep3
#   14 not-keep4
#   15 not-keep5

fact1
#[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
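The negative lookahead in the second grepl can be checked in isolation. A short sketch on three made-up strings (chosen only for illustration) showing that it gives the same result as negating a plain grepl:

```r
x <- c("keep1", "not-keep1", "not1")

# ^((?!not).)*$ : every character from start to end must sit at a position
# where "not" does not begin, i.e. the string contains no "not" at all.
# Lookaheads such as (?!...) require perl = TRUE.
no_not <- grepl("^((?!not).)*$", x, perl = TRUE)

# The same result without a lookahead
no_not2 <- !grepl("not", x)

print(no_not)
```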

Using the example provided by the OP:

# stringsAsFactors = TRUE keeps mycol a factor (the default became FALSE in R 4.0.0)
ds <- data.frame( mycol = c("Level1" , "Level2" , "Level3" , "Level1:Level2" , "Level1:Level3" ) , stringsAsFactors = TRUE )
fact1 <- grepl(ds$mycol, pattern="Level") & grepl( ds$mycol , pattern = "^((?!:).)*$" , perl = TRUE )
# levels are sorted, so levels(ds$mycol)[1] is "Level1"
ds[ !fact1 , ] <- levels( ds$mycol)[1]

# Or, more simply, negate grepl as in Ricardo's answer
fact1 <- grepl(ds$mycol, pattern="Level") & !grepl( ds$mycol , pattern = ":" )
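If the goal is simply to fold Level1:Level2 and Level1:Level3 into Level1 so that three levels remain, a different base R technique (not one used in these answers) is to rename the levels: assigning duplicated names through levels<- merges those levels. A minimal sketch, assuming everything after the first ":" can be dropped:

```r
mycol <- factor(c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3"))

# Strip everything from the first ":" onward in each level name;
# duplicated names produced this way are merged by levels<-
levels(mycol) <- sub(":.*$", "", levels(mycol))

print(levels(mycol))
print(mycol)
```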

Answer 2

R has the grepl function, which is better suited to this (much easier than piecing together positive/negative regexes).

grepl gives you a logical vector, which you can combine with &, ! and any other logical operator. For example:

  # find rows containing mystring1 and not containing mystring2
  ds[grepl(mystring1, ds$mycol) & !grepl(mystring2, ds$mycol), "mycol"]
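Applied to the example labels, with mystring1 and mystring2 set to hypothetical values ("Level1" to keep, ":" to exclude), the combined logical vector looks like this:

```r
ds <- data.frame(mycol = factor(c("Level1", "Level2", "Level3",
                                  "Level1:Level2", "Level1:Level3")))
mystring1 <- "Level1"  # hypothetical: pattern that must match
mystring2 <- ":"       # hypothetical: pattern that must NOT match

# grepl coerces the factor to its labels and returns one TRUE/FALSE per row
sel <- grepl(mystring1, ds$mycol) & !grepl(mystring2, ds$mycol)

print(sel)
print(ds$mycol[sel])
```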

grep, on the other hand, gives you the indices of the matches.
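The difference between the two return values, sketched on the example labels:

```r
x <- c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3")

idx <- grep("Level2", x)   # integer positions of the matching elements
lgl <- grepl("Level2", x)  # one TRUE/FALSE per element

# which() turns the logical vector into the same positions grep returns
print(idx)
print(identical(idx, which(lgl)))
```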

Also, from your description it sounds like you don't want to search the levels but the actual values. I may be wrong on that point, though.
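To illustrate the distinction between searching the levels and searching the values, here is a sketch with a hypothetical three-row factor that contains a repeated label:

```r
f <- factor(c("Level1:Level2", "Level2", "Level1:Level2"))

# Searching the levels: each distinct label is tested once
lv_hits <- grep(":", levels(f), value = TRUE)

# Searching the values: one result per row, repeats included
row_hits <- grepl(":", f)

print(lv_hits)
print(row_hits)
```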

Answer 3

How about splitting on : and keeping the first component?

sapply(strsplit(x, ':'), head, n=1)
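In the answer, x is the vector of labels to shorten. Assuming it holds the five example labels as a character vector, the result is:

```r
x <- c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3")

# strsplit returns a list of pieces per element; head(n = 1) keeps the first piece
first <- sapply(strsplit(x, ":"), head, n = 1)

print(first)
print(nlevels(factor(first)))  # number of distinct labels left
```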

Or use a regex with gsub:

gsub(':([[:print:]])+', "", x)
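Because the match starts at the first ":" and [[:print:]]+ greedily consumes the rest of the string, this gsub removes everything from the first colon onward. A sketch checking that it agrees with the simpler sub(":.*$", "", x) on the example labels (the comparison is an addition, not part of the answer):

```r
x <- c("Level1", "Level2", "Level3", "Level1:Level2", "Level1:Level3")

a <- gsub(':([[:print:]])+', "", x)  # pattern from the answer
b <- sub(":.*$", "", x)              # ":" followed by anything up to the end

print(a)
print(identical(a, b))
```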

These assume that you want to keep whatever comes before the first : in the level.
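One practical caveat, assuming the labels live in a factor column as in the question: strsplit() only accepts character input and stops with a "non-character argument" error on a factor, so convert with as.character() first. A minimal sketch that rebuilds the column with three levels:

```r
ds <- data.frame(mycol = factor(c("Level1", "Level2", "Level3",
                                  "Level1:Level2", "Level1:Level3")))

# strsplit(ds$mycol, ":") would fail because mycol is a factor
labels <- as.character(ds$mycol)
ds$mycol <- factor(sapply(strsplit(labels, ":"), head, n = 1))

print(levels(ds$mycol))
```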

For more general punctuation characters:

sapply(strsplit(x, '[[:punct:]]'), head, n=1)

or

gsub('[[:punct:]]([[:print:]])+', "", x)
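A sketch with made-up labels that use different separators, showing that both punctuation-based variants agree. Note that [[:punct:]] also includes characters such as "_" and ".", so a level name that legitimately contains one of them would be cut there too:

```r
x <- c("Level1:Level2", "Level1-Level3", "Level1.Level2", "Level3")

# Split at the first punctuation character and keep the leading piece
p1 <- sapply(strsplit(x, "[[:punct:]]"), head, n = 1)
# Delete from the first punctuation character to the end
p2 <- gsub("[[:punct:]]([[:print:]])+", "", x)

print(p1)
print(identical(p1, p2))
```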

Condensing factors using grep, but excluding certain results
