Question

I have an Identifier column that contains character values.

structure(list(Identifier = c("RL.K", "RL.K.1", "RL.K.2", "RL.K.3", 
"RL.K.4", "RL.K.5", "RL.K.6", "RL.K.7", "RL.K.9", "RL.K.10", 
"RI.K", "RI.K.1", "RI.K.2", "RI.K.3", "RI.K.4", "RI.K.5", "RI.K.6", 
"RI.K.7", "RI.K.9", "RI.K.10", "RF.K", "RF.K.1")), row.names = c(NA, 
-22L), class = c("tbl_df", "tbl", "data.frame"))

How can I filter out the values that contain only one period, so that rows 1, 11, and 21 are removed?

Answer 1

A base R solution (this finds all strings with exactly one dot):

grepl("^[^.]*[.][^.]*$", df1$Identifier)

To remove the rows with one dot, use:

df1[
!grepl("^[^.]*[.][^.]*$", df1$Identifier),
]
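
As a quick check of the pattern above, the sketch below rebuilds the sample column as a plain character vector and shows which positions the regex flags. The names ids and one_dot are only for illustration and do not appear in the original answer.

```r
# Sample identifiers from the question, as a plain character vector
ids <- c("RL.K", "RL.K.1", "RL.K.2", "RL.K.3", "RL.K.4", "RL.K.5",
         "RL.K.6", "RL.K.7", "RL.K.9", "RL.K.10", "RI.K", "RI.K.1",
         "RI.K.2", "RI.K.3", "RI.K.4", "RI.K.5", "RI.K.6", "RI.K.7",
         "RI.K.9", "RI.K.10", "RF.K", "RF.K.1")

# ^[^.]*  any run of non-dot characters from the start
# [.]     one literal dot
# [^.]*$  any run of non-dot characters up to the end
one_dot <- grepl("^[^.]*[.][^.]*$", ids)

which(one_dot)   # positions of the strings with exactly one dot
ids[!one_dot]    # identifiers that remain after dropping them
```

On this data the flagged positions should be the three rows named in the question.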

Answer 2

If we are going to use base R and grepl, there is a simpler regular expression:

df[grepl("\\..*\\.", df$Identifier),]

(Explanation of the regex: \\. matches a literal dot and .* matches anything, so this code finds the cases where two literal dots are separated by anything.)
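
Keeping strings with at least two dots is not quite the same as dropping strings with exactly one dot: the two differ for strings that contain no dot at all. The sample data has no such strings, so the sketch below uses a small hypothetical vector x to show the difference.

```r
# Hypothetical values with zero, one, and two dots
x <- c("RL", "RL.K", "RL.K.1")

# Pattern from this answer: TRUE only when there are at least two dots
two_or_more <- grepl("\\..*\\.", x)

# Negating the "exactly one dot" pattern also keeps the dot-free string
not_exactly_one <- !grepl("^[^.]*[.][^.]*$", x)

two_or_more
not_exactly_one
```

If dot-free identifiers can occur and should be kept, the negated form is probably the safer choice.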

Answer 3

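We can count the number of . characters in 'Identifier' and use that as a logical condition in filter to subset the rows. The condition below keeps the rows with exactly one dot; change == 1 to != 1 to drop them instead.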

library(tidyverse)
df1 %>% 
   filter(str_count(Identifier, "[.]") == 1)
# A tibble: 3 x 1
#  Identifier
#  <chr>     
#1 RL.K      
#2 RI.K      
#3 RF.K

Or, as @WiktorStribizew mentioned, the pattern can be wrapped in fixed() to make it faster:

df1 %>% 
   filter(str_count(Identifier, fixed(".")) == 1)

Or, without using any external library:

df1[nchar(gsub("[^.]*", "", df1$Identifier)) == 1,]

Or using gregexpr from base R:

df1[lengths(gregexpr(".", df1$Identifier, fixed = TRUE)) == 1,]
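
One caveat with the gregexpr approach: when a string has no match, gregexpr returns a single -1, so lengths() reports 1 for a dot-free string as well. That does not affect the sample data, where every identifier has at least one dot. The sketch below uses a hypothetical vector x to show the effect, with regmatches as one possible way to get true counts.

```r
# Hypothetical values with zero, one, and two dots
x <- c("RL", "RL.K", "RL.K.1")

m <- gregexpr(".", x, fixed = TRUE)

# A non-match is stored as a single -1, so it also has length 1
raw_lengths <- lengths(m)

# regmatches() yields character(0) for non-matches, giving real counts
dot_count <- lengths(regmatches(x, m))

raw_lengths
dot_count
```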

Answer 4

Using as little regex as possible ;) :

has.only.one.dot <- function(str_vec) sapply(strsplit(str_vec, "\\."), function(vec) length(vec) == 2)
df[!has.only.one.dot(df$Identifier), ]

However, the list functions sapply and strsplit are slower than the regex solutions, so here is a regex-based version:

has.only.one.dot <- function(str_vec) grepl("\\.", str_vec) & ! grepl("\\..*\\.", str_vec)
df[!has.only.one.dot(df$Identifier), ]
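
The two versions agree on the sample data, but they can differ when a string ends in a dot, because strsplit drops the empty piece after a trailing separator. The sketch below gives the two definitions the illustrative names by_split and by_regex and tries them on a hypothetical edge case.

```r
# strsplit-based version from this answer (renamed for the comparison)
by_split <- function(str_vec) {
  sapply(strsplit(str_vec, "\\."), function(vec) length(vec) == 2)
}

# regex-based version from this answer (renamed for the comparison)
by_regex <- function(str_vec) {
  grepl("\\.", str_vec) & !grepl("\\..*\\.", str_vec)
}

# "RL." is a hypothetical edge case with one trailing dot
x <- c("RL.K", "RL.K.1", "RL.")

by_split(x)   # "RL." splits into a single piece, so it is not counted
by_regex(x)   # "RL." has one dot and no second dot, so it is counted
```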

Filter out all rows with only one period in R
