Remove the string between two strings

Date: 2018-08-20 09:24:48

Tags: r regex string

Suppose I have the following vector:

df <- c("@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]",
        "@Components A-D [COLL]", "@Components [COLL]",
        "@Accessoires [COLL]", "@Components H-Z [COLL]")

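I want to remove the middle part of every string that contains a letter range such as A-B or C-D. This is only an example; in my data frame there are many possible letter combinations.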

So the desired output would be:

"@Accessoires [COLL]" 
"@Accessoires [COLL]" 
"@Components [COLL]"  
"@Components [COLL]"  
"@Accessoires [COLL]" 
"@Components [COLL]" 

My question: how can I do this in R without having to list every possible letter combination?

3 answers:

Answer 0 (score: 2)

You can use sub() with a bit of regex, replacing the matched range (and its surrounding spaces) with a single space:

sub("\\s[A-Z]-[A-Z]\\s", " ", df)
[1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]"  "@Components [COLL]" 
[5] "@Accessoires [COLL]" "@Components [COLL]" 

The regex breaks down as follows:

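  • \\s: a single whitespace character (here, a space)
  • [A-Z]: any (English) uppercase letter
  • -: a literal hyphen between the two letters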
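To see exactly which substring the pattern removes, the matches can be pulled out with base R's regexpr() and regmatches(), and grepl() shows which elements sub() will actually change. A minimal sketch, assuming the df vector from the question:

```r
df <- c("@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]",
        "@Components A-D [COLL]", "@Components [COLL]",
        "@Accessoires [COLL]", "@Components H-Z [COLL]")

pattern <- "\\s[A-Z]-[A-Z]\\s"

# regexpr() gives the position of the first match in each string (-1 if none)
m <- regexpr(pattern, df)

# regmatches() returns only the matched pieces; non-matching elements are skipped
matched <- regmatches(df, m)
print(matched)

# grepl() flags which elements contain a range and will be modified by sub()
hits <- grepl(pattern, df)
print(hits)
```

Elements without a range (such as "@Components [COLL]") do not match, so sub() returns them unchanged.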

By the way, your df is a vector, not a data.frame:

df <- c(
  "@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]","@Components A-D [COLL]",
  "@Components [COLL]", "@Accessoires [COLL]","@Components H-Z [COLL]"
)
is.data.frame(df)
[1] FALSE
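If the labels really do live in a data frame, the same call works on the column directly because sub() is vectorised. A sketch; the data frame d and its column name label are made up for illustration:

```r
# Hypothetical data frame with the labels stored in a column called "label"
d <- data.frame(
  label = c("@Accessoires A-B [COLL]", "@Components [COLL]", "@Components H-Z [COLL]"),
  stringsAsFactors = FALSE
)

# sub() processes the whole character column in one call
d$label <- sub("\\s[A-Z]-[A-Z]\\s", " ", d$label)
print(d$label)
```

Setting stringsAsFactors = FALSE keeps the column as character on older R versions, where data.frame() would otherwise create a factor.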

Answer 1 (score: 1)

Split on spaces and keep the first and last elements:

sapply(strsplit(df, " "), function(i) paste(head(i, 1), tail(i, 1)))

# [1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]" 
# [4] "@Components [COLL]"  "@Accessoires [COLL]" "@Components [COLL]" 
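The same split-and-paste idea can be written with vapply(), which guarantees a plain character result. Note that this approach assumes the label itself is a single word: everything between the first and last token is dropped, whether or not it is a letter range. A sketch; first_last is a hypothetical helper name and the multi-word label is a made-up input:

```r
df <- c("@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]",
        "@Components A-D [COLL]", "@Components [COLL]",
        "@Accessoires [COLL]", "@Components H-Z [COLL]")

# Hypothetical helper: keep only the first and last space-separated tokens
first_last <- function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)
  vapply(tokens, function(i) paste(i[1], i[length(i)]), character(1))
}

out <- first_last(df)
print(out)

# Caveat: a multi-word label (hypothetical) loses its middle word too
print(first_last("@Spare Parts A-B [COLL]"))  # "@Spare [COLL]"
```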

Answer 2 (score: 0)

df is not a data.frame but a character vector. You can use gsub() to delete everything between the first and the last space:

gsub(" .* ", " ", df)
[1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]"  "@Components [COLL]" 
[5] "@Accessoires [COLL]" "@Components [COLL]" 

Is this what you are looking for?
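As a quick sanity check, the three answers can be run side by side on the question's vector; on this input they all produce the desired output. They differ once a label contains a space of its own: only the sub() pattern targets a letter range specifically, while the gsub() version removes any middle text. A sketch; the input "@Spare Parts [COLL]" is made up for illustration:

```r
df <- c("@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]",
        "@Components A-D [COLL]", "@Components [COLL]",
        "@Accessoires [COLL]", "@Components H-Z [COLL]")
expected <- c("@Accessoires [COLL]", "@Accessoires [COLL]", "@Components [COLL]",
              "@Components [COLL]", "@Accessoires [COLL]", "@Components [COLL]")

a0 <- sub("\\s[A-Z]-[A-Z]\\s", " ", df)                                    # answer 0
a1 <- sapply(strsplit(df, " "), function(i) paste(head(i, 1), tail(i, 1)))  # answer 1
a2 <- gsub(" .* ", " ", df)                                                 # answer 2
print(c(identical(a0, expected), identical(a1, expected), identical(a2, expected)))

# Hypothetical label containing a space: the approaches now disagree
x <- "@Spare Parts [COLL]"
print(sub("\\s[A-Z]-[A-Z]\\s", " ", x))  # "@Spare Parts [COLL]" (no range, unchanged)
print(gsub(" .* ", " ", x))              # "@Spare [COLL]" (middle word removed)
```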