Suppose I have the following vector:
df <- c("@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]",
"@Components A-D [COLL]","@Components [COLL]",
"@Accessoires [COLL]", "@Components H-Z [COLL]")
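I want to remove the middle part of each string wherever a letter range such as A-B or C-D is present. This is only an example; in my data frame there are many possible letter combinations.
So the desired output would be: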
"@Accessoires [COLL]"
"@Accessoires [COLL]"
"@Components [COLL]"
"@Components [COLL]"
"@Accessoires [COLL]"
"@Components [COLL]"
My question is: how can I do this in R without having to define every letter combination?
Answer 0 (score: 2)
You can use sub() with a regular expression:
sub("\\s[A-Z]-[A-Z]\\s", " ", df)
[1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]" "@Components [COLL]"
[5] "@Accessoires [COLL]" "@Components [COLL]"
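The regular expression breaks down as follows:
\\s: a whitespace character
[A-Z]: any (English) uppercase letter
-: a literal hyphen
By the way, your df is a vector, not a data.frame: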
df <- c(
"@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]","@Components A-D [COLL]",
"@Components [COLL]", "@Accessoires [COLL]","@Components H-Z [COLL]"
)
is.data.frame(df)
[1] FALSE
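Since the question mentions a data frame, here is a minimal sketch of applying the same pattern to a column. The data frame `dat` and its column name `label` are hypothetical, and the `+` variant at the end is an assumption about what the data might contain (ranges with more than one letter on each side), not something stated in the question.

```r
# Hypothetical data frame; the name `dat` and the column `label` are assumptions.
dat <- data.frame(
  label = c("@Accessoires A-B [COLL]", "@Components [COLL]", "@Components H-Z [COLL]"),
  stringsAsFactors = FALSE
)

# Same pattern as in the answer, applied to the column in place.
# Strings without a letter range are returned unchanged.
dat$label <- sub("\\s[A-Z]-[A-Z]\\s", " ", dat$label)

# Assumed variant: `+` also accepts multi-letter ranges such as "AA-AZ".
multi <- sub("\\s[A-Z]+-[A-Z]+\\s", " ", "@Accessoires AA-AZ [COLL]")
```

Because the pattern only matches an uppercase range surrounded by whitespace, any other text in the string is left alone.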
Answer 1 (score: 1)
Split on spaces, then take the first and last elements:
sapply(strsplit(df, " "), function(i) paste(head(i, 1), tail(i, 1)))
# [1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]"
# [4] "@Components [COLL]" "@Accessoires [COLL]" "@Components [COLL]"
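A possible variant of the same idea uses vapply(), which checks that every result is a single string. This is only a sketch; the names `parts` and `res` are illustrative. Note that, like the sapply() version, it assumes each string has at least two space-separated tokens: a string with a single token would have that token repeated.

```r
df <- c(
  "@Accessoires A-B [COLL]", "@Accessoires C-D [COLL]", "@Components A-D [COLL]",
  "@Components [COLL]", "@Accessoires [COLL]", "@Components H-Z [COLL]"
)

# Split each string on a literal space.
parts <- strsplit(df, " ", fixed = TRUE)

# Keep the first and last token of each string; character(1) enforces
# that each result is exactly one string.
res <- vapply(parts, function(p) paste(p[1], p[length(p)]), character(1))
```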
Answer 2 (score: 0)
df is not a data.frame but a character vector.
You can use gsub() to remove everything between the spaces:
gsub(" .* ", " ", df)
[1] "@Accessoires [COLL]" "@Accessoires [COLL]" "@Components [COLL]" "@Components [COLL]" "@Accessoires [COLL]" "@Components [COLL]"
Is that what you are looking for?
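One thing worth checking before choosing between the patterns: " .* " is greedy, so it removes everything between the first and the last space, not only the letter range. The sketch below compares it with the sub() pattern from the first answer on a made-up string with an extra word; the input `x` is hypothetical and does not come from the question.

```r
# Hypothetical input with an extra word before the letter range.
x <- "@Spare Parts A-B [COLL]"

# Greedy match: spans from the first space to the last space.
greedy <- gsub(" .* ", " ", x)

# Targeted match: removes only the uppercase letter range.
targeted <- sub("\\s[A-Z]-[A-Z]\\s", " ", x)

print(greedy)    # "@Spare [COLL]"
print(targeted)  # "@Spare Parts [COLL]"
```

If every string has the shape "name range tag", both give the same result; they differ only when there is more text in the middle.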