Question

I have a question about removing spaces from the character text in a data frame column. Here is the column:

head(data$HO)
[1] "Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon"                             
[2] "Compressive ghost imaging; Guided filter; Single-pixel imaging"

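This question is different from the linked one, because I only want to remove the spaces that follow the ";" symbol. The output should look like this: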

head(data$HO)
[1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"                             
[2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"

I tried:

data$HO <- gsub("\\;s", ";",data$HO)

but it does not work.

Any suggestions?
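A likely reason the attempt fails: in "\\;s" the s is a literal letter, not the whitespace shorthand \s, so the pattern only matches a ";" immediately followed by the letter "s". The sketch below illustrates this on two made-up strings (the vector y and the variable names are hypothetical, not from the question).

```r
# Hypothetical test strings, not taken from the question's data
y <- c("a; b; c", "a;sb")

attempt_pat <- "\\;s"   # regex \;s : a semicolon followed by the LETTER s
working_pat <- ";\\s+"  # regex ;\s+ : a semicolon followed by one or more whitespace characters

# The original pattern leaves the spaces alone and instead removes the letter s in "a;sb"
res_attempt <- gsub(attempt_pat, ";", y)
print(res_attempt)
# [1] "a; b; c" "a;b"

# Using \s removes only the whitespace after each ";"
res_working <- gsub(working_pat, ";", y)
print(res_working)
# [1] "a;b;c" "a;sb"
```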

Answer 1

You can match the pattern ;\s+ and replace it with ;:

> x <- c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon", "Compressive ghost imaging; Guided filter; Single-pixel imaging")
> gsub(";\\s+", ";", x)
[1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"     
[2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"

Pattern details:

; - a semicolon
\s+ - one or more whitespace characters.

See the regex demo.
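Applied to the question's setup, the same call can simply overwrite the column. This is a minimal sketch assuming a data frame named data with a character column HO, as in the question; the two-row data frame below is a stand-in built from the question's sample values. If HO were a factor, gsub would coerce it with as.character and return a plain character vector.

```r
# Stand-in for the question's data frame; only the names data and HO come from the question
data <- data.frame(
  HO = c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon",
         "Compressive ghost imaging; Guided filter; Single-pixel imaging"),
  stringsAsFactors = FALSE
)

# Overwrite the column: drop any whitespace run that directly follows a ";"
data$HO <- gsub(";\\s+", ";", data$HO)
head(data$HO)
```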

More variations of the solution:

gsub("(*UCP);\\K\\s+", "", x, perl=TRUE)
gsub(";[[:space:]]+", ";", x)
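A quick way to check that the variations agree on the sample data is to run all three and compare the results. In the PCRE version, \K drops the ";" from the reported match so only the whitespace is replaced with "", and (*UCP) makes \s Unicode-aware; on this ASCII-only sample all three should give the same strings. The result variable names are illustrative only.

```r
x <- c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon",
       "Compressive ghost imaging; Guided filter; Single-pixel imaging")

res_tre   <- gsub(";\\s+", ";", x)                        # default (TRE) engine, \s shorthand
res_pcre  <- gsub("(*UCP);\\K\\s+", "", x, perl = TRUE)   # PCRE: keep ";" via \K, replace only the spaces
res_posix <- gsub(";[[:space:]]+", ";", x)                # POSIX bracket class instead of \s

# All three variants should produce identical output for this input
stopifnot(identical(res_tre, res_pcre), identical(res_tre, res_posix))
res_tre
```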

Answer 2

Another possible solution is to use a lookbehind, (?<=...). It matches \s+ only where it is preceded by ;, and replaces that whitespace with nothing.

v <- c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon", 
      "Compressive ghost imaging; Guided filter; Single-pixel imaging")

gsub("(?<=;)\\s+", "", v, perl = TRUE)

# Result:
# [1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"     
# [2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"
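Because the lookbehind consumes the whole \s+ run, it also copes with irregular spacing, while spaces inside a term (such as "Wind field") are untouched. For contrast, a plain fixed-string replacement of "; " removes only one space per semicolon. The input w below is a made-up example, not from the question.

```r
# Hypothetical input with two spaces after the first ";"
w <- "Lidar;  Wind field; Temperature"

# Lookbehind: remove the entire whitespace run after each ";"
res_lookbehind <- gsub("(?<=;)\\s+", "", w, perl = TRUE)
print(res_lookbehind)
# [1] "Lidar;Wind field;Temperature"

# Fixed-string replacement: only one space is removed after each ";"
res_fixed <- gsub("; ", ";", w, fixed = TRUE)
print(res_fixed)
# [1] "Lidar; Wind field;Temperature"
```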

Remove the spaces after a specific symbol ";"
