Question

I want to strip the whitespace and underscores from the beginning of a string in R.

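I can write something like

txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)

But I think there could be a more elegant solution. I tried:

txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt

The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ", with the leading space still in place.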

How can I fix this?

Answer 1

You can put \s and the underscore together in a single character class and use a quantifier to repeat it one or more times.

^[\s_]+

For example:

txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)

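Or, as @Tim Biegeleisen points out in the comments, since only the first match needs to be replaced, you can use sub instead:

txt <- sub("^[\\s_]+", "", txt, perl=TRUE)

Or use a POSIX character class, which also works without perl=TRUE:

txt <- sub("^[[:space:]_]+", "", txt)

For more about perl=TRUE and the regular expressions used in R, see the ?regex help page.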
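To make the difference concrete, here is a small sketch comparing the attempt from the question with the two anchored patterns. The input string is an illustrative assumption (an underscore was added at the start so that both kinds of character are exercised); it is not taken from the question.

```r
# Illustrative input: leading space, underscore, space; one interior underscore.
txt <- " _ 9PM 8-Oct-2014_0.335kwh "

# The question's attempt. Inside [...] the "+" and "|" are literal characters,
# and without perl = TRUE the "\\s" is not read as the whitespace shorthand,
# so the class does not match the leading space and nothing is removed.
attempt <- gsub("^[\\s+|\\_+]", "", txt)

# Anchored character class with the quantifier outside the brackets (PCRE).
pcre <- sub("^[\\s_]+", "", txt, perl = TRUE)

# Same idea with a POSIX class, using the default regex engine.
posix <- sub("^[[:space:]_]+", "", txt)

print(attempt)  # leading " _ " still present
print(pcre)     # leading run removed, interior "_" and trailing space kept
print(posix)    # same result as pcre
```

The key change is moving the `+` outside the brackets: a character class matches exactly one character, so the quantifier has to apply to the class as a whole.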

Answer 2

You can use stringr:

txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"

Or trimws in base R:

trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
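Note that both calls above trim both ends and leave underscores alone. If only the start of the string should be trimmed, trimws takes a `which` argument, and from R 3.6.0 onward it also takes a `whitespace` argument (a regular expression for a single character to strip). A minimal sketch, assuming R >= 3.6.0 and an illustrative input with a leading underscore:

```r
# Illustrative input: leading space and underscore, trailing space.
txt <- " _9PM 8-Oct-2014_0.335kwh "

# Trim whitespace from the left only; the underscore stops the trimming.
left_only <- trimws(txt, which = "left")

# Treat the underscore as trimmable too by widening the `whitespace` class.
left_ws_us <- trimws(txt, which = "left", whitespace = "[ \t\r\n_]")

print(left_only)   # "_9PM 8-Oct-2014_0.335kwh "
print(left_ws_us)  # "9PM 8-Oct-2014_0.335kwh "
```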

Answer 3

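The stringr package offers some task-specific functions with helpful names. In your original question you say you want to remove whitespace and underscores from the start of your string, but in a comment you imply that you also want to remove the same characters from the end of the same string. With that in mind, I'll offer a few different options.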

Given the string s <- "  \t_blah_ ", which contains whitespace (spaces and a tab) and underscores:

library(stringr)

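# Remove whitespace and underscores at the start.
# The ^ anchor keeps the match from landing in the middle of the string.
str_remove(s, "^[\\s_]+")
# [1] "blah_ "

# Remove whitespace and underscores at the start and end.
# The anchors leave any interior whitespace and underscores untouched.
str_remove_all(s, "^[\\s_]+|[\\s_]+$")
# [1] "blah"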

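If you only want to remove whitespace (after all, there are no underscores at the start or end of your example string), a couple of stringr functions will help you keep things simple: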

# `str_trim` trims whitespace (spaces, tabs, newlines) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "

str_trim(s, side = "right")
# [1] "  \t_blah_"

str_trim(s, side = "both") # This is the default.
# [1] "_blah_"

# `str_squish` trims both ends and collapses repeated whitespace inside the string.
s <- "  \t_blah     blah_ "
str_squish(s)
# [1] "_blah blah_"

The same pattern [\\s_]+ will also work in base R's sub or gsub if you prefer that, with some minor modifications such as perl=TRUE (see Thefourthbird's answer).
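As a rough sketch of that base R variant, the three cases (start, end, both) can be wrapped in one small function. The name `strip_ws_us` and its `side` argument are hypothetical, chosen here for illustration; they are not part of base R or stringr.

```r
# Hypothetical helper: strip whitespace and underscores from one or both ends.
strip_ws_us <- function(x, side = c("left", "right", "both")) {
  side <- match.arg(side)
  pattern <- switch(side,
    left  = "^[\\s_]+",            # run at the start of the string
    right = "[\\s_]+$",            # run at the end of the string
    both  = "^[\\s_]+|[\\s_]+$"    # either end
  )
  # gsub (rather than sub) so that "both" can remove two separate matches;
  # perl = TRUE is needed for \\s to work inside the character class.
  gsub(pattern, "", x, perl = TRUE)
}

s <- "  \t_blah_ "
print(strip_ws_us(s, "left"))   # "blah_ "
print(strip_ws_us(s, "right"))  # "  \t_blah"
print(strip_ws_us(s, "both"))   # "blah"
```

Because every alternative is anchored, interior whitespace and underscores (as in "8-Oct-2014_0.335kwh") are left alone.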

How do I strip whitespace and underscores only from the beginning of a string?