There are a few posts that cover similar problems:
Remove square brackets from a string vector
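...but regex is hard, and I can't seem to get anything close to what I want.
I've copied and pasted a large table from HTML, and it is nicely structured, except that one column has some trailing artifacts.
Here is some sample data: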
df <- structure(list(From = c("3 February 2015[N 4]", "23 February 2017[N 3]",
"17 March 2010[N 1]", "22 July 2016[N 2]", "14 May 1986", "22 February 1995",
"8 June 1995", "12 August 1996"), Until = c("4 November 2015",
"17 October 2017", "9 May 2010", "3 January 2017", "21 February 1995",
"8 June 1995", "12 August 1996", "13 September 1996")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), Nat. = structure(list(), class = c("collector_logical",
"collector")), Club = structure(list(), class = c("collector_character",
"collector")), From = structure(list(), class = c("collector_character",
"collector")), Until = structure(list(), class = c("collector_character",
"collector")), `Duration
(days)` = structure(list(), class = c("collector_double",
"collector")), `Years in
League` = structure(list(), class = c("collector_character",
"collector")), Ref. = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
The artifacts take the form of square brackets containing a letter and a number, e.g. [N1].
When I parse the Until column as dates, it works fine:
library(dplyr)      # provides %>% and mutate()
library(lubridate)  # provides dmy()
df %>%
  mutate(Until = dmy(Until))
But the From column, which contains the stray artifacts, fails to parse for those entries:
df %>%
  mutate(From = dmy(From))
I first tried gsub with the plain text, even doing one artifact at a time:
gsub("[N1]", "", df$From)
...but text in the column other than the artifacts gets mangled, I assume because of the square brackets.
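A small sketch of what is probably happening here, assuming base R's default regex behaviour: gsub treats its pattern as a regular expression, so "[N1]" is read as a character class that matches any single "N" or "1", not the literal text. The vector x below is a hypothetical three-row excerpt of df$From. Passing fixed = TRUE makes gsub take the pattern literally; note that the sample data has a space inside the brackets, so the literal would need to be "[N 1]".

```r
# Hypothetical excerpt of df$From, used only for illustration
x <- c("3 February 2015[N 4]", "17 March 2010[N 1]", "14 May 1986")

# As a regex, "[N1]" is a character class: every "N" and every "1" is removed
class_result <- gsub("[N1]", "", x)
print(class_result)
# "3 February 205[ 4]" "7 March 200[ ]" "4 May 986"

# With fixed = TRUE the pattern is matched as literal text,
# so only the exact string "[N 1]" is removed
literal_result <- gsub("[N 1]", "", x, fixed = TRUE)
print(literal_result)
# "3 February 2015[N 4]" "17 March 2010" "14 May 1986"
```

The literal approach needs one call per artifact ([N 1], [N 2], ...), which is why a regex ends up being more convenient.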
Then I tried regular expressions, but couldn't get them to work:
gsub("\\[.*?\]/", "", df$From)
gsub("\\[N\d\\]", "", df$From)
Both give the same thing: Error: '\]' is an unrecognized escape in character string starting
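For what it's worth, that error comes from R's string parser and not from the regex engine: inside an R string every backslash meant for the regex has to be doubled, so "\]" and "\d" are rejected before gsub ever runs. The trailing "/" in the first attempt would also be matched as a literal slash, since R patterns are not wrapped in delimiters. A minimal sketch of corrected patterns follows, assuming the artifacts always look like [N, optional space, digits]; the vector from is a hypothetical excerpt of df$From.

```r
# Hypothetical excerpt of df$From, used only for illustration
from <- c("3 February 2015[N 4]", "17 March 2010[N 1]", "14 May 1986")

# Remove any bracketed chunk: "\\[" is a literal "[", "[^]]*" is any run of
# characters other than "]", and "\\]" is a literal "]"
any_bracket <- gsub("\\[[^]]*\\]", "", from)

# Narrower version: only "[N", an optional space, one or more digits, then "]"
n_marker <- gsub("\\[N ?[0-9]+\\]", "", from)

print(any_bracket)
print(n_marker)
# both: "3 February 2015" "17 March 2010" "14 May 1986"
```

The narrower pattern is safer if other bracketed text in the column should be kept; the first one is simpler if every bracketed chunk is noise.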
I really don't mind whether the solution uses gsub, str_replace_all, or anything else from the tidyverse; I just need to remove/replace [N1], [N2], and so on.
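One possible tidyverse version of the whole step, as a sketch and not a definitive answer: it assumes dplyr, stringr, and lubridate are installed, and it uses a hypothetical two-column tibble df_small in place of the full df. str_remove_all(x, pattern) is shorthand for str_replace_all(x, pattern, "").

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical cut-down stand-in for df
df_small <- tibble(
  From  = c("3 February 2015[N 4]", "14 May 1986"),
  Until = c("4 November 2015", "21 February 1995")
)

cleaned <- df_small %>%
  mutate(
    # Drop every "[...]" chunk (lazy match up to the first "]"), then parse
    From  = dmy(str_remove_all(From, "\\[.*?\\]")),
    Until = dmy(Until)
  )

print(cleaned)
```

With the artifacts removed first, From and Until can both be parsed by dmy in the same mutate call.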