Question

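My dataset has a column of labels that include a year (OldLabel), and I want to create another column that contains only the label, without the year (NewLabel). I wrote the following code, but it leaves a space at the end of the new label.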

data["NewLabel"] <- gsub("20..", "", data$OldLabel)
# removes any part of OldLabel that is "20" followed by any two characters (meant to catch years such as 2011 or 2008)

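Is there a way to have gsub replace the sequence with a backspace, so that it cancels out any space next to the year it replaced? I tried using "\\b" as my replacement text, but that just replaces the year with a b, not with a backspace.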

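Edit: As requested, an example OldLabel is "Valley Summer 2014", which should become "Valley Summer" but ends up as "Valley Summer " with the current code. However, some labels are in the form "2012 Valley Summer", so I don't think simply including a space in the pattern is robust enough.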
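A minimal sketch of why the backspace idea cannot work, using the example label from the question: a backspace is just another character in an R string. Written as "\b" (a single backslash in the R literal) it is the control character with code 8, and gsub inserts it like any other character. It never removes the space before it from the string itself.

```r
label <- "Valley Summer 2014"

# "\b" with a single backslash is the backspace control character (code 8);
# gsub inserts it as an ordinary character, it does not erase anything
with_bs <- gsub("20..", "\b", label)

nchar(with_bs)                     # 15: "Valley Summer " (14 chars) plus the backspace
substr(with_bs, 14, 14)            # " ": the trailing space is still there
utf8ToInt(substr(with_bs, 15, 15)) # 8: the backspace character itself
```

The practical fix is therefore to make the pattern consume the neighbouring space as part of the match.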

Answer 1

Try this:

 data["NewLabel"] <- gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", data$OldLabel)

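The paired curly braces are repetition quantifiers; their range is given by one value (an exact count) or two values (a minimum and a maximum). See ?regex for details. (You don't want to replace the matches with a backspace.)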

test <- c("2012 Valley Summer", "Valley Summer 2014")
gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", test)
#[1] "Valley Summer" "Valley Summer"
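To illustrate the quantifiers described above, a small sketch: {0,1} makes the space optional and is equivalent to the shorter ?, while {2} requires exactly two digits, so a fragment like "201" is left alone. The labels "Route 201" and "Valley 2014 Summer" are made up for illustration.

```r
# Answer 1's pattern with explicit brace quantifiers
pattern_braces <- "[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}"
# Same pattern, using "?" as shorthand for {0,1}
pattern_short <- " ?20[[:digit:]]{2} ?"

# The last two labels are hypothetical examples
test <- c("2012 Valley Summer", "Valley Summer 2014",
          "Route 201", "Valley 2014 Summer")

res_braces <- gsub(pattern_braces, "", test)
res_short  <- gsub(pattern_short, "", test)

identical(res_braces, res_short)  # TRUE
res_braces
# value: c("Valley Summer", "Valley Summer", "Route 201", "ValleySummer")
```

The last result shows the trade-off of making both spaces optional: when a year sits between two words, both spaces are removed and the words run together.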

Answer 2

data["NewLabel"] <- gsub("\\s*[0-9]\\s*", "", data$OldLabel)
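A minimal sketch of how this pattern behaves: \\s*[0-9]\\s* matches one digit together with any whitespace directly before or after it, and gsub removes every such match in turn, so the year and its adjacent space disappear whichever side of the label the year is on. Because it matches any digit rather than a four-digit year, it would also strip other numbers from a label; the third label below is hypothetical and shows that a year between two words joins them.

```r
# One digit plus any surrounding whitespace; gsub removes every such match
pattern <- "\\s*[0-9]\\s*"

# "Valley 2014 Summer" is a made-up label with the year between two words
test <- c("2012 Valley Summer", "Valley Summer 2014", "Valley 2014 Summer")

res <- gsub(pattern, "", test)
res
# value: c("Valley Summer", "Valley Summer", "ValleySummer")
```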
