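I have a long string in which I want to change the word order. I want to use regular expressions, because I have several elements to change and I'd like to learn them along the way. Here is an example of my strings: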
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",
"Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
"Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
"Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")
vec1
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]"
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"
I would like it to become:
[1] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Self[Desktop Computer]"
[2] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Other HH Member[Tablet]"
[3] "Internet-Devices Used to Access Internet Past 30 Days -Made Available by Your Employer[Laptop Computer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"
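So I think the algorithm should work like this:
1. Find the part of the string after "Past 30 Days", stopping at the hyphen.
2. Copy this extracted string to the end of the main string, just before its last character (the closing bracket).
3. Remove the string extracted in step 1 from its original position in the main string (but not the copy just added).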
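The three steps above can be sketched directly in base R. This is a minimal sketch under stated assumptions, not the answer's method: it assumes every string to be changed contains "Past 30 Days " followed by a bracketed part with a hyphen, and it uses the negated class [^-]+ instead of .+ so that step 1 stops at the first hyphen. Doing step 3 before step 2 avoids having to tell the original piece apart from the copy. The variable names out, m, hit and piece are illustrative.

```r
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",
          "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
          "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
          "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")

out <- vec1
# Step 1: locate the text after "Past 30 Days " up to (not including) the first hyphen.
# [^-]+ cannot cross a hyphen, unlike a greedy .+
m <- regexpr("(?<=Past 30 Days )[^-]+", out, perl = TRUE)
hit <- as.integer(m) != -1L          # elements where the pattern was found
piece <- regmatches(out, m)          # matched elements only, e.g. "[Desktop Computer"

# Step 3 (done first): delete the extracted piece from its original position
regmatches(out, m) <- rep("", sum(hit))

# Step 2: re-insert the piece just before the final "]"
out[hit] <- paste0(sub("]$", "", out[hit]), piece, "]")
out
```

Elements without a match (here the radio station string) are left untouched because they are excluded by hit.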
For step 1, I asked a similar question yesterday (Ignore part of a string when splitting using regular expression in R) and used it to come up with this regex: (?<=Past 30 Days ).+(?![^-])
It works on regex101.com but not in R (it does not stop at the hyphen):
reg1 <- regexec(pattern = "(?<=Past 30 Days ).+(?![^-])", vec1, perl=T)
ext1 <- unname(mapply(function(xx,yy) substr(xx, yy, yy+attr(yy,"match.length")), vec1, reg1))
ext1
[1] "[Desktop Computer-Owned by Self]" "[Tablet-Owned by Other HH Member]"
[3] "[Laptop Computer-Made Available by Your Employer]" ""
As you can see, it does not stop at the hyphen.
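One likely explanation, assuming PCRE semantics (perl = TRUE): .+ is greedy, so it first runs to the end of the string, and the negative lookahead (?![^-]) ("not followed by a non-hyphen") also succeeds at the very end, where no character follows. The match therefore never has to back off to a hyphen. The sketch below compares that pattern with two alternatives that do stop at the first hyphen: a negated class [^-]+ and a lazy .+? followed by a positive lookahead (?=-). Note that regmatches() drops non-matching elements, so each result has three entries rather than four.

```r
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",
          "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
          "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
          "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")

# Original attempt: greedy .+ runs to the end, and (?![^-]) is true at end of string
greedy <- regmatches(vec1, regexpr("(?<=Past 30 Days ).+(?![^-])", vec1, perl = TRUE))

# Alternative 1: a negated character class cannot cross a hyphen
negated <- regmatches(vec1, regexpr("(?<=Past 30 Days )[^-]+", vec1, perl = TRUE))

# Alternative 2: lazy quantifier that stops right before the first hyphen
lazy <- regmatches(vec1, regexpr("(?<=Past 30 Days ).+?(?=-)", vec1, perl = TRUE))

greedy
negated
lazy
```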
For step 2, I came up with something like this:
vec2 <- unname(mapply(gsub, ext1, vec1, MoreArgs = list(pattern="]")))
vec2
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self[Desktop Computer-Owned by Self]"
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member[Tablet-Owned by Other HH Member]"
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer[Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)"
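This is almost what I want, except that the "]" is removed from the last element of the vector and the correct string is not added (because of problem 1).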
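The missing "]" in the last element most likely comes from how mapply() matches arguments: because pattern is supplied by name in MoreArgs, the two vectors fill gsub()'s next positional arguments, replacement and x, so each call is gsub(pattern = "]", replacement = ext1[i], x = vec1[i]). For the fourth string ext1 is "", so its "]" is replaced with nothing. A minimal sketch that reproduces this and guards against it, assuming ext1 holds the values printed above (the name vec2_safe is illustrative):

```r
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",
          "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
          "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
          "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")
# ext1 as printed above: the fourth element is empty because nothing matched
ext1 <- c("[Desktop Computer-Owned by Self]", "[Tablet-Owned by Other HH Member]",
          "[Laptop Computer-Made Available by Your Employer]", "")

# Each call is gsub(pattern = "]", replacement = ext1[i], x = vec1[i])
vec2 <- unname(mapply(gsub, ext1, vec1, MoreArgs = list(pattern = "]")))

# Keep the substitution only where something was actually extracted
vec2_safe <- ifelse(nzchar(ext1), vec2, vec1)
vec2_safe[4]
```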
Finally, I remove the initial part of the string:
unname(mapply(gsub, paste0(stringr::str_sub(ext1, end=-2),"["), vec2, MoreArgs = list(replacement="[", fixed=T)))
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]"
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)"
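This sort of works, but I run into the same 2 problems as in step 2.
My whole code looks very heavy and complicated. Is there a better way to do this?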
Answer (score: 2)
You can use the pattern
(Past 30 Days\s*)([^-]*)([^]]+)
and replace the match with \1\3\2. See the regex demo.
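Details
- (Past 30 Days\s*) - Group 1 (referred to with the \1 backreference in the replacement pattern):
  - Past 30 Days - a literal substring
  - \s* - 0+ whitespace characters
- ([^-]*) - Group 2: zero or more characters other than a hyphen (-)
- ([^]]+) - Group 3: one or more characters other than a closing bracket (])

In R:
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",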
"Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
"Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
"Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")
gsub("(Past 30 Days\\s*)([^-]*)([^]]+)", "\\1\\3\\2", vec1)
# [1] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Self[Desktop Computer]"
# [2] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Other HH Member[Tablet]"
# [3] "Internet-Devices Used to Access Internet Past 30 Days -Made Available by Your Employer[Laptop Computer]"
# [4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"
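To check what each group in the pattern captures (the Details list above), you can use regexec() with regmatches(), which return the full match followed by the three groups. This is a small inspection sketch using the same default regex engine as the gsub() call; for a string without "Past 30 Days" the result is an empty vector, which is why gsub() leaves the fourth string unchanged.

```r
vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]",
          "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]",
          "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
          "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")

pat <- "(Past 30 Days\\s*)([^-]*)([^]]+)"
# One list element per input string: full match, then groups 1, 2 and 3
groups <- regmatches(vec1, regexec(pat, vec1))
groups[[1]]
groups[[4]]   # no match: character(0)
```

Reading groups[[1]] left to right shows why \1\3\2 works: group 3 ("-Owned by Self") is moved in front of group 2 ("[Desktop Computer"), and the untouched closing "]" after the match completes the new bracket.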