How do I split mixed-case strings?

Time: 2019-04-13 19:22:25

Tags: r regex perl regex-lookarounds regex-greedy

I have a set of strings of the following form:

Team XYZJohn SMITH / Jane SMITH
TEAM RacersJim SMITH / Jane SMITH
John McMahon RacingBob SMITH / Jane SMITH

and I would like to split the concatenated names to give strings like the following:

Team XYZ :: John SMITH / Jane SMITH
TEAM Racers :: Jim SMITH / Jane SMITH
John McMahon Racing :: Bob SMITH / Jane SMITH

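I am using Perl-compatible regular expressions in R (perl = TRUE), but it is the regular expression itself that I am after.

Via https://stackoverflow.com/a/43706490/454773, the following works for TEAM RacersJohn SMITH / Jane SMITH: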

paste(strsplit('TEAM RacersJohn SMITH / Jane SMITH', "(?<=[a-z])(?=[A-Z])", perl = TRUE)[[1]], collapse=' :: ')

But it obviously gives an unwanted split inside McMahon, and it misses the split in Team XYZJohn.
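To see both failure modes side by side, the same lookaround split can be run over all three sample strings. This is only a minimal sketch; the helper name `split_camel` is hypothetical and simply wraps the `strsplit` call shown above.

```r
# Split wherever a lowercase letter is immediately followed by an uppercase
# one, then rejoin the pieces with " :: ".
# `split_camel` is a hypothetical helper name, not part of base R.
split_camel <- function(x) {
  pieces <- strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
  vapply(pieces, paste, character(1), collapse = " :: ")
}

teams <- c(
  "Team XYZJohn SMITH / Jane SMITH",
  "TEAM RacersJim SMITH / Jane SMITH",
  "John McMahon RacingBob SMITH / Jane SMITH"
)

camel_result <- split_camel(teams)
print(camel_result)
```

The first string should come back unchanged (there is no lowercase letter before John), the second should split as wanted, and the third should pick up an extra split between Mc and Mahon.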

For things like McMahon, I was thinking of a heuristic that does not split on [A-Z][a-z]{1,2}[A-Z], which would also cope with MacDonald.
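One way that heuristic might be sketched is to keep the lowercase-to-uppercase boundary but add negative lookbehinds that veto it when the text just before is a word-initial capital followed by one or two lowercase letters (Mc, Mac). This is an assumption about how the idea could be expressed, not a general solution. The name `split_keep_prefix` is hypothetical, and the MacDonald sample string is made up for illustration. `gsub` is used instead of `strsplit` so that the lookbehinds always see the whole string.

```r
# Boundary between a lowercase and an uppercase letter, unless the preceding
# word fragment is exactly [A-Z][a-z] or [A-Z][a-z]{2} (e.g. "Mc", "Mac").
# PCRE lookbehinds must be fixed-length, hence two separate assertions.
prefix_safe <- "(?<=[a-z])(?<!\\b[A-Z][a-z])(?<!\\b[A-Z][a-z]{2})(?=[A-Z])"

# Hypothetical helper: insert " :: " at every boundary that survives the veto.
split_keep_prefix <- function(x) gsub(prefix_safe, " :: ", x, perl = TRUE)

prefix_result <- split_keep_prefix(c(
  "TEAM RacersJim SMITH / Jane SMITH",
  "John McMahon RacingBob SMITH / Jane SMITH",
  "Ian MacDonald RacingBob SMITH / Jane SMITH"  # made-up sample
))
print(prefix_result)
```

Two limits of this sketch are worth keeping in mind: it still cannot split Team XYZJohn, because no lowercase letter precedes John, and it would refuse to split after a team name whose last word is only two or three letters long (for example a hypothetical Team AceJohn).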

Tests:

# Desired output:
#Team XYZ :: John SMITH / Jane SMITH
#TEAM Racer :: Jim SMITH / Jane SMITH
#John McMahon Racing :: Bob SMITH / Jane SMITH
regex="(?<![A-Z][a-z])(?=[A-Z][a-z])"
print(paste(strsplit('Team XYZJohn SMITH / Jane SMITH', regex, perl = TRUE)[[1]], collapse=' :: '))
print(paste(strsplit('TEAM RacerJim SMITH / Jane SMITH', regex, perl = TRUE)[[1]], collapse=' :: '))
print(paste(strsplit('John McMahon RacingBob SMITH / Jane SMITH', regex, perl = TRUE)[[1]], collapse=' :: '))

Via @graemefowler on Twitter, we have: s/^(.+[A-Z][a-zA-Z]+)([A-Z]\w+ [A-Z]+ \/.+)/$1 :: $2/;

print(gsub("^(.+[A-Z][a-zA-Z]+)([A-Z]\\w+ [A-Z]+ \\/.+)", "\\1 :: \\2", "TEAM RacersJohn SMITH / Jane SMITH", perl=TRUE))
print(gsub("^(.+[A-Z][a-zA-Z]+)([A-Z]\\w+ [A-Z]+ \\/.+)", "\\1 :: \\2", "Team XYZJohn SMITH / Jane SMITH", perl=TRUE))
print(gsub("^(.+[A-Z][a-zA-Z]+)([A-Z]\\w+ [A-Z]+ \\/.+)", "\\1 :: \\2", "John McMahon RacingJohn SMITH / Jane SMITH", perl=TRUE))

Output:
[1] "TEAM Racers :: John SMITH / Jane SMITH"
[1] "Team XYZ :: John SMITH / Jane SMITH"
[1] "John McMahon Racing :: John SMITH / Jane SMITH"
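Since the pattern is anchored at the start and matches at most once per string, the three calls above can be folded into a single vectorised helper. This is a sketch under the assumption that every entry has the shape team name, then "Firstname SURNAME /", then the rest; the name `split_team` is hypothetical.

```r
# Anchor on the first driver ("Firstname SURNAME /") and capture everything
# before it as the team name.
team_driver <- "^(.+[A-Z][a-zA-Z]+)([A-Z]\\w+ [A-Z]+ \\/.+)"

# Hypothetical helper: sub() is enough because the pattern is anchored with ^.
split_team <- function(x) sub(team_driver, "\\1 :: \\2", x, perl = TRUE)

team_result <- split_team(c(
  "Team XYZJohn SMITH / Jane SMITH",
  "TEAM RacersJim SMITH / Jane SMITH",
  "John McMahon RacingBob SMITH / Jane SMITH"
))
print(team_result)
```

Strings that do not fit the assumed shape are returned unchanged, so it may be worth checking with `grepl(team_driver, x, perl = TRUE)` which entries were actually split.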

1 Answer:

Answer 0 (score: 1)

This RegEx may help you capture one target group, space + SMITH + space:

 \s[A-Z]+\s\/
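As a rough illustration of what this expression picks out, it can be applied to the question's sample strings with `regexpr` and `regmatches`. The variable names here are made up for the sketch.

```r
# The expression from the answer: whitespace, an all-caps word, whitespace, "/".
surname_slash <- "\\s[A-Z]+\\s\\/"

entries <- c(
  "Team XYZJohn SMITH / Jane SMITH",
  "TEAM RacersJim SMITH / Jane SMITH",
  "John McMahon RacingBob SMITH / Jane SMITH"
)

# regexpr() finds the first match per string; regmatches() extracts its text.
hits <- regmatches(entries, regexpr(surname_slash, entries, perl = TRUE))
print(hits)
```

On its own this only locates the first driver's surname and the slash; it does not say where the team name ends and the driver's first name begins.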

This RegEx may help you capture two target groups; you can then use a string replacement to insert space + :: + space before the second group.