I have a character vector x and a data.frame y, as shown below.
x <- c("Pumpkin Helmet", "Warm Puppy", "Frisbee Sailing",
"Warm Puppy Frisbee Sailing", "Good Sport", "Masked Marvel",
"Spring Dance", "Spring Warm Dance Puppy", "Sock it to Me",
"Maskedspring Dancemarvel", "warm Puppy", "masked marvel",
"WARM PUPPY", " Spring Dance", "Warm Puppy Spring Dance",
"Warmspring Dancepuppy")
x
[1] "Pumpkin Helmet" "Warm Puppy" "Frisbee Sailing"
[4] "Warm Puppy Frisbee Sailing" "Good Sport" "Masked Marvel"
[7] "Spring Dance" "Spring Warm Dance Puppy" "Sock it to Me"
[10] "Maskedspring Dancemarvel" "warm Puppy" "masked marvel"
[13] "WARM PUPPY" " Spring Dance" "Warm Puppy Spring Dance"
[16] "Warmspring Dancepuppy"
a <- c("Masked", "Warm", "spring")
b <- c("Marvel", "Puppy", "dance")
y <- data.frame(a,b)
y
a b
1 Masked Marvel
2 Warm Puppy
3 spring dance
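I am trying to write a function that uses regex to merge the words in x that appear together in a row of y.
Before trying regex together with apply, I tried the following to get the desired result.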
gsub("Spring(\\s+)Dance.*", "SpringDance", x)
gsub("spring(\\s+)Dance.*", "SpringDance", x)
gsub("Warm(\\s+)Puppy.*", "WarmPuppy", x)
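I am still struggling with regex in R to get the desired output, out. What would be the ideal regex in this case? It should match whole words only, ignore case, and remove the one or more spaces between the two words.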
out <- c("Pumpkin Helmet", "WarmPuppy", "Frisbee Sailing",
"WarmPuppy Frisbee Sailing", "Good Sport", "MaskedMarvel",
"SpringDance", "Spring Warm Dance Puppy", "Sock it to Me",
"Maskedspring Dancemarvel", "warmPuppy", "maskedmarvel",
"WARMPUPPY", " SpringDance", "WarmPuppy SpringDance",
"Warmspring Dancepuppy")
[1] "Pumpkin Helmet" "WarmPuppy" "Frisbee Sailing"
[4] "WarmPuppy Frisbee Sailing" "Good Sport" "MaskedMarvel"
[7] "SpringDance" "Spring Warm Dance Puppy" "Sock it to Me"
[10] "Maskedspring Dancemarvel" "warmPuppy" "maskedmarvel"
[13] "WARMPUPPY" " SpringDance" "WarmPuppy SpringDance"
[16] "Warmspring Dancepuppy"
Answer 0 (score: 4)
It seems you want something like this:
> gsub("(?i)(?<=^Spring|^warm|^masked)\\s+(?=Dance|puppy|marvel)\\b|\\b(?<=Spring|warm|masked)\\s+(?=Dance$|puppy$|marvel$)", "", x, perl=TRUE)
[1] "Pumpkin Helmet" "WarmPuppy" "Frisbee Sailing"
[4] "WarmPuppy Frisbee Sailing" "Good Sport" "MaskedMarvel"
[7] "SpringDance" "Spring Warm Dance Puppy" "Sock it to Me"
[10] "Maskedspring Dancemarvel" "warmPuppy" "maskedmarvel"
[13] "WARMPUPPY" " SpringDance" "WarmPuppy SpringDance"
[16] "Warmspring Dancepuppy"
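Explanation

(?i) is the case-insensitive modifier; it turns on case-insensitive matching.
(?<=^Spring|^warm|^masked) is a lookbehind that checks for spring, warm, or masked at the start of the string.
\\s+ then matches the one or more whitespace characters that follow, if the lookbehind succeeded.
(?=Dance|puppy|marvel)\\b is a lookahead that checks whether the whitespace is followed by Dance, puppy, or marvel. If so, the match is kept; otherwise the matched whitespace is released.
| is the logical OR operator. The second alternative applies the same idea to a pair at the end of the string.
\\b is a word boundary; it matches between a word character and a non-word character.
$ is the end-of-line anchor.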
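The pattern above hard-codes the words, and it accepts any first word followed by any second word. If the pairs should instead be taken row by row from y, one possible sketch is to build a pattern per row and loop over them. The helper name merge_pairs is hypothetical (it is not from the question), and the sketch assumes the words in y contain no regex metacharacters.

```r
# Hypothetical helper (not part of the original post): merges each pair of
# words found in one row of `pairs` wherever they occur in `x` as whole
# words separated by one or more whitespace characters.
# Assumption: the words contain no regex metacharacters.
merge_pairs <- function(x, pairs) {
  for (i in seq_len(nrow(pairs))) {
    first  <- as.character(pairs[i, 1])
    second <- as.character(pairs[i, 2])
    # (?i)  ignore case
    # \\b   whole words only
    # \\s+  one or more whitespace characters between the two words
    pattern <- paste0("(?i)\\b(", first, ")\\s+(", second, ")\\b")
    # The backreferences keep the original capitalisation of both words.
    x <- gsub(pattern, "\\1\\2", x, perl = TRUE)
  }
  x
}

y <- data.frame(a = c("Masked", "Warm", "spring"),
                b = c("Marvel", "Puppy", "dance"))

merge_pairs(c("WARM   PUPPY", " Spring Dance",
              "Spring Warm Dance Puppy", "Warmspring Dancepuppy"), y)
```

Because each row gets its own pattern, "Warm Dance" is left alone, while "Warm Puppy" is merged anywhere in the string, not only at its start or end.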