Question

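I want to use separate() in a dplyr pipeline to split on a minus sign (-) that must come after a whitespace and before a capital letter.

My regex [\s]-[A-Z] also matches the whitespace and the capital letter, so both are removed by the split. I only want to split on the minus sign at that specific position, without removing the whitespace or the following letter.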

library(dplyr)
library(tidyr)  # separate() is provided by tidyr, not dplyr

data.frame(x = c("Hans-Peter Wurst -My Gosh", "What is -wrong here -Do not worry")) %>% 
  separate(x, into = c("one", "two"), sep = "[\\s]-[A-Z]")

Result:

#                   one         two
# 1    Hans-Peter Wurst      y Gosh
# 2 What is -wrong here o not worry

Desired output:

#                   one          two
# 1    Hans-Peter Wurst      My Gosh
# 2 What is -wrong here Do not worry

Answer 1

You can wrap the whitespace and capital-letter patterns in a lookbehind and a lookahead:

sep = "(?<!\\S)-(?=[A-Z])"

Or, if a - at the start of the string must be excluded, use

sep = "(?<=\\s)-(?=[A-Z])"

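See the regex demo.

Lookarounds are zero-width assertions that do not consume text: the text they match is not part of the overall match value, and they only check whether the pattern matches and return true or false. Because of that, the letter is kept in the output.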
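To make the zero-width point concrete, here is a small base R sketch (the variable names `consumed` and `asserted` are only illustrative) that compares what each pattern actually matches on the first example string:

```r
x <- "Hans-Peter Wurst -My Gosh"

# Consuming pattern: the whitespace and the capital letter are part of the match,
# so a split on this match removes them as well.
consumed <- regmatches(x, regexpr("\\s-[A-Z]", x, perl = TRUE))

# Lookaround pattern: only the hyphen itself is part of the match.
asserted <- regmatches(x, regexpr("(?<=\\s)-(?=[A-Z])", x, perl = TRUE))

print(consumed)  # " -M"
print(asserted)  # "-"
```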

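Details

(?<=\s) - a positive lookbehind that requires a whitespace immediately to the left of the current location
(?<!\S) - a negative lookbehind that requires the start of the string or a whitespace immediately to the left of the current location
- - a hyphen
(?=[A-Z]) - a positive lookahead that requires an uppercase ASCII letter immediately to the right of the current location.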
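Putting it together, a minimal sketch with the data from the question could look like the following. The names `df` and `res` are just for illustration, and the `trimws()` step is my addition: since the whitespace before the hyphen is not consumed either, it stays at the end of the first column unless you trim it.

```r
library(tidyr)

df <- data.frame(
  x = c("Hans-Peter Wurst -My Gosh", "What is -wrong here -Do not worry"),
  stringsAsFactors = FALSE
)

# Split only on a "-" that follows a whitespace and precedes an uppercase ASCII letter.
# Both lookarounds are zero-width, so the capital letter stays in column `two`.
res <- separate(df, x, into = c("one", "two"), sep = "(?<=\\s)-(?=[A-Z])")

# The whitespace before the "-" is kept as well, so strip it from `one`.
res$one <- trimws(res$one)

print(res)
```

Note that the " -w" in the second row is left alone, because the lookahead requires an uppercase letter.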

Answer 2

We can use extract to capture the characters we want as groups ((...)). The unwanted characters are left outside the parentheses.

library(tidyverse)
data.frame(x = c("Hans-Peter Wurst -My Gosh", 
               "What is -wrong here -Do not worry")) %>%
     extract(x, into = c("one", "two"), "(.*) -([^-]+)$")
#                 one          two
#1    Hans-Peter Wurst      My Gosh
#2 What is -wrong here Do not worry
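One caveat: the pattern above does not check for a capital letter, and [^-]+ will not match if the second part itself contains a hyphen. A possible variant, offered only as a sketch, is to require the uppercase letter inside the second group. The third input row and the names `df2` and `res2` are made up for illustration:

```r
library(tidyr)

df2 <- data.frame(
  x = c("Hans-Peter Wurst -My Gosh",
        "What is -wrong here -Do not worry",
        "Anna -Jean-Luc Picard"),   # hypothetical row with a hyphen after the split point
  stringsAsFactors = FALSE
)

# Group 1: everything before the last " -" that is followed by an uppercase letter.
# Group 2: the uppercase letter and the rest of the string (hyphens allowed).
res2 <- extract(df2, x, into = c("one", "two"), regex = "(.*) -([A-Z].*)$")

print(res2)
```

Because (.*) is greedy, the split happens at the last " -" followed by a capital letter, which is what the sample data needs; other inputs may call for a different choice.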

Separate at a specific character between other characters (after a whitespace and before a capital letter) (dplyr)
