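I have a text string that I want to split on the character | (the vertical bar, or pipe, with ASCII code 124).
When I try this, the string is split at every character instead. That is, the following code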
string <- "Hello | Good bye!"
split <- strsplit(string, "|")
print(split[[1]])
produces this output:
[1] "H" "e" "l" "l" "o" " " "|" " " "G" "o" "o" "d" " " "b" "y" "e" "!"
If I simply change the | symbol to / (or any other character), it works as expected. That is, the following code
string <- "Hello / Good bye!"
split <- strsplit(string, "/")
print(split[[1]])
produces this output:
[1] "Hello " " Good bye!"
which is what I want.
Answer 0: (score: 4)
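By default, strsplit() treats its split argument as a regular expression, in which | is the alternation operator. You need to use fixed = TRUE so that metacharacters such as | are interpreted literally: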
string <- "Hello | Good bye!"
strsplit(string, "|", fixed = TRUE)
#[[1]]
#[1] "Hello " " Good bye!"
Similarly, for . (which matches any single character in a regular expression):
strsplit("Hello . Good bye!", ".")[[1]]
#[1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
strsplit("Hello . Good bye!", ".", fixed = TRUE)[[1]]
#[1] "Hello " " Good bye!"
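To see why the unescaped patterns misbehave, here is a small sketch (the variable names by_pipe, by_empty, dotted and by_dot are illustrative only). As a regular expression, | on its own is an alternation between two empty alternatives, so it can match the empty string, and the result is the same character-by-character split that an empty split pattern gives. The . pattern matches any single character, so every character is consumed as a delimiter.

```r
string <- "Hello | Good bye!"

# Unescaped "|" can match the empty string, so strsplit() falls back to
# splitting into single characters, exactly as it does for split = "".
by_pipe  <- strsplit(string, "|")[[1]]
by_empty <- strsplit(string, "")[[1]]
identical(by_pipe, by_empty)      # TRUE

# "." matches any single character, so each character acts as a delimiter
# and only empty strings are left, one per character of the input.
dotted <- "Hello . Good bye!"
by_dot <- strsplit(dotted, ".")[[1]]
length(by_dot) == nchar(dotted)   # TRUE
all(by_dot == "")                 # TRUE
```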
Alternatively, you can escape such characters manually with a double backslash:
strsplit("Hello | Good bye!", "\\|")[[1]]
#[1] "Hello " " Good bye!"
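Another way to escape a single metacharacter, not mentioned above, is to put it inside a bracket expression, where | has no special meaning. A minimal sketch (the variable name parts is illustrative only):

```r
string <- "Hello | Good bye!"

# Inside [...] the pipe is an ordinary character, so no backslashes are needed.
parts <- strsplit(string, "[|]")[[1]]
parts
#[1] "Hello "     " Good bye!"

# The result matches the backslash-escaped pattern.
identical(parts, strsplit(string, "\\|")[[1]])   # TRUE
```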
or wrap them in \\Q...\\E, which causes everything between the two markers to be treated literally:
strsplit("Hello | Good bye!", "\\Q|\\E")[[1]]
#[1] "Hello " " Good bye!"
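The \\Q...\\E form is convenient when the delimiter is held in a variable and has to be combined with other regex syntax, which fixed = TRUE does not allow. The following is only a sketch: split_literal is a hypothetical helper, not part of base R, and it passes perl = TRUE so that the quoting is handled by the PCRE engine. It assumes the delimiter itself does not contain the sequence \E.

```r
# Hypothetical helper: split on a literal delimiter and also drop any
# spaces immediately around it.
split_literal <- function(x, delim) {
  # \Q...\E quotes the delimiter; " *" on each side absorbs padding spaces.
  pattern <- paste0(" *\\Q", delim, "\\E *")
  strsplit(x, pattern, perl = TRUE)
}

split_literal("Hello | Good bye!", "|")[[1]]
#[1] "Hello"     "Good bye!"

split_literal("a || b || c", "||")[[1]]
#[1] "a" "b" "c"
```

If the surrounding whitespace should be kept, fixed = TRUE as shown earlier remains the simpler choice.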