Split a string by a delimiter in R

Posted: 2017-02-18 20:49:59

Tags: r string split

I have the following string:

x <- "b|all|the|experts|admit|that|we|should|legalise|drugs|b|war|in|south|osetia|pictures|made|by|a|russian|soldier|b|swedish|wrestler|ara|abrahamian|throws|away|medal|in|olympic|hissy|fit|b|russia|exaggerated|the|death|toll|in|south|ossetia|now|only|were|originally|killed|compared|to|b|missile|that|killed|inside|pakistan|may|have|been|launched|by|the|cia|b|rushdie|condemns|random|house|s|refusal|to|publish|novel|for|fear|of|muslim|retaliation|b|poland|and|us|agree|to|missle|defense|deal|interesting|timing|b|will|the|russians|conquer|tblisi|bet|on|it|no|seriously|you|can|bet|on|it|b|russia|exaggerating|south|ossetian|death|toll|says|human|rights|group|b|musharraf|expected|to|resign|rather|than|face|impeachment|b|moscow|made|plans|months|ago|to|invade|georgia|b|why|russias|response|to|georgia|was|right|b|nigeria|has|handed|over|the|potentially|oil|rich|bakassi|peninsula|to|cameroon|b|the|us|and|poland|have|agreed|a|preliminary|deal|on|plans|for|the|controversial|us|defence|shield"

When I try to split it with:
> strsplit(x,"|")
[[1]]
  [1] "b" "|" "a" "l" "l" "|" "t" "h" "e" "|" "e" "x" "p" "e" "r" "t" "s" "|" "a" "d" "m" "i" "t" "|" "t" "h" "a" "t" "|"
 [30] "w" "e" "|" "s" "h" "o" "u" "l" "d" "|" "l" "e" "g" "a" "l" "i" "s" "e" "|" "d" "r" "u" "g" "s" "|" "b" "|" "w" "a"
 [59] "r" "|" "i" "n" "|" "s" "o" "u" "t" "h" "|" "o" "s" "e" "t" "i" "a" "|" "p" "i" "c" "t" "u" "r" "e" "s" "|" "m" "a"
 [88] "d" "e" "|" "b" "y" "|" "a" "|" "r" "u" "s" "s" "i" "a" "n" "|" "s" "o" "l" "d" "i" "e" "r" "|" "b" "|" "s" "w" "e"
[117] "d" "i" "s" "h" "|" "w" "r" "e" "s" "t" "l" "e" "r" "|" "a" "r" "a" "|" "a" "b" "r" "a" "h" "a" "m" "i" "a" "n" "|"
[146] "t" "h" "r" "o" "w" "s" "|" "a" "w" "a" "y" "|" "m" "e" "d" "a" "l" "|" "i" "n" "|" "o" "l" "y" "m" "p" "i" "c" "|"
[175] "h" "i" "s" "s" "y" "|" "f" "i" "t" "|" "b" "|" "r" "u" "s" "s" "i" 
.........

But I want the words that are separated by the delimiter |. What am I doing wrong?

1 answer:

Answer 0 (score: 4)

The character you are using has a special meaning in regular expressions: it means OR. So your split pattern is effectively:

empty string OR empty string == empty string, i.e. a pattern that matches the empty string at every position
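A quick way to see this on a short string of our own (not the one from the question): a bare "|" matches even a string that contains no pipe at all, and splitting on it peels off one character at a time, so the "|" itself shows up as an ordinary element, exactly as in the output above.

```r
# A bare "|" is an alternation of two empty patterns, so it matches
# the empty string and grepl() reports a match even without any pipe.
print(grepl("|", "abc"))

# strsplit() on a pattern that only matches the empty string splits
# between every character, so "|" becomes its own element.
print(strsplit("a|b", "|")[[1]])
```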

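That is why your input string gets split into single characters. To use | as an ordinary character, without its special regex meaning, you have to escape it (and inside an R string the backslash itself must be doubled), like this: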

strsplit(x, "\\|")
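For comparison, here is a sketch of three equivalent ways to get the words, shown on a shortened example string; `x_short` is a hypothetical name used only for illustration. Besides escaping, you can put the pipe inside a character class, where it has no special meaning, or pass `fixed = TRUE` so the delimiter is matched literally and no regex is involved. Since `strsplit()` returns a list with one element per input string, `[[1]]` pulls out the character vector for a single string.

```r
# Hypothetical short input with the same structure as x in the question
x_short <- "b|all|the|experts"

# 1. Escape the metacharacter: the R literal "\\|" is the regex \| (a literal pipe)
print(strsplit(x_short, "\\|")[[1]])

# 2. Character class: inside [...] the pipe is an ordinary character
print(strsplit(x_short, "[|]")[[1]])

# 3. No regex at all: match the delimiter as a fixed string
print(strsplit(x_short, "|", fixed = TRUE)[[1]])
```

All three calls return `"b" "all" "the" "experts"`; `fixed = TRUE` is often the simplest choice when the delimiter is a plain character rather than a pattern.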