How to extract part of a string in R

Time: 2014-02-25 08:39:02

Tags: r

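I have a string "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."

I want to extract only the parts that contain the word "SMARTPHONE". Each part must lie between two periods or two commas. A combination of a comma and a period is fine too.

I tried this R code: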

y = grep("[,.]?[[:alnum:]]+(SMARTPHONE)[[:alnum:]]+[,.]", x, perl = TRUE, value = TRUE)

But it did not give me the expected result.

2 answers:

Answer 0: (score: 1)

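Wouldn't you rather split on all commas and full stops, and then check whether each element contains SMARTPHONE?

See here for information on splitting strings, and then here for partial string matching.

One thing to watch out for when splitting on periods is that this will also split after abbreviations, e.g. Mr.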
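The split-then-match idea, together with the abbreviation caveat, can be sketched roughly as follows. This is only an illustration: the sample sentence and the short list of protected abbreviations (MR, MRS) are made up for the example and are not part of the question, and a real text would likely need a longer list.

```r
# Hypothetical sample text containing an abbreviation followed by a period
x <- "MR. SMITH BOUGHT A SMARTPHONE, THEN SOLD IT. HE WAS HAPPY."

# Split on commas, or on periods that are NOT directly preceded by MR or MRS.
# Lookbehind assertions require perl = TRUE.
parts <- unlist(strsplit(x, "(?<!MR)(?<!MRS)[.]|,", perl = TRUE))

# Remove the whitespace left behind after each delimiter
parts <- trimws(parts)

# Keep only the pieces that contain the literal word SMARTPHONE
hits <- parts[grepl("SMARTPHONE", parts, fixed = TRUE)]
print(hits)
```

Without the lookbehind, the first piece would be cut off right after "MR", which is the problem described above.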

Answer 1: (score: 1)

Is this what you want?

> a <- "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."
> b <- unlist(strsplit(a, "[,.]"))
> (c <- b[grep("SMARTPHONE", b)])
[1] "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS"                                         
[2] " I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG"
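As a possible alternative that skips the splitting step: grep(..., value = TRUE) returns whole elements of the input vector that match, not the matched substring, which is probably why the attempt in the question did not behave as expected. A minimal sketch using regmatches() with gregexpr() to pull out the matching pieces directly is shown below. The pattern is my own choice for illustration, not something taken from the question.

```r
a <- "HAD BEEN SEARCHING FOR MY 1ST ANDROID SMARTPHONE FOR OVER 6 MONTHS. WITH BUDGET OF 10K WAS ALL SET TO BUY MMX 110 BUT THEN THOUGHT TO WAIT FOR MMX 116. BUT WITH MMX 116 AT 16K, I TOO HAD TO INCREASE MY BUDGET FROM 15K AND STARTED TO LOOK OUT FOR OTHER SMARTPHONE OPTIONS FROM SAMSUNG, IPHONE, HTC ETC."

# Match every run of characters that contains SMARTPHONE and
# includes no comma or period on either side of it
m <- regmatches(a, gregexpr("[^,.]*SMARTPHONE[^,.]*", a))[[1]]

# Drop the leading space that follows a delimiter
m <- trimws(m)
print(m)
```

This should return the same two pieces as the strsplit() approach, with the leading space already removed from the second one.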