Question

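I want to use str_view from stringr in R to find all the words that start with "y" and all the words that end with "x". I have a list of words generated by corpora, but whenever I run the code it returns a blank view.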

Common_words<-corpora("words/common")

#start with y
start_with_y <- str_view(Common_words, "^[y]", match = TRUE)
start_with_y

#finish with x
str_view(Common_words, "$[x]", match = TRUE)

Also, I'd like to find the words that have exactly 3 letters, but I have no idea how to do that so far.

Answer 1

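I'd say this is less about programming with stringr and more about learning some regular expressions. Here are some sites I've found useful for learning: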

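Here, \\w, the shorthand class for word characters (roughly [A-Za-z0-9_]), is very useful in combination with quantifiers (+ and {3} in these two cases). PS: I use stringi here because stringr uses it on the back end anyway. Just skipping the middleman.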

x <- c("I like yax because the rock to the max!", 
    "I yonx & yix to pick up stix.")

library(stringi)

stri_extract_all_regex(x, 'y\\w+x')
stri_extract_all_regex(x, '\\b\\w{3}\\b')

## > stri_extract_all_regex(x, 'y\\w+x')
## [[1]]
## [1] "yax"
## 
## [[2]]
## [1] "yonx" "yix" 


## > stri_extract_all_regex(x, '\\b\\w{3}\\b')
## [[1]]
## [1] "yax" "the" "the" "max"
## 
## [[2]]
## [1] "yix"
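
To see why the \\b word boundaries matter in the three-letter pattern, here is a minimal sketch. The sample sentence is made up for illustration. Without the boundaries, \\w{3} is free to match the first three characters of a longer word.

```r
library(stringi)

# Hypothetical sample sentence, not taken from the question's word list
s <- "rock to the max"

# No boundaries: also picks up the first three letters of "rock"
loose <- stri_extract_all_regex(s, '\\w{3}')[[1]]

# With boundaries: only whole words of exactly three characters
strict <- stri_extract_all_regex(s, '\\b\\w{3}\\b')[[1]]

loose   # "roc" "the" "max"
strict  # "the" "max"
```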

Edit: it seems these may also be useful:

## Just y starting words
stri_extract_all_regex(x, '\\by\\w+\\b')

## Just x ending words
stri_extract_all_regex(x, '\\b\\w*x\\b')

## Words with 4 or more characters (change the 4 to any n)
stri_extract_all_regex(x, '\\b\\w{4,}\\b')
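
If the words are held in a plain character vector with one word per element, as the question's word list suggests, the ^ and $ anchors can do the job of \\b. The sketch below rests on that assumption and uses a made-up vector called words rather than the real corpora output. Note that $ has to come after the character it anchors, so "x$" rather than "$[x]".

```r
library(stringr)

# Hypothetical stand-in for the asker's word list (one word per element)
words <- c("yes", "box", "yellow", "fix", "young", "relax", "yak", "tax")

# "^" anchors the match at the start of each string
starts_with_y <- str_subset(words, "^y")

# "$" anchors at the end, so it is written after the character
ends_with_x <- str_subset(words, "x$")

# Exactly three word characters between start and end
three_letters <- str_subset(words, "^\\w{3}$")

starts_with_y  # "yes" "yellow" "young" "yak"
ends_with_x    # "box" "fix" "relax" "tax"
three_letters  # "yes" "box" "fix" "yak" "tax"
```

The same patterns should work with str_view, for example str_view(words, "x$", match = TRUE), which displays only the elements that match.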
