Question

How can I extract the word wordofvariablelength from the string below?

<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">

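I can get the first part of the string with the code below, but is there a regular expression I can use to grab the word immediately after "browse/" and before the "\"? Here that word is "wordofvariablelength".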

mystring = substr(mystring,nchar("<a href=\"http://www.thesaurus.com/browse/")+1,nchar("<a href=\"http://www.thesaurus.com/browse/")+20)

Note that wordofvariablelength can be of any length, so I cannot hard-code the start and end positions.

Answer 1

Try

sub('.*?\\.com/[^/]*\\/([a-z]+).*', '\\1', mystring)
#[1] "wordofvariablelength"

Or

library(stringr)
# perl() is deprecated since stringr 1.0.0; a plain pattern string already supports lookbehind
str_extract(mystring, '(?<=browse/)[A-Za-z]+')
#[1] "wordofvariablelength"

Data

mystring <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
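Since the question stresses that the word can be any length, a quick check along these lines may be useful. It is only a sketch: the `words` and `inputs` vectors are made-up test data, not part of the original answer, and the pattern is the first one shown above.

```r
# Hypothetical test inputs: only the word after "browse/" changes length.
words <- c("a", "cat", "wordofvariablelength")
inputs <- sprintf(
  "<a href=\"http://www.adrive.com/browse/%s\" class=\"next-button\">",
  words
)

# sub() is vectorised, so the same pattern is applied to every element.
extracted <- sub('.*?\\.com/[^/]*\\/([a-z]+).*', '\\1', inputs)
print(extracted)
```

Keep in mind that sub() returns its input unchanged when the pattern does not match, so a string without the expected URL is not flagged as missing.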

Answer 2

Using the regmatches function.

> x <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
> regmatches(x, regexpr('.*?"[^"]*/\\K[^/"]*(?=")', x, perl=TRUE))
[1] "wordofvariablelength"

Or

> regmatches(x, regexpr('[^/"]*(?="\\s+class=")', x, perl=TRUE))
[1] "wordofvariablelength"

Or

Using gsub is even simpler.

> gsub('.*/|".*', "", x)
[1] "wordofvariablelength"

Answer 3

You can use this regular expression

/browse\/(.*?)\\/g

Demo: https://regex101.com/r/gX4dC0/1

Answer 4

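You can use the following regular expression: (?<=browse/).*?(?=\\"). It means: check that browse/ precedes the current position, then match all following characters up to, but not consuming, the next " (written \" inside the escaped string).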

Sample code (the original answer also linked a runnable sample program):

mystr <- "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\" id=\"explore-gutter\" data-linkid=\"huiazc\"> <strong class=\"text gutter-text \">"
regmatches(mystr, regexpr('(?<=browse/).*?(?=\\")', mystr, perl=T))

perl=T (shorthand for perl=TRUE) means we are using a Perl-like regex flavor, which allows the fixed-width lookbehind (?<=browse/).

Output:

[1] "wordofvariablelength"
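The lookbehind approach can be wrapped in a small helper that also copes with inputs lacking "browse/". This is a sketch only: `extract_after_browse` is a hypothetical name, the second test string is invented, and the pattern swaps the lazy `.*?(?=\")` for the negated class `[^"/]+`, which stops at the same closing quote.

```r
# Hypothetical helper (not from the answers): returns the path segment after
# "browse/" for each element of x, or NA where "browse/" is absent.
extract_after_browse <- function(x) {
  m <- regexpr('(?<=browse/)[^"/]+', x, perl = TRUE)
  hit <- as.integer(m) != -1L          # regexpr() reports -1 for no match
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)         # regmatches() returns matched elements only
  out
}

inputs <- c(
  "<a href=\"http://www.adrive.com/browse/wordofvariablelength\" class=\"next-button\">",
  "<a href=\"http://www.adrive.com/about\">"
)
result <- extract_after_browse(inputs)
print(result)
```

Returning NA for non-matching inputs keeps the result the same length as the input, which regmatches() alone does not guarantee.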

Regular expression in R for a variable-length word between two characters
