Question

I have a string id <- "Hello these are words N12345678 hooray how fun".

I want to extract N12345678 from this string.

So far I have used strsplit(id, " "). Now I have

>id
>[[1]]
>[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
>[8] "fun"

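which is of type list and has length 1 (even though it apparently holds 8 elements?).

If I then use id <- id[grep("^[N][0-9]",id)], id is an empty list.

I think what I need to do is split the string into a list of length 8, with each element being one substring, so that grep can pick out the pattern, but I am not sure how to go about this.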

Answer 1

Use regmatches:

> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
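
As a small sketch building on this: regexpr() only reports the first match in each string, while gregexpr() reports all of them, which may matter if a string can hold more than one ID. The second string id2 below is a made-up input for illustration, not something from the question.

```r
id <- "Hello these are words N12345678 hooray how fun"

# regexpr() locates only the first match in each string
first_match <- regmatches(id, regexpr("N[0-9]+", id))
print(first_match)

# Hypothetical input with two IDs, for illustration only
id2 <- "first N111 then N222"

# gregexpr() locates every match; regmatches() then returns a list
# with one character vector per input string, hence the [[1]]
all_matches <- regmatches(id2, gregexpr("N[0-9]+", id2))[[1]]
print(all_matches)
```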

Answer 2

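Do you know strtok? It parses an input line, splitting it on the delimiter characters you give it. For the purposes of my example, I break the string every time I hit a space.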

tempVar = strtok(string, " ");
// tempVar now holds "Hello", i.e. everything up to the first space
while (tempVar != NULL)
{
     // evaluate tempVar here, before fetching the next token
     tempVar = strtok(NULL, " ");
     // tempVar now holds the next word; the loop repeats until the end of the string
}

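With this, your "Hello these are words N12345678 hooray" would go like this: tempVar would be "Hello", then "these", and so on.

Each pass through the loop gives tempVar a new value. So I suggest evaluating tempVar inside the loop (before grabbing the next token), so that you can stop as soon as you have N12345678.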
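
Since the question is about R, the same token-by-token idea can be sketched there as well. This is only a rough R analogue, not strtok itself: it assumes the words are separated by single spaces and that the ID is the letter N followed only by digits. The variable names tokens and found are illustrative.

```r
id <- "Hello these are words N12345678 hooray how fun"

# strsplit() returns a list with one element per input string,
# so [[1]] picks out the character vector of words
tokens <- strsplit(id, " ", fixed = TRUE)[[1]]

found <- NA_character_
for (tok in tokens) {
  # Check the current token before moving on to the next one
  if (grepl("^N[0-9]+$", tok)) {
    found <- tok
    break  # stop as soon as the ID has been seen
  }
}
print(found)
```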

Answer 3

Try:

gsub('\\b[a-zA-Z]+\\b','',id)
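
One thing to watch for: the call above removes the purely alphabetic words but leaves the spaces that separated them. A minimal sketch of one way to tidy that up, assuming only the leading and trailing whitespace needs to go, is to wrap the result in trimws() (the names stripped and result are illustrative):

```r
id <- "Hello these are words N12345678 hooray how fun"

# Removing the purely alphabetic words leaves the separating spaces behind
stripped <- gsub('\\b[a-zA-Z]+\\b', '', id)

# trimws() drops the leading and trailing whitespace
result <- trimws(stripped)
print(result)
```

If alphabetic words could also appear between two IDs, the spaces in the middle would remain and would need separate handling.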

Answer 4

If you insist on using strsplit, I think this does the trick:

id <- "Hello these are words N12345678 hooray how fun"
id = strsplit(id, " ")
id[[1]][grep("^N[0-9]", id[[1]])]

Note that I have not changed your regular expression. It could be made more precise, for example ^N\\d+$.
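
To illustrate why the original attempt came back empty: strsplit() returns a list with one element per input string, so the eight words sit inside the first (and only) element. The sketch below shows the lengths involved and then filters a flattened vector with the stricter pattern mentioned above. The names parts, empty, words and result are illustrative.

```r
id <- "Hello these are words N12345678 hooray how fun"

parts <- strsplit(id, " ")
length(parts)        # 1: one list element per input string
length(parts[[1]])   # 8: the words live inside that element

# grep() applied to the list itself finds nothing, so the subset is empty
empty <- parts[grep("^[N][0-9]", parts)]

# Flatten to a plain character vector, then filter with the stricter pattern
words <- unlist(parts)
result <- words[grepl("^N\\d+$", words)]
print(result)
```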

Split a string into a list of substrings
