Question

I have a block of text from which I want to extract each sample size. I can usually get a regex to work, but not this time.

Here is an example:

x = "OBJECTIVES:

In diabetic polyneuropathy (DPN) patients, the effect of folic acid and homocysteine has been related to components of nerve conduction velocity (NCV). The objective of this study was to determine the effect of folic acid supplementation on NCV in DPN patients.
METHODS:

Patients were randomized to receive either 1 mg of folic acid (n = 40) or placebo (n = 40) for 16 weeks. Blood samples were collected to assess serum folic acid and homocysteine concentrations, and NCV was performed for assessment of diabetic neuropathy.
RESULTS:

At 16 weeks, in the supplemented group, serum levels of folic acid (p < 0.001) increased, homocysteine concentrations decreased (p < 0.001), with no change in serum vitamin B12 levels. There was a significant increase in sensory sural amplitude (p < 0.001), and components of motor nerves, including amplitude (p = 0.001) and velocity (p < 0.001), but decreased onset latency of peroneal (p = 0.019) and tibial (p = 0.011) motor nerves.
CONCLUSION:

Our data suggest that supplementation with 1 mg of folic acid for 16 weeks may be useful for enhancing NCV in DPN patients."

I want to extract the two sample sizes. In this case, n = 40 and n = 40.

I tried

gsub('.*[n=]|).*','',x)

This returns "ts."

Answer 1

Here is one way to extract those values:

regmatches(x, gregexpr('n\\s*=\\s*\\d+',x))

Here we look for n = (with optional whitespace around the equals sign) and then extract the matches with regmatches.
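If the goal is the numbers themselves rather than the "n = 40" strings, the matches can be reduced to integers in a second step. The following is a minimal sketch, assuming a shortened stand-in for x; the names matches and sizes are illustrative and not part of the original answer.

```r
# Shortened stand-in for the abstract in the question (assumption)
x <- "Patients received folic acid (n = 40) or placebo (n = 40) for 16 weeks."

# Same pattern as above: "n", optional whitespace, "=", optional whitespace, digits
matches <- regmatches(x, gregexpr("n\\s*=\\s*\\d+", x))[[1]]

# Drop everything that is not a digit, then convert to integer
sizes <- as.integer(gsub("[^0-9]", "", matches))
```

With this input, sizes holds the two sample sizes as an integer vector.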

Answer 2

An ugly solution without regular expressions could be:

# first "n = ": take the two characters that follow it
substr(strsplit(x, "n = ", fixed = TRUE)[[1]][2], 1, 2)
# second "n = "
substr(strsplit(x, "n = ", fixed = TRUE)[[1]][3], 1, 2)

Answer 3

You can use stringr to extract "n =" followed by at least one digit. This assumes zero or one space on each side of the equals sign:

library(stringr)
str_extract_all(x, "n\\s?\\=\\s?\\d+")
[[1]]
[1] "n = 40" "n = 40"

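Edit: With the additional criteria, the following should work inside mutate. I switched from stringr to stringi so that strings with no match return NA. You could also use paste instead of stri_flatten, but I would stick with stri_flatten because it keeps NA as a missing value rather than turning it into the character string "NA" as paste would.

library(stringi)
sapply(stri_extract_all(x, regex = "n\\s?\\=\\s?\\d+"), stri_flatten, collapse = ", ")

For regular expressions, I started with (and still refer to) this cheat sheet for R. The regex above works as follows:

n - the letter n

\\s? - at most one (?) whitespace character (\\s) (you may prefer MrFlick's use of * rather than ?; your call)

\\= - the equals sign

\\s? - at most one (?) whitespace character (\\s)

\\d+ - one or more (+) digits (\\d)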
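The point about paste and missing values can be checked in base R alone. The sketch below is not the stringi approach from this answer. It is a base R approximation under the assumption of a small made-up vector abstracts, with a hypothetical helper collapse_hits that returns NA when nothing matches.

```r
# Two made-up inputs (assumption): one with sample sizes, one without
abstracts <- c("folic acid (n = 40) or placebo (n = 40)",
               "no sample size reported")

# One character vector of matches per input; character(0) when nothing matches
hits <- regmatches(abstracts, gregexpr("n\\s?=\\s?\\d+", abstracts))

# Hypothetical helper: keep NA as a real missing value instead of a string
collapse_hits <- function(h) {
  if (length(h) == 0) NA_character_ else paste(h, collapse = ", ")
}
result <- vapply(hits, collapse_hits, character(1))

# For comparison: paste converts a missing value into the text "NA"
pasted_na <- paste(NA_character_, collapse = ", ")
```

Here result has one element per input, so it can be assigned to a data frame column, and is.na(result) flags the inputs without a match.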

Answer 4

One way to retrieve the values between parentheses is as follows:

library(stringr)

lapply(str_split(x,pattern="\\("),function(x) gsub('(.*)\\).*','\\1',x))
[[1]]
 [1] "OBJECTIVES:\n\nIn diabetic polyneuropathy "
 [2] "DPN"                                       
 [3] "NCV"                                       
 [4] "n = 40"                                    
 [5] "n = 40"                                    
 [6] "p < 0.001"                                 
 [7] "p < 0.001"                                 
 [8] "p < 0.001"                                 
 [9] "p = 0.001"                                 
[10] "p < 0.001"                                 
[11] "p = 0.019"                                 
[12] "p = 0.011"

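You can split the text using \\( as the pattern and apply gsub to each piece. Then you can use grep to identify which elements start with "n =" and retrieve the ones you need.

I hope this helps.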
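The final grep step is described but not shown. A minimal sketch of the whole pipeline follows, assuming a shortened stand-in for x and using base strsplit in place of str_split so that it runs without extra packages; the variable names are illustrative.

```r
# Shortened stand-in for the abstract in the question (assumption)
x <- "Patients received folic acid (n = 40) or placebo (n = 40); levels rose (p < 0.001)."

# Split at every opening parenthesis
pieces <- strsplit(x, "\\(")[[1]]

# Keep only what precedes the closing parenthesis in each piece
inside <- gsub("(.*)\\).*", "\\1", pieces)

# Keep the elements that start with "n ="
n_values <- grep("^n\\s*=", inside, value = TRUE)
```

With value = TRUE, grep returns the matching strings themselves rather than their positions.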

Extract text from a paragraph between two symbols in R
