I have a data frame, df, that looks like this:
date sample
1 29-Apr 1,000 (1/4)
2 29-Apr 1,000 (1/4)
3 28-Apr 1,970
4 27-Apr 1,000 (1/4)
5 25-Apr 1,000 (1/4)
...
How can I extract the values in parentheses and create a new column from them?
I can extract the values in parentheses:
matches <- regexpr("\\(.*?\\)", df$sample)
fractions_with_parens <- regmatches(df$sample, matches)
fractions <- gsub("[\\(\\)]", "", fractions_with_parens)
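But this drops the non-matches, so the resulting vector does not match the number of rows in the data frame. In this example, row 3 would be missing.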
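The length mismatch can be reproduced with a minimal base R sketch. It assumes the five example rows shown above, rebuilt here with data.frame; the names found and aligned are hypothetical. It also shows one possible way to keep the result aligned with the rows, by starting from an all-NA vector and filling only the rows that matched:

```r
# Rebuild the example data (assumption: only the five rows shown above).
df <- data.frame(
  date = c("29-Apr", "29-Apr", "28-Apr", "27-Apr", "25-Apr"),
  sample = c("1,000 (1/4)", "1,000 (1/4)", "1,970",
             "1,000 (1/4)", "1,000 (1/4)"),
  stringsAsFactors = FALSE
)

# regexpr() returns -1 for rows without a match; regmatches() drops them.
matches <- regexpr("\\(.*?\\)", df$sample)
found <- regmatches(df$sample, matches)
length(found)  # 4, one fewer than nrow(df)

# Start from NA for every row, then fill in only the rows that matched.
aligned <- rep(NA_character_, nrow(df))
aligned[as.integer(matches) != -1L] <- found

# Strip the parentheses; gsub() leaves NA values as NA.
aligned <- gsub("[()]", "", aligned)
aligned
```

With this approach aligned has one element per row, so it could be assigned directly as a new column.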
Answer 0 (score: 3)
You can try stringr:
library(stringr)
df$extract <- str_extract(df$sample, "\\(.*?\\)")
df
# date sample extract
#1 29-Apr 1,000 (1/4) (1/4)
#2 29-Apr 1,000 (1/4) (1/4)
#3 28-Apr 1,970 <NA>
#4 27-Apr 1,000 (1/4) (1/4)
#5 25-Apr 1,000 (1/4) (1/4)
To extract only the values inside the parentheses, you can do:
df$extract <- str_extract(df$sample, "(?<=\\().*(?=\\))")
Thanks to epi99 for the suggestion.
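One thing to keep in mind with the lookaround pattern: .* is greedy, which makes no difference for the sample data but matters if a value ever contains more than one pair of parentheses. The sketch below uses a hypothetical input string (not from the original data) to compare the greedy pattern with a lazy .*? variant:

```r
library(stringr)

# Hypothetical input with two parenthesised groups in one string.
x <- "1,000 (1/4) and 2,000 (2/3)"

# Greedy: runs from the first "(" to the last ")".
greedy <- str_extract(x, "(?<=\\().*(?=\\))")

# Lazy: stops at the first ")" after the opening "(".
lazy <- str_extract(x, "(?<=\\().*?(?=\\))")

greedy  # "1/4) and 2,000 (2/3"
lazy    # "1/4"
```

If every value is known to hold at most one parenthesised group, either pattern should give the same result.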
Answer 1 (score: 2)
You can use dplyr:
library(stringr)
library(dplyr)
df <- data.frame(date = c('29-Apr', '29-Apr', '28-Apr', '27-Apr', '25-Apr'),
sample = c('1,000 (1/4)', '1,000 (1/4)', '1,970',
'1,000 (1/4)', '1,000 (1/4)'))
# str_extract returns a character vector (str_match would return a matrix)
df %>% mutate(new = str_extract(sample, pattern = '\\d+/\\d+'))
This results in:
date sample new
1 29-Apr 1,000 (1/4) 1/4
2 29-Apr 1,000 (1/4) 1/4
3 28-Apr 1,970 <NA>
4 27-Apr 1,000 (1/4) 1/4
5 25-Apr 1,000 (1/4) 1/4
Answer 2 (score: 1)
We can use qdapRegex:
library(qdapRegex)
df$new <-unlist(ex_round(df$sample, include.markers=TRUE))
df$new
#[1] "(1/4)" "(1/4)" NA "(1/4)" "(1/4)"
If we don't need the parentheses, drop include.markers:
df$new <-unlist(ex_round(df$sample))
df$new
#[1] "1/4" "1/4" NA "1/4" "1/4"