Question

I want to extract years from text.

The following code gives me a vector with the values 1998 and 2009:

library(stringr)

description <- "I was teaching at the univeristy from 1998 to 2009"
teaching <- as.numeric(str_extract_all(description, "\\d{4}")[[1]])

Then I want to subtract the years:

teaching[2] - teaching[1] 
[1] 11

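The problem is that I have a data frame column containing texts like this, and I want to extract the years from each text and subtract them.

I tried the following, but got confused: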

aa = lapply(df$description, str_extract_all,"\\d{4}")
bb = lapply(aa, function(x) x[1])

Answer 1

You can try the following:

# example data
library(stringr)

df <- data.frame(description = paste("I was teaching at the univeristy from", 1990:1995, "to", seq(2010, 2020, by = 2)))

#  description
#1 I was teaching at the univeristy from 1990 to 2010
#2 I was teaching at the univeristy from 1991 to 2012
#3 I was teaching at the univeristy from 1992 to 2014
#4 I was teaching at the univeristy from 1993 to 2016
#5 I was teaching at the univeristy from 1994 to 2018
#6 I was teaching at the univeristy from 1995 to 2020

years <- str_extract_all(df$description, "\\d{4}")
sapply(years, function(x) diff(as.numeric(x)))
# 20 21 22 23 24 25
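
To store the differences back in the data frame, which is what the question is after, a minimal sketch could look like the one below. The column name `years_taught` and the helper `year_span` are hypothetical names chosen for illustration. `vapply` is swapped in for `sapply` here as a stricter variant: it assumes every description contains exactly two four-digit years and stops with an error otherwise, instead of silently returning a list.

```r
library(stringr)

df <- data.frame(
  description = paste("I was teaching at the univeristy from", 1990:1995,
                      "to", seq(2010, 2020, by = 2)),
  stringsAsFactors = FALSE
)

# List with one character vector of matched years per description
years <- str_extract_all(df$description, "\\d{4}")

# Hypothetical helper: difference between consecutive years in one element
year_span <- function(yrs) diff(as.numeric(yrs))

# numeric(1) asserts that each element yields exactly one number
df$years_taught <- vapply(years, year_span, numeric(1))
df$years_taught
# 20 21 22 23 24 25
```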

An alternative that handles descriptions containing no years (returning NA for them):

# example data 
df <- data.frame(description = c(paste("I was teaching at the univeristy from",1990:1995, "to",seq(2010,2020,by =2)), "I was not teaching at all"))

years <- str_extract_all(df$description, "\\d{4}", simplify = TRUE)
apply(years, 1, function(x) diff(as.numeric(x)))
# 20 21 22 23 24 25 NA
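
The reason this works is that `simplify = TRUE` makes `str_extract_all` return a character matrix with one row per description, and rows without a match are filled with empty strings, which `as.numeric` converts to NA. Given that matrix, the subtraction can also be sketched as plain column arithmetic without `apply`. The names `first_year`, `last_year` and `years_taught` are hypothetical, and the sketch assumes the first two matches in each text are the start and end years.

```r
library(stringr)

df <- data.frame(
  description = c(
    paste("I was teaching at the univeristy from", 1990:1995,
          "to", seq(2010, 2020, by = 2)),
    "I was not teaching at all"
  ),
  stringsAsFactors = FALSE
)

# Character matrix: one row per description, one column per match
years <- str_extract_all(df$description, "\\d{4}", simplify = TRUE)

# Rows without a match hold "", which as.numeric() turns into NA
first_year <- as.numeric(years[, 1])
last_year  <- as.numeric(years[, 2])

# Vectorized subtraction over all rows at once
df$years_taught <- last_year - first_year
df$years_taught
# 20 21 22 23 24 25 NA
```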
