My example data looks like this:
site<-c("A","B","C","D")
year1<-c(1990,1990,1990,1990)
year2<-c("",1991,1991,1991)
year3<-c(1992,1992,1992,1992)
year4<-c(1993,"",1993,"")
year5<-c(1994,1994,1994,1994)
dat<-data.frame(site,year1,year2,year3,year4,year5)
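I want to compute the range of years for each row (that is, for each site in this example), but I want the result to show breaks wherever a value is missing.
So the goal is to create a column like this: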
dat$year_range<-c("1990, 1992-1994","1990-1992, 1994","1990-1994","1990-1992, 1994")
Thanks.
Answer 0 (score: 2)
Here is a regular-expression approach (read or try it from the inside out):
gsub(',+', ',',                # final cleanup: collapse repeated commas into one
  gsub('(^,+|,+$)', '',        # strip commas at the start or end
    # the meat: replace a run of adjacent years with 'first-last'
    gsub('((?<=,,)|^)([0-9]+),([0-9]+,)+([0-9]+)((?=,,)|$)',
         ',\\2-\\4,',
         apply(dat[, -1], 1, paste, collapse = ","), perl = TRUE)))
#[1] "1990,1992-1994" "1990-1992,1994" "1990-1994" "1990-1992,1994"
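To make the "inside out" reading concrete, here is a minimal sketch that unrolls the three nested gsub() calls on single strings. The helper name collapse_runs and the second test string are my own additions for illustration, not part of the answer. It also shows what looks like a limitation: because the pattern needs at least three adjacent years, a run of only two years is left alone, and the final comma cleanup then hides the gap.

```r
# Hypothetical helper: the same three gsub() steps, applied to one string.
collapse_runs <- function(s) {
  # Step 1: turn a run of three or more adjacent years into ",first-last,"
  step1 <- gsub('((?<=,,)|^)([0-9]+),([0-9]+,)+([0-9]+)((?=,,)|$)',
                ',\\2-\\4,', s, perl = TRUE)
  # Step 2: drop commas at the start or end of the string
  step2 <- gsub('(^,+|,+$)', '', step1)
  # Step 3: collapse any remaining repeated commas into one
  step3 <- gsub(',+', ',', step2)
  c(step1 = step1, step2 = step2, step3 = step3)
}

# Row A of the example: the empty field shows up as ",,"
row_a <- collapse_runs("1990,,1992,1993,1994")
row_a[["step1"]]  # "1990,,,1992-1994,"
row_a[["step2"]]  # "1990,,,1992-1994"
row_a[["step3"]]  # "1990,1992-1994"

# Assumed extra case (not in the question): two runs of only two years each.
# Nothing matches in step 1, so the gap disappears in the final cleanup.
two_year <- collapse_runs("1990,1991,,1993,1994")[["step3"]]
two_year          # "1990,1991,1993,1994"
```

If your real data can contain runs of exactly two years, it is worth checking the output for such rows before relying on this approach.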
Answer 1 (score: 2)
Here is one suggestion, though I suspect it could be done more simply:
dat$year_range <- apply(dat[-1], 1, function(x) {
  x <- as.integer(x)
  paste(tapply(x[!is.na(x)], cumsum(is.na(x))[!is.na(x)], function(y)
    paste(unique(range(y)), collapse = "-")), collapse = ", ")
})
#   site year1 year2 year3 year4 year5      year_range
# 1    A  1990        1992  1993  1994 1990, 1992-1994
# 2    B  1990  1991  1992        1994 1990-1992, 1994
# 3    C  1990  1991  1992  1993  1994       1990-1994
# 4    D  1990  1991  1992        1994 1990-1992, 1994
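The key trick in the function above is cumsum(is.na(x)): the running count of missing values goes up by one at every gap, so it can serve as a group label for each unbroken run of years. A minimal sketch for a single row follows, assuming the empty strings have already become NA (which is what as.integer() does to them in the answer). The variable names grp, keep, pieces and result are mine, chosen for illustration.

```r
# Row A of the example, with the missing year2 already converted to NA
x <- c(1990L, NA, 1992L, 1993L, 1994L)

# The running count of NAs increases at each gap: 0 1 1 1 1
grp <- cumsum(is.na(x))

# Keep only the observed years, together with their group labels
keep <- !is.na(x)

# Per group, range() gives c(min, max); unique() reduces a single-year
# group to one value so it is not printed as "1990-1990"
pieces <- tapply(x[keep], grp[keep], function(y)
  paste(unique(range(y)), collapse = "-"))

# Join the per-run labels into one string
result <- paste(pieces, collapse = ", ")
result  # "1990, 1992-1994"
```

Unlike a pattern that needs three adjacent years, this grouping treats a run of any length the same way, so a two-year run should come out as "first-last" too.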