我有大量的数据集,每个时间间隔观察到不同数量的物种的丰度。数据集跨越了几年,我想计算每个物种的月平均/季度丰度。
输入矩阵如下所示:
> start end G_rub G_sac P_obl N_dut G_glu G_bul G_men
1. 17/05/2004 13/06/2004 22 140 0 9 7 0 2
2. 14/06/2004 11/07/2004 453 53 11 124 356 57 13
3. 12/07/2004 08/08/2004 406 114 15 145 158 44 2
我希望得到一个看起来像这样的矩阵:
>month G_rub G_sac P_obl N_dut G_glu G_bul G_men
jan
feb
mar
etc...
我是R的新手,但我的解决方案是尝试这些方面的事情:
1)创建一个矩阵,其中包含每个观察间隔的每月天数
2)乘以这些间隔的每个物种的丰度
3)将这些矩阵的列的总和除以整个观察期的每月总天数
4)将这些向量组合成新的矩阵,看起来像上面的
我刚刚学会了如何进行第一步,但却遇到了在物种列表中循环的问题。
非常感谢任何有关如何执行此操作或使用不同方法的帮助。
答案 0 :(得分:0)
我会按如下方式处理:
calculateForOnePeriod <- function(DT, date.start, date.end, period.name, frmt="%d/%m/%Y", DateCols, SpeciesCols) {
date.start <- as.Date(as.character(date.start), format=frmt)
date.end <- as.Date(as.character(date.end), format=frmt)
# find the relevant rows, by date. Namely starting from the largest (start <= start.date) and ending with the smallest (end >= end.date)
row.index.min <- DT[, max(which(start <= date.start), -1)]
row.index.max <- DT[, min(which(end >= date.end), -1)]
# the `-1` are to indicate out of range
# if both are negative one, date not present at all
# otherwise, if just one of the two are -1, match to the valid value (ie, single row range)
if (row.index.max == -1 && row.index.min == -1) {
return(DT[, c(period=period.name, lapply(.SD, function(x) 0)), .SDcols=SpeciesCols])
} else if (row.index.max == - 1) {
row.index.max <- row.index.min
} else if (row.index.min == - 1) {
row.index.min <- row.index.max
}
DT2 <- DT[row.index.min : row.index.max,
# calculate the weighted averages
{
# n.days are the intersects
n.days <- length(intersect(seq.Date(start, end, by=1), seq.Date(date.start, date.end, by=1)))
lapply(.SD, `*`, n.days)
}
, by=DateCols
, .SDcols=SpeciesCols
]
DT2[, c(period=period.name, lapply(.SD, function(x) sum(x, na.rm=TRUE) / as.numeric(1+date.end-date.start))), .SDcols=SpeciesCols]
}
library(data.table)
# convert to data.table
DT <- data.table(dat)
# grab all of the species columns. Modify this accordingly to your real data
DateCols <- c("start", "end")
SpeciesCols <- setdiff(names(DT), DateCols)
# Make sure your dates are in fact dates (and not, say, just strings or factors)
DT[, start := as.Date(as.character(start), format="%d/%m/%Y")]
DT[, end := as.Date(as.character(end), format="%d/%m/%Y")]
# ensure that data is sorted by start, end
setkeyv(DT, DateCols)
只需创建一个开始/结束日期的向量并迭代 简单的例子:
first.date <- as.Date("01/01/2004", "%d/%m/%Y")
interv <- "month" # needs to be a valid value of `by=` in ?seq.Date
total.periods <- 12 # how many periods to analyze
starting.dates <- seq.Date(from=first.date, by="month", length.out=total.periods+1) # +1 for ending dates
ending.dates <- starting.dates - 1
starting.dates <- head(starting.dates, -1)
ending.dates <- tail(ending.dates, -1)
# sample period.names.. this will need to be modified
period.names <- month.abb[month(starting.dates)]
# Note that format is now "2004-06-01"
frmt.exmp <- "%Y-%m-%d"
## have a look:
data.frame(starting.dates, ending.dates)
# iterate using mapply
res.list <-
mapply(calculateForOnePeriod, date.start=starting.dates, date.end=ending.dates, period.name=period.names
, MoreArgs=list(DT=DT, frmt=frmt.exmp, DateCols=DateCols, SpeciesCols=SpeciesCols), SIMPLIFY=FALSE)
# combine into a single data.table
res <- rbindlist(res.list)
# optionally clean 0's to NA
ZeroRows <- apply(res[, !"period", with=FALSE]==0, 1, all)
res[ZeroRows, c(SpeciesCols) := NA]
res
period G_rub G_sac P_obl N_dut G_glu G_bul G_men
1: Jan NA NA NA NA NA NA NA
2: Feb NA NA NA NA NA NA NA
3: Mar NA NA NA NA NA NA NA
4: Apr NA NA NA NA NA NA NA
5: May NA NA NA NA NA NA NA
6: Jun 9.533333 60.66667 0.000000 3.90000 3.033333 0.00000 0.8666667
7: Jul 160.741935 18.80645 3.903226 44.00000 126.322581 20.22581 4.6129032
8: Aug 104.774194 29.41935 3.870968 37.41935 40.774194 11.35484 0.5161290
9: Sep NA NA NA NA NA NA NA
10: Oct NA NA NA NA NA NA NA
11: Nov NA NA NA NA NA NA NA
12: Dec NA NA NA NA NA NA NA
答案 1 :(得分:0)
我花了一段时间(仍然试图发现R),但我认为这很有效。希望这对某人有用。
# get species
species <- subset(data, select = -c(open, close))
# get open close dates
open <- as.Date(data$open, "%d/%m/%Y")
close <- as.Date(data$close, "%d/%m/%Y")
# calculate number of days per month
days <- mapply(function(x,y)
{
vv <- vector('integer',12)
names(vv) <- c(paste0('0',1:9),10:12)
ff <- table(format(seq(x,y,1),'%m'))
vv[names(ff)] <- ff
vv
},
open,close)
days <- t(days)
# mean flux for months
monthdays <- colSums (days)
sp_days <- lapply(species, '*', days)
sp_month <- lapply(sp_days, 'colSums',na.rm = T)
sum_month_flux <- lapply(sp_month,'/',monthdays)
month_flux <- do.call(cbind,sum_month_flux)
> month_flux
G_rub G_sac P_obl N_dut G_glu G_bul G_men
01 NaN NaN NaN NaN NaN NaN NaN
02 NaN NaN NaN NaN NaN NaN NaN
03 NaN NaN NaN NaN NaN NaN NaN
04 NaN NaN NaN NaN NaN NaN NaN
05 22.0000 140.00000 0.000000 9.00000 7.0000 0.0000 2.000000
06 266.2333 90.70000 6.233333 74.16667 204.7667 32.3000 8.233333
07 422.6774 92.35484 13.580645 137.54839 228.2581 48.6129 5.903226
08 406.0000 114.00000 15.000000 145.00000 158.0000 44.0000 2.000000
09 NaN NaN NaN NaN NaN NaN NaN
10 NaN NaN NaN NaN NaN NaN NaN
11 NaN NaN NaN NaN NaN NaN NaN
12 NaN NaN NaN NaN NaN NaN NaN