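I am working with a panel data set: production data for 20 sites over 10 years. I want to estimate the effect of different rainfall (RF) patterns on monthly production.
My data is stored in Google and looks like this:
I want the effect of seasonal rainfall patterns on monthly production. My rainfall seasons are as follows:
I need the year-wise totals of these four patterns across the cross-sections (n = 20) for the 10 years from 2000 to 2010. I do not have RF data for December 1999; in that case we can assume that RF for December 1999 is the same as for January 2000 (other suggestions would be appreciated).
So far I have coded this: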
dat<-read.csv("my_data.csv")
# get rainfall (RF) and other data
RF <- dat$RF
Y <- dat$Year
Mon <- dat$Mon
Site <- dat$Site
#Specify new data frame with 4 seasons of RF over the years across different sites
Year <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Site <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Season1 <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Season2 <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Season3 <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Season4 <- vector(mode="numeric",length = ((Y[length(dat$Y)]-Y[1])+1)*(length(levels(Site))))
Year <- rep(seq(from = Y[1],to=Y[length(Y)]),length(levels(Site)))
number_of_Y <-Y[length(Y)]-Y[1]+1
#Site_index <- 2
for (Site_index in 1 : length(levels(Site))){
  start_row <- 1+(Site_index-1)*number_of_Y
  end_row <- (Site_index-1)*number_of_Y + number_of_Y
  Site[start_row:end_row] <- rep(levels(Site)[Site_index],(Y[length(Y)]-Y[1]+1))
}
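But it does not work. I do not understand why "Site" does not get its levels from the code above, or how to get the total of each RF pattern per year and per site as a new data frame.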
Answer 0 (score: 0)
First, open your file in Google, then export it as a .csv file. It will land in your Downloads folder. Read it in:
dat <- read.csv(file = "~/Downloads/my_data - my_data.csv",
                stringsAsFactors = FALSE)
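A side note on stringsAsFactors = FALSE, since the question relies on levels(Site): a character column has no levels, so levels() returns NULL and unique() is the tool for listing the sites. The sketch below uses a tiny inline CSV with made-up values; the name toy and its contents are assumptions for illustration, not the real file.

```r
# Toy CSV read from a string (made-up values, hypothetical stand-in for the real file)
toy <- read.csv(text = "Year,Mon,Site,RF
2000,Jan,Grave,261
2000,Jan,Bay,120",
                stringsAsFactors = FALSE)

# Character columns carry no factor levels
levels(toy$Site)

# unique() lists the distinct sites in order of first appearance
sites <- unique(toy$Site)
sites
```

The question's code has a second cause as well: the line Site <- vector(mode="numeric", ...) overwrites Site with a plain numeric vector, so every later levels(Site) call returns NULL regardless of how the file was read.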
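The next challenge is to identify the monsoon seasons. We will create a new column, fill it with a default value, and then change it according to the month.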
dat$monsoon.season <- "Second Inter monsoon"
dat$monsoon.season[((dat$Mon == "Dec") |
                    (dat$Mon == "Jan") |
                    (dat$Mon == "Feb"))] <- "NE Monsoon"
dat$monsoon.season[((dat$Mon == "Mar") |
                    (dat$Mon == "Apr"))] <- "First inter monsoon"
dat$monsoon.season[((dat$Mon == "May") |
                    (dat$Mon == "Jun") |
                    (dat$Mon == "Jul") |
                    (dat$Mon == "Aug") |
                    (dat$Mon == "Sep"))] <- "SW Monsoon"
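The same month-to-season mapping can also be written as a named lookup vector, which keeps all twelve months visible in one place. This is only an alternative sketch: season.of.month and toy are hypothetical names, and it assumes Mon holds three-letter English month abbreviations.

```r
# Named lookup vector: month abbreviation -> monsoon season
season.of.month <- c(
  Dec = "NE Monsoon", Jan = "NE Monsoon", Feb = "NE Monsoon",
  Mar = "First inter monsoon", Apr = "First inter monsoon",
  May = "SW Monsoon", Jun = "SW Monsoon", Jul = "SW Monsoon",
  Aug = "SW Monsoon", Sep = "SW Monsoon",
  Oct = "Second Inter monsoon", Nov = "Second Inter monsoon"
)

# Toy months standing in for dat$Mon
toy <- data.frame(Mon = c("Jan", "Apr", "Jul", "Oct", "Dec"),
                  stringsAsFactors = FALSE)

# as.character() guards against Mon being a factor, which would
# otherwise index the lookup by integer code instead of by name
toy$monsoon.season <- unname(season.of.month[as.character(toy$Mon)])
toy
```

A month that is misspelled in the data comes out as NA here instead of silently falling into the default season.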
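Now, because December of year 200x actually falls in the same monsoon period as January of 200x + 1, we have to create a "monsoon year" variable to capture this: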
dat$monsoon.year <- dat$Year
dat$monsoon.year[dat$Mon == "Dec"] <- dat$Year[dat$Mon == "Dec"] + 1
Typing dat now shows (first 12 rows):
   Year Mon  Site   Prod  RF Region       monsoon.season monsoon.year
1  2000 Jan Grave 161521 261    Mid           NE Monsoon         2000
2  2000 Feb Grave 142452 334    Mid           NE Monsoon         2000
3  2000 Mar Grave 365697 156    Mid  First inter monsoon         2000
4  2000 Apr Grave 355789 134    Mid  First inter monsoon         2000
5  2000 May Grave 376843 159    Mid           SW Monsoon         2000
6  2000 Jun Grave 258762 119    Mid           SW Monsoon         2000
7  2000 Jul Grave 255447  41    Mid           SW Monsoon         2000
8  2000 Aug Grave 188545 247    Mid           SW Monsoon         2000
9  2000 Sep Grave 213663 251    Mid           SW Monsoon         2000
10 2000 Oct Grave 273209  62    Mid Second Inter monsoon         2000
11 2000 Nov Grave 317468 525    Mid Second Inter monsoon         2000
12 2000 Dec Grave 238668 217    Mid           NE Monsoon         2001
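The question also mentions the missing December 1999 rainfall and proposes treating it as equal to January 2000. One way this could be sketched is to copy each site's first January row and relabel it as the preceding December before computing the monsoon year. The toy data frame and its values below are made up, and the imputation rule is the asker's assumption, not a recommendation.

```r
# Toy data: two sites, first two months of the first year (made-up RF values)
toy <- data.frame(
  Year = 2000,
  Mon  = c("Jan", "Feb", "Jan", "Feb"),
  Site = c("Grave", "Grave", "Bay", "Bay"),
  RF   = c(261, 334, 120, 98),
  stringsAsFactors = FALSE
)

# Copy each site's January row of the first year and relabel it as the
# preceding December (assumption from the question: Dec RF = Jan RF)
first.year  <- min(toy$Year)
filler      <- toy[toy$Year == first.year & toy$Mon == "Jan", ]
filler$Mon  <- "Dec"
filler$Year <- first.year - 1
toy.filled  <- rbind(filler, toy)

# December counts towards the following monsoon year
toy.filled$monsoon.year <- toy.filled$Year + (toy.filled$Mon == "Dec")

# NE Monsoon rainfall per site now includes the imputed December
ne.total <- aggregate(RF ~ monsoon.year + Site, data = toy.filled, FUN = sum)
ne.total
```

If the real data also carries Prod, the copied rows should probably get Prod set to NA. The formula interface of aggregate drops rows containing NA by default, so RF would then need to be summed in its own aggregate call. Note also that the December of the final year lands in a monsoon year of its own with only one month of data.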
Now we want the total production for each season, each monsoon year, and each site (I think). We can use aggregate.
dat.summary <- aggregate(cbind(Prod, RF) ~ monsoon.year + monsoon.season + Site,
                         data = dat,
                         FUN = sum)
This gives you a data frame with rainfall and production summed by site, season, and monsoon year: one row per combination, with the columns monsoon.year, monsoon.season, Site, Prod, and RF.
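To see the shape of the result when the monsoon year is part of the grouping, here is a small sketch on made-up data. The toy data frame and its numbers are purely illustrative; only the column names follow the answer.

```r
# Toy data: one site, two monsoon years, made-up Prod and RF values
toy <- data.frame(
  monsoon.year   = c(2000, 2000, 2000, 2001, 2001),
  monsoon.season = c("NE Monsoon", "NE Monsoon", "SW Monsoon",
                     "NE Monsoon", "NE Monsoon"),
  Site           = "Grave",
  Prod           = c(10, 20, 30, 40, 50),
  RF             = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Sum Prod and RF within each monsoon year / season / site combination
toy.summary <- aggregate(cbind(Prod, RF) ~ monsoon.year + monsoon.season + Site,
                         data = toy,
                         FUN = sum)
toy.summary
```

Each grouping variable on the right-hand side of the formula becomes a column in the result, followed by the summed Prod and RF.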
You can adjust the aggregate command to get different sums. For example, to get the totals per monsoon season for each site, use
dat.summary <- aggregate(cbind(Prod, RF) ~ monsoon.season + Site,
                         data = dat,
                         sum)
which gives you:
         monsoon.season   Site    Prod   RF
1   First inter monsoon    Bay 1271818 1221
2            NE Monsoon    Bay  934326 2728
3  Second Inter monsoon    Bay  880541 1776
4            SW Monsoon    Bay 2071107  606
5   First inter monsoon  Grave 2095116 1262
6            NE Monsoon  Grave 1783144 2108
7  Second Inter monsoon  Grave 1347449 1469
8            SW Monsoon  Grave 3626227 1464
9   First inter monsoon Horton 2006018 1628
10           NE Monsoon Horton 2264599 1828
11 Second Inter monsoon Horton 1443698 1938
12           SW Monsoon Horton 3470907 1394
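The question asks for one column per season (Season1 to Season4) for each year and site. One possible way to get there from a long summary is base R's reshape with direction = "wide". The sketch below runs on a made-up toy data frame with the same column names as the per-year summary; the numbers are not real results.

```r
# Toy long-format summary: one site, two monsoon years, four seasons each
toy <- data.frame(
  monsoon.year   = rep(c(2000, 2001), each = 4),
  monsoon.season = rep(c("NE Monsoon", "First inter monsoon",
                         "SW Monsoon", "Second Inter monsoon"), times = 2),
  Site           = "Grave",
  RF             = c(595, 290, 817, 587, 600, 300, 800, 500),
  stringsAsFactors = FALSE
)

# Spread the seasons into columns: one row per monsoon year and site
wide <- reshape(toy,
                idvar     = c("monsoon.year", "Site"),
                timevar   = "monsoon.season",
                direction = "wide")
names(wide)
```

The resulting columns are named after the value column and the season, for example "RF.NE Monsoon", so they are best accessed with wide[["RF.NE Monsoon"]] or renamed to something without spaces.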