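I am currently working with a large(ish) data frame (13,884 rows and 57 columns) of agricultural data.
One column of the data frame holds the names of the districts within the country of interest. There are also columns giving the total area under production for multiple crops, and a year column giving the year of each observation. A simplified version of the data frame: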
Dist_names <- c('A', 'B', 'C', 'D')
Rice_area <- rnorm(16, mean = 5, sd = 1)
Random_var <- rep('blah', times = 16)
Maize_area <- rnorm(16, mean = 3, sd = 1)
Random_var1 <- rep('blah', times = 16)
Wheat_area <- rnorm(16, mean = 7, sd = 1)
Year <- c(rep('1966', times = 4), rep('1971', times = 4),
          rep('1984', times = 4), rep('1996', times = 4))
df_ag <- data.frame(Dist_names,
                    Rice_area,
                    Random_var,
                    Maize_area,
                    Random_var1,
                    Wheat_area,
                    Year)
df_ag
   Dist_names Rice_area Random_var Maize_area Random_var1 Wheat_area Year
1           A  6.266559       blah  3.8740517        blah   7.775330 1966
2           B  5.611816       blah  1.9078029        blah   7.497784 1966
3           C  5.481312       blah  2.2931361        blah   6.556777 1966
4           D  3.982654       blah  2.2146227        blah   6.899663 1966
5           A  6.123487       blah  2.3746220        blah   6.537040 1971
6           B  6.760871       blah  2.6296762        blah   6.994326 1971
7           C  5.123877       blah  3.3364304        blah   7.348202 1971
8           D  5.340764       blah  3.3026722        blah   6.316179 1971
9           A  5.005836       blah  2.6335372        blah   7.031141 1984
10          B  4.224905       blah  4.4294862        blah   7.822868 1984
11          C  5.297800       blah  2.3048798        blah   4.287632 1984
12          D  7.870687       blah  1.5812036        blah   6.171034 1984
13          A  4.575766       blah  0.3331641        blah   6.971024 1996
14          B  5.717461       blah  2.7911101        blah   7.396314 1996
15          C  4.679965       blah  3.0742187        blah   5.575169 1996
16          D  3.892069       blah  2.5029748        blah   7.660881 1996
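So, what I am trying to do is loop through the dist_names variable, fit a linear model of each crop_area variable against the year variable, and plot the output together with abline(). Automating this task is necessary because there are 332 unique district names x 28 crops = 9,296 plots to generate.
I am able to loop through a single crop variable and produce the visuals with code similar to the following: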
par(ask=TRUE)
dists <- unique(df_ag$Dist_names)
for (dis in dists) {
  dat <- df_ag[df_ag$Dist_names == dis, ]
  m <- lm(Rice_area ~ Year, data = dat)
  plot(dat$Year, dat$Rice_area, main=paste0(dat$Dist_name[1], ', ', dis))
  abline(m)
}
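However, I am having trouble generalizing the code so that I can do the same thing for all of the crop_area variables. My current thinking is that I need a function made up of nested for loops. Here is my most recent (non-working) attempt: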
par(ask=TRUE)
graph_fun <- function(df, na.rm = TRUE) {
  # find unique districts within dist_names
  dists <- unique(df_ag$Dist_names)
  # total area variables in data frame
  ta_vars <- df_ag[grepl("area", names(df_ag))]
  # loop through each district name
  for (dis in dists) {
    # loop through each crop variable
    for (i in 1:ncol(ta_vars)) {
      # new variable with each district and each crop
      dat <- df_ag[df_ag$Dist_names == dis, ta_vars[i]]
    }
    # generate linear models and plots
    m <- lm(dat[j], Year, data = dat)
    plot(dat$Year, dat[j], main=paste0(dat$Dist_names[1], ', ', dis,))
    abline(m)
  }
}
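Needless to say, the code above does not solve the problem. I currently get the following error, although I am sure the code is wrong in more than one place:
Error in x[j] : invalid subscript type 'list'
Any guidance is much appreciated. I am not married to the for loop concept, if anyone can think of a way to accomplish the task with the apply family of functions.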
Answer 0 (score: 1)
If I were doing this, I would melt the data into long format and use one of the handy functions for slicing and dicing data.
library(tidyr)
xy <- gather(df_ag, key = crop, value = area, -Random_var, -Random_var1, -Year, -Dist_names)
xy$Year <- as.numeric(as.character(xy$Year))
by(data = xy, INDICES = list(xy$Dist_names, xy$crop), FUN = function(x) {
  mdl <- lm(area ~ Year, data = x)
  plot(area ~ Year, data = x, type = "p")
  abline(mdl)
  return(mdl)
})
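Because each fitted model is returned for its district/crop group, the slopes can also be gathered into a table rather than only plotted. The following is a minimal sketch of that idea, not part of the original answer: it assumes the same column layout as df_ag, builds the long format with base R only (so it runs without tidyr), and slope_table is a hypothetical name.

```r
# Toy data with the same layout as the question's df_ag (values are random).
set.seed(1)
df_ag <- data.frame(
  Dist_names = rep(c("A", "B", "C", "D"), times = 4),
  Rice_area  = rnorm(16, mean = 5, sd = 1),
  Maize_area = rnorm(16, mean = 3, sd = 1),
  Wheat_area = rnorm(16, mean = 7, sd = 1),
  Year       = rep(c(1966, 1971, 1984, 1996), each = 4)
)

# Long format in base R: one row per district, year and crop.
crop_cols <- grep("area", names(df_ag), value = TRUE)
xy <- data.frame(
  Dist_names = rep(df_ag$Dist_names, times = length(crop_cols)),
  Year       = rep(df_ag$Year, times = length(crop_cols)),
  crop       = rep(crop_cols, each = nrow(df_ag)),
  area       = unlist(df_ag[crop_cols], use.names = FALSE)
)

# Fit one model per district/crop pair and keep only the Year slope.
# slope_table is a hypothetical name used for illustration.
groups <- split(xy, list(xy$Dist_names, xy$crop), drop = TRUE)
slope_table <- do.call(rbind, lapply(groups, function(x) {
  mdl <- lm(area ~ Year, data = x)
  data.frame(Dist_names = x$Dist_names[1],
             crop       = x$crop[1],
             slope      = unname(coef(mdl)["Year"]))
}))
rownames(slope_table) <- NULL
slope_table
```

With the full data set this would give one row per district and crop, which may be easier to scan than several thousand individual plots.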
...but you may also want a mixed-effects model.
This will produce the plots you are after.
Dist_names <- c('A', 'B', 'C', 'D')
Rice_area <- rnorm(16, mean = 5, sd = 1)
Random_var <- rep('blah', times = 16)
Maize_area <- rnorm(16, mean = 3, sd = 1)
Random_var1 <- rep('blah', times = 16)
Wheat_area <- rnorm(16, mean = 7, sd = 1)
Year <- as.numeric(c(rep('1966', times = 4), rep('1971', times = 4),
                     rep('1984', times = 4), rep('1996', times = 4)))
df_ag <- data.frame(Dist_names,
                    Rice_area,
                    Random_var,
                    Maize_area,
                    Random_var1,
                    Wheat_area,
                    Year)
graph_fun <- function(df) {
  # find unique districts within dist_names
  dists <- unique(df$Dist_names)
  # total area variables in data frame
  ta_vars <- df[grepl("area", names(df))]
  # browser() # if you enable this, it will, upon execution, stop the function here
  # you can then look around the function and run whichever bit of code you wish to poke around
  par(mfrow = c(length(dists), ncol(ta_vars)))
  # loop through each district name
  for (dis in dists) {
    # loop through each crop variable
    for (ta_var in colnames(ta_vars)) {
      # new variable with each district and each crop
      dat <- df[df$Dist_names == dis, ]
      # generate linear models and plots
      # data = dat must be an argument of lm(), not formula(); otherwise the
      # variables are looked up in the workspace and every district gets the
      # same fit on all rows
      m <- lm(formula(paste(ta_var, "~ Year")), data = dat)
      plot(dat$Year, dat[, ta_var], main=paste0(unique(dat$Dist_names), ', ', ta_var))
      abline(m)
    }
  }
}
graph_fun(df_ag)
This probably will not work for many levels, so you will have to adapt that part to produce only n levels at a time.
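One possible way to handle n levels at a time is to split the district names into chunks and draw each chunk on its own plotting grid in a PDF file, instead of putting every district into a single par(mfrow = ...) layout. This is only a sketch under assumptions: plot_in_pages, per_page and the output file name are hypothetical, and the toy data merely mimics the layout of df_ag.

```r
# Toy data with the same layout as the question's df_ag (values are random).
set.seed(1)
df_ag <- data.frame(
  Dist_names = rep(c("A", "B", "C", "D"), times = 4),
  Rice_area  = rnorm(16, mean = 5, sd = 1),
  Maize_area = rnorm(16, mean = 3, sd = 1),
  Wheat_area = rnorm(16, mean = 7, sd = 1),
  Year       = rep(c(1966, 1971, 1984, 1996), each = 4)
)

# plot_in_pages is a hypothetical helper; per_page is the "n levels at a time".
plot_in_pages <- function(df, file, per_page = 2) {
  dists   <- unique(df$Dist_names)
  ta_vars <- grep("area", names(df), value = TRUE)
  # Split the district names into chunks of at most per_page districts.
  chunks <- split(dists, ceiling(seq_along(dists) / per_page))
  pdf(file, width = 3 * length(ta_vars), height = 3 * per_page)
  on.exit(dev.off())
  for (chunk in chunks) {
    # Reset the grid for this chunk: one row per district, one column per crop.
    par(mfrow = c(per_page, length(ta_vars)))
    for (dis in chunk) {
      dat <- df[df$Dist_names == dis, ]
      for (ta_var in ta_vars) {
        # Build e.g. Rice_area ~ Year and fit it on this district only.
        m <- lm(reformulate("Year", response = ta_var), data = dat)
        plot(dat$Year, dat[[ta_var]], xlab = "Year", ylab = ta_var,
             main = paste0(dis, ", ", ta_var))
        abline(m)
      }
    }
  }
  # Return the number of chunks that were drawn.
  invisible(length(chunks))
}

out_file <- file.path(tempdir(), "district_plots.pdf")
n_chunks <- plot_in_pages(df_ag, out_file, per_page = 2)
```

Writing to a file also avoids having to click through thousands of plots with par(ask = TRUE).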