Creating a graphing function that loops through: 1) the unique rows in one column AND 2) all values in certain other columns

Time: 2017-02-15 20:10:12

Tags: r for-loop dataframe plot

I am currently working with a large(ish) data frame (13,884 rows and 57 columns) of agricultural data.

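One column of the data frame holds the names of the districts in the country of interest. There are also columns for the total area under production for multiple crops, and a column giving the year each observation corresponds to. A simplified version of the data frame: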

Dist_names  <- c('A', 'B', 'C', 'D')
Rice_area   <- rnorm(16, mean = 5, sd = 1)
Random_var  <- rep('blah', times = 16)
Maize_area  <- rnorm(16, mean = 3, sd = 1)
Random_var1 <- rep('blah', times = 16)
Wheat_area  <- rnorm(16, mean = 7, sd = 1)
Year        <- c(rep('1966', times = 4), rep('1971', times = 4), 
                 rep('1984', times = 4), rep('1996', times = 4))

df_ag <- data.frame(Dist_names, 
                    Rice_area,
                    Random_var,
                    Maize_area,
                    Random_var1,
                    Wheat_area,
                    Year) 
df_ag

   Dist_names Rice_area Random_var Maize_area Random_var1 Wheat_area Year
1           A  6.266559       blah  3.8740517        blah   7.775330 1966
2           B  5.611816       blah  1.9078029        blah   7.497784 1966
3           C  5.481312       blah  2.2931361        blah   6.556777 1966
4           D  3.982654       blah  2.2146227        blah   6.899663 1966
5           A  6.123487       blah  2.3746220        blah   6.537040 1971
6           B  6.760871       blah  2.6296762        blah   6.994326 1971
7           C  5.123877       blah  3.3364304        blah   7.348202 1971
8           D  5.340764       blah  3.3026722        blah   6.316179 1971
9           A  5.005836       blah  2.6335372        blah   7.031141 1984
10          B  4.224905       blah  4.4294862        blah   7.822868 1984
11          C  5.297800       blah  2.3048798        blah   4.287632 1984
12          D  7.870687       blah  1.5812036        blah   6.171034 1984
13          A  4.575766       blah  0.3331641        blah   6.971024 1996
14          B  5.717461       blah  2.7911101        blah   7.396314 1996
15          C  4.679965       blah  3.0742187        blah   5.575169 1996
16          D  3.892069       blah  2.5029748        blah   7.660881 1996

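So, what I am trying to do is loop through the Dist_names variable, fit a linear model of each crop_area variable against the Year variable, and plot the output together with abline(). Automating this task is necessary because there are 332 unique district names x 28 crops = 9,296 plots to generate.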

I am able to loop through the districts for a single crop variable and generate the visuals with code similar to the following:

par(ask=TRUE)
dists <- unique(df_ag$Dist_names)
for (dis in dists) {
  dat <- df_ag[df_ag$Dist_names == dis, ]
  m <- lm(Rice_area ~ Year, data = dat)
  plot(dat$Year, dat$Rice_area, main=paste0(dat$Dist_name[1], ', ', dis))
  abline(m)
}

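However, I am struggling to generalize the code so that it does the same thing for all of the crop_area variables. My current thinking is that I need a function made up of nested for loops. Here is my most recent (non-working) attempt: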

par(ask=TRUE)
graph_fun <- function(df, na.rm = TRUE) {

  # find unique districts within dist_names
  dists <- unique(df_ag$Dist_names)

  # total area variables in data frame
  ta_vars <- df_ag[grepl("area", names(df_ag))]

  # loop through each district name
  for (dis in dists) {

    # loop through each crop variable
    for (i in 1:ncol(ta_vars)) {

      # new variable with each district and each crop        
      dat <- df_ag[df_ag$Dist_names == dis, ta_vars[i]]

    }

    # generate linear models and plots
    m <- lm(dat[j], Year, data = dat)
    plot(dat$Year, dat[j], main=paste0(dat$Dist_names[1], ', ', dis,))
    abline(m)

  }

} 

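Needless to say, the code above does not do the trick. I currently get the following error, but I am sure there is more than one thing wrong with the code: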

  

Error in x[j] : invalid subscript type 'list'

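Any guidance is greatly appreciated. I am not married to the for-loop concept if someone can think of a way to accomplish the task with the apply family of functions.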

1 answer:

Answer 0: (score: 1)

If I were doing this, I would melt the data into a long format and use a convenience function to slice and dice the data.

Something along the lines of:
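library(tidyr)

# Reshape to long format: one row per district, year, and crop
xy <- gather(df_ag, key = crop, value = area, -Random_var, -Random_var1, -Year, -Dist_names)
# Year has to be numeric for lm() to fit a single slope and for abline() to draw it
xy$Year <- as.numeric(as.character(xy$Year))

# Fit and plot one model per district/crop combination
by(data = xy, INDICES = list(xy$Dist_names, xy$crop), FUN = function(x) {
  mdl <- lm(area ~ Year, data = x)
  plot(area ~ Year, data = x, type = "p")
  abline(mdl)

  return(mdl)
})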

...but you may need a mixed-effects model anyway.
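With several thousand district/crop combinations, it may be easier to inspect a table of fitted coefficients than to page through every plot. The following is a minimal base R sketch of that idea, not part of the original answer: the helper name fit_slopes, its crop_pattern argument, and the small df_demo data frame are all hypothetical, and it assumes the crop columns can be found by matching "area" in their names.

```r
# Hypothetical helper: fit <crop> ~ Year for every district/crop pair and
# return one row per pair with the fitted intercept and slope.
fit_slopes <- function(df, crop_pattern = "area") {
  crops <- grep(crop_pattern, names(df), value = TRUE)
  # Make sure Year is numeric so each model has exactly two coefficients
  df$Year <- as.numeric(as.character(df$Year))
  pairs <- expand.grid(district = unique(as.character(df$Dist_names)),
                       crop = crops,
                       stringsAsFactors = FALSE)
  coefs <- mapply(function(dis, crop) {
    dat <- df[df$Dist_names == dis, ]
    coef(lm(reformulate("Year", response = crop), data = dat))
  }, pairs$district, pairs$crop)
  # coefs is a 2 x n matrix: row 1 = intercepts, row 2 = slopes
  pairs$intercept <- unname(coefs[1, ])
  pairs$slope     <- unname(coefs[2, ])
  pairs
}

# Small made-up data set in the same shape as the question's example
set.seed(1)
df_demo <- data.frame(
  Dist_names = rep(c("A", "B", "C", "D"), times = 4),
  Rice_area  = rnorm(16, mean = 5),
  Maize_area = rnorm(16, mean = 3),
  Year       = rep(c("1966", "1971", "1984", "1996"), each = 4)
)

slopes <- fit_slopes(df_demo)
print(slopes)
```

The result has one row per district/crop pair, so it can be sorted or filtered by slope to decide which combinations are worth plotting.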

This produces the plots you are after.

Dist_names  <- c('A', 'B', 'C', 'D')
Rice_area   <- rnorm(16, mean = 5, sd = 1)
Random_var  <- rep('blah', times = 16)
Maize_area  <- rnorm(16, mean = 3, sd = 1)
Random_var1 <- rep('blah', times = 16)
Wheat_area  <- rnorm(16, mean = 7, sd = 1)
Year        <- as.numeric(c(rep('1966', times = 4), rep('1971', times = 4), 
                 rep('1984', times = 4), rep('1996', times = 4)))

df_ag <- data.frame(Dist_names, 
                    Rice_area,
                    Random_var,
                    Maize_area,
                    Random_var1,
                    Wheat_area,
                    Year)

graph_fun <- function(df) {
  # find unique districts within dist_names
  dists <- unique(df$Dist_names)

  # total area variables in data frame
  ta_vars <- df[grepl("area", names(df))]

# browser() # if you enable this, it will, upon execution, stop the function here
# you can then look around the function and run whichever bit of code you wish to poke around
  par(mfrow = c(length(dists), ncol(ta_vars)))

  # loop through each district name
  for (dis in dists) {

    # loop through each crop variable
    for (ta_var in colnames(ta_vars)) {

      # new variable with each district and each crop        
      dat <- df[df$Dist_names == dis, ]

      # generate linear models and plots
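      # data = dat belongs to lm(), not formula(); otherwise the model is fit
      # on whatever global vectors happen to match, ignoring the district subset
      m <- lm(formula(paste(ta_var, "~ Year")), data = dat)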
      plot(dat$Year, dat[, ta_var], main=paste0(unique(dat$Dist_names), ', ', ta_var))
      abline(m)
    }

  }
}

graph_fun(df_ag)

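This may not work with many levels, because all panels are squeezed onto a single page, so you will have to adapt that part to plot only n levels at a time.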
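One possible way to plot only a few panels at a time is to write to a multi-page PDF with a fixed grid per page; once a page's grid is full, R starts a new page on its own. The sketch below makes that assumption concrete. The function name plot_to_pdf, its per_page and crop_pattern arguments, and the df_demo data are hypothetical and not from the original answer.

```r
# Hypothetical helper: draw one scatter plot plus fitted line per
# district/crop pair into a multi-page PDF, per_page[1] x per_page[2]
# panels on each page.
plot_to_pdf <- function(df, file, per_page = c(2, 3), crop_pattern = "area") {
  crops <- grep(crop_pattern, names(df), value = TRUE)
  df$Year <- as.numeric(as.character(df$Year))

  pdf(file, width = 3.5 * per_page[2], height = 3.5 * per_page[1])
  on.exit(dev.off(), add = TRUE)  # always close the device, even on error
  par(mfrow = per_page)           # a new page starts when the grid is full

  for (dis in unique(as.character(df$Dist_names))) {
    dat <- df[df$Dist_names == dis, ]
    for (crop in crops) {
      m <- lm(reformulate("Year", response = crop), data = dat)
      plot(dat$Year, dat[[crop]], xlab = "Year", ylab = crop,
           main = paste0(dis, ", ", crop))
      abline(m)
    }
  }
  invisible(file)
}

# Small made-up data set in the same shape as the question's example
set.seed(1)
df_demo <- data.frame(
  Dist_names = rep(c("A", "B", "C", "D"), times = 4),
  Rice_area  = rnorm(16, mean = 5),
  Maize_area = rnorm(16, mean = 3),
  Wheat_area = rnorm(16, mean = 7),
  Year       = rep(c(1966, 1971, 1984, 1996), each = 4)
)

out <- plot_to_pdf(df_demo, file.path(tempdir(), "district_crop_fits.pdf"))
```

With the default 2 x 3 grid, the 12 district/crop plots in this example fill two pages; the same call should scale to the full data set without the panels becoming too small to draw.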