R: how to create a loop that subsets each data frame by country

Time: 2014-09-11 12:14:08

Tags: r loops lapply

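I have five data frames: t1, t2, t3, t4, t5. They all have the same structure (only some values differ), and each one has a "country" variable (SA0100) with the same number of categories.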

Basically I want to get N new objects: one for each combination of country and dataset.

My code currently looks like this, but it is very tedious and verbose:

t1.COUNTRY1 <- subset(t1, SA0100 == "COUNTRY1")
t2.COUNTRY1 <- subset(t2, SA0100 == "COUNTRY1")
t3.COUNTRY1 <- subset(t3, SA0100 == "COUNTRY1")
t4.COUNTRY1 <- subset(t4, SA0100 == "COUNTRY1")
t5.COUNTRY1 <- subset(t5, SA0100 == "COUNTRY1")
t1.COUNTRY2 <- subset(t1, SA0100 == "COUNTRY2")
t2.COUNTRY2 <- subset(t2, SA0100 == "COUNTRY2")
...
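In subset(), the row condition has to be a logical expression, so the comparison needs `==`. Written with a single `=`, as in `subset(t1, SA0100="COUNTRY1")`, SA0100 becomes a named argument that subset.data.frame() does not use as a condition, so every row is returned. A small check on a hypothetical data frame `d`:

```r
# Small hypothetical data frame with the same country column as t1
d <- data.frame(SA0100 = c("COUNTRY1", "COUNTRY1", "COUNTRY2"),
                DA1000 = c(40000L, 25456L, 12558L))

# `==` builds a logical vector, so only the COUNTRY1 rows are kept
ok <- subset(d, SA0100 == "COUNTRY1")

# A single `=` turns SA0100 into a named argument that is not used as a
# condition, so all rows come back (some R versions may also warn about the
# unused argument, hence suppressWarnings())
oops <- suppressWarnings(subset(d, SA0100 = "COUNTRY1"))

nrow(ok)    # 2
nrow(oops)  # 3
```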

Dataset t1 (the others look the same):

    SA0100    DA1000   DA2100  RA0300
1   COUNTRY1  40000    45666    45
2   COUNTRY1  25456    78888    36
3   COUNTRY1  45666    12547    18
4   COUNTRY1  41255    58796    23 
5   COUNTRY1  78992    32589    28
6   COUNTRY2  12558    25556    22
7   COUNTRY2  96542    65478    78

I have tried using a loop, but I did not manage to get anything working, and I don't see how to use the lapply() function in this particular case.

Can you help me?

2 answers:

Answer 0 (score: 0)

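This script uses a loop and stores the subsets in a list called country, assuming that country 1 appears in t1, country 2 in t2, and so on. If a country also appears in other datasets (for example, country 1 in dataset 2), the last line of the loop has to be changed (replace tcp with t1, t2, etc.).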

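a <- 5  # number of iterations, datasets t1:t5

tch <- paste0("t", 1:a)          # "t1", "t2", ..., "t5"
cch <- paste0("COUNTRY", 1:a)    # "COUNTRY1", ..., "COUNTRY5" (must match SA0100 exactly)
country <- list()

for (i in 1:a) {
  tcp <- get(tch[i])                              # data frame t<i>, looked up by name
  country[[i]] <- subset(tcp, SA0100 == cch[i])   # rows of t<i> for COUNTRY<i>
}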
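If every country can appear in every dataset, one way to generalize this loop (a sketch, not the answerer's code) is to loop over the dataset names and over the countries actually present in each one, storing every subset in a list named like t1.COUNTRY1. The names `datasets` and `out` are hypothetical, and two small stand-ins for t1 and t2 are defined so the block runs on its own:

```r
# Small stand-ins for the question's data frames (hypothetical values)
t1 <- data.frame(SA0100 = c("COUNTRY1", "COUNTRY1", "COUNTRY2"),
                 DA1000 = c(40000L, 25456L, 12558L))
t2 <- data.frame(SA0100 = c("COUNTRY2", "COUNTRY2", "COUNTRY1"),
                 DA1000 = c(96542L, 96542L, 45666L))

datasets <- paste0("t", 1:2)   # names of the data frames to process
out <- list()                  # one data frame per dataset/country pair

for (ds in datasets) {
  d <- get(ds)                          # fetch the data frame by its name
  for (cn in unique(d$SA0100)) {        # only countries present in this dataset
    out[[paste(ds, cn, sep = ".")]] <- subset(d, SA0100 == cn)
  }
}

names(out)
# "t1.COUNTRY1" "t1.COUNTRY2" "t2.COUNTRY2" "t2.COUNTRY1"
```

Looping over unique(d$SA0100) rather than a fixed COUNTRY1..COUNTRY5 vector avoids creating empty subsets for countries that a dataset does not contain.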

Answer 1 (score: 0)

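Assuming you want to create separate objects (I would rather keep them in a list than clutter the workspace with lots of objects), you could do: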

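 # Collect t1, t2, ... by name, split each one by country, flatten the
 # result to names like "t1.COUNTRY1", and assign each piece globally
 list2env(unlist(lapply(mget(ls(pattern = "^t\\d+$")),
                        function(x) split(x, x$SA0100)),
                 recursive = FALSE),
          envir = .GlobalEnv)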
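To keep the pieces in a list instead, as this answer says it would prefer, the same lapply()/split() combination can stop before list2env(). A sketch with small stand-in data; `dfs`, `res` and `flat` are hypothetical names:

```r
# Small stand-ins for the question's data frames (hypothetical values)
t1 <- data.frame(SA0100 = c("COUNTRY1", "COUNTRY1", "COUNTRY2"),
                 DA1000 = c(40000L, 25456L, 12558L))
t2 <- data.frame(SA0100 = c("COUNTRY2", "COUNTRY2", "COUNTRY1"),
                 DA1000 = c(96542L, 96542L, 45666L))

# mget() returns a named list of the data frames (names "t1", "t2")
dfs <- mget(c("t1", "t2"))

# For each data frame, split() returns a list of subsets named by country
res <- lapply(dfs, function(x) split(x, x$SA0100))

res$t1$COUNTRY2    # the COUNTRY2 rows of t1

# Optional: one flat list with names such as "t1.COUNTRY1"
flat <- unlist(res, recursive = FALSE)
names(flat)
# "t1.COUNTRY1" "t1.COUNTRY2" "t2.COUNTRY1" "t2.COUNTRY2"
```

Within each dataset, split() orders the pieces by the sorted country values, which is why t2's COUNTRY1 comes before COUNTRY2 even though it appears last in the data.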


 t1.COUNTRY1
 #    SA0100 DA1000 DA2100 RA0300
 #1 COUNTRY1  40000  45666     45
 #2 COUNTRY1  25456  78888     36
 #3 COUNTRY1  45666  12547     18
 #4 COUNTRY1  41255  58796     23
 #5 COUNTRY1  78992  32589     28

 t3.COUNTRY2
 #    SA0100 DA1000 DA2100 RA0300
 #1 COUNTRY2  12558  25556     22
 #4 COUNTRY2  12558  25556     22

Data

 t1 <- structure(list(SA0100 = c("COUNTRY1", "COUNTRY1", "COUNTRY1", 
 "COUNTRY1", "COUNTRY1", "COUNTRY2", "COUNTRY2"), DA1000 = c(40000L, 
 25456L, 45666L, 41255L, 78992L, 12558L, 96542L), DA2100 = c(45666L, 
 78888L, 12547L, 58796L, 32589L, 25556L, 65478L), RA0300 = c(45L, 
 36L, 18L, 23L, 28L, 22L, 78L)), .Names = c("SA0100", "DA1000", 
 "DA2100", "RA0300"), class = "data.frame", row.names = c("1", 
 "2", "3", "4", "5", "6", "7"))

  t2 <- structure(list(SA0100 = c("COUNTRY2", "COUNTRY2", "COUNTRY1"), 
  DA1000 = c(96542L, 96542L, 45666L), DA2100 = c(65478L, 65478L, 
  12547L), RA0300 = c(78L, 78L, 18L)), .Names = c("SA0100", 
 "DA1000", "DA2100", "RA0300"), row.names = c(NA, 3L), class = "data.frame")

  t3 <- structure(list(SA0100 = c("COUNTRY2", "COUNTRY1", "COUNTRY1", 
  "COUNTRY2", "COUNTRY1"), DA1000 = c(12558L, 78992L, 41255L, 12558L, 
  40000L), DA2100 = c(25556L, 32589L, 58796L, 25556L, 45666L), 
  RA0300 = c(22L, 28L, 23L, 22L, 45L)), .Names = c("SA0100", 
  "DA1000", "DA2100", "RA0300"), row.names = c(NA, 5L), class = "data.frame")
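A different technique from both answers, sketched here under the assumption that stacking the data is acceptable: bind the data frames into one, tag each row with the name of its source dataset, and split once on the dataset/country combination. When split() gets a list of grouping variables, it joins their values with "." by default, so the pieces get names like t1.COUNTRY1, and `drop = TRUE` discards combinations that never occur. Small stand-ins replace the data above so the block runs alone; `dfs`, `all_t` and `pieces` are hypothetical names:

```r
# Stand-ins for t1 and t2 (replace with the real data frames)
t1 <- data.frame(SA0100 = c("COUNTRY1", "COUNTRY1", "COUNTRY2"),
                 DA1000 = c(40000L, 25456L, 12558L))
t2 <- data.frame(SA0100 = c("COUNTRY2", "COUNTRY2", "COUNTRY1"),
                 DA1000 = c(96542L, 96542L, 45666L))

dfs <- list(t1 = t1, t2 = t2)

# Prepend a column recording which dataset each row came from, then stack
all_t <- do.call(rbind, Map(function(d, nm) cbind(dataset = nm, d),
                            dfs, names(dfs)))

# Split on dataset x country; names are "dataset.country",
# and combinations with no rows are dropped
pieces <- split(all_t, list(all_t$dataset, all_t$SA0100), drop = TRUE)
names(pieces)
```

Each piece keeps the extra `dataset` column; drop it with `pieces[[k]]$dataset <- NULL` if it is not wanted.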