Subsetting a data frame in a for loop

Date: 2017-02-10 11:13:09

Tags: r for-loop subset

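I'm trying to loop over a character vector named iter in order to subset a data frame in R. Inside the loop I want to filter the data frame by each value i of iter and assign the result to a variable whose name is the value of i.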

iter <- c("COD1", "COD2", "COD3")

    for ( i in iter) {
      assign(i, subset(out,TestId==paste0(i) & AddId=="Curva_F_Cor"))
    }

This produces 3 variables, each of them a data frame. I'm interested in the column named mu.spline, which exists in all 3 of them:

TestID    mu.spline   lambda.spline
COD1      0.02        3
COD1      0.03        4
COD1      0.01        1

TestID    mu.spline   lambda.spline
COD2      0.1         8
COD2      0.25        10
COD2      0.01        3

TestID    mu.spline   lambda.spline
COD3      0.12        1
COD3      0.32        8
COD3      0.22        3

However, if I try to take the mu.spline column from each subset variable and assign it to a new variable, I get an error:

for ( i in iter) {
  assign(i, subset(out,TestId==paste0(i) & AddId=="Curva_F_Cor"))
  assign(paste0(i,".mu"), i[,"mu.spline"])
}

Output:

Error in i[, "mu.spline"] : incorrect number of dimensions

The loop itself runs without error if I try the following:

for ( i in iter) {
  assign(i, subset(out,TestId==paste0(i) & AddId=="Curva_F_Cor"))
  i
  assign(paste0(i,".mu"), "hi")
}

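The problem seems to come from taking the mu.spline column from each variable with i[, "mu.spline"]. I'd like to know why this fails, because when I take the column outside the loop like this (COD1[, "mu.spline"]), it works...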
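A minimal sketch of the likely cause, using a small hypothetical out data frame (the real data has more rows and columns). Inside the loop, i holds only the name, e.g. "COD1": a length-1 character vector with no dimensions, so i[, "mu.spline"] fails. get(i) looks up the object that has that name, which is what COD1[, "mu.spline"] does implicitly outside the loop.

```r
# Hypothetical toy stand-in for `out`
out <- data.frame(
  TestId    = c("COD1", "COD1", "COD2", "COD2"),
  mu.spline = c(0.02, 0.03, 0.10, 0.25),
  AddId     = "Curva_F_Cor"   # recycled to every row
)
iter <- c("COD1", "COD2")

# Indexing a plain character string like a matrix reproduces the error
msg <- tryCatch("COD1"[, "mu.spline"], error = conditionMessage)
msg  # "incorrect number of dimensions"

for (i in iter) {
  assign(i, subset(out, TestId == i & AddId == "Curva_F_Cor"))
  # get(i) returns the data frame stored under the name in i
  assign(paste0(i, ".mu"), get(i)[, "mu.spline"])
}

COD1.mu  # 0.02 0.03
```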

**Edit:** dput() output:

structure(list(TestId = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 
8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 
13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L), .Label = c("Comb1", 
"Comb2", "COD1", "COD2", "COD3", "COD4", "COD5", 
"COD6", "COD7", "COD8", "COD9", "COD10", "COD11", 
"COD12", "COD13", "COD14", "Pat"), class = "factor"), 
    mu.spline = c(0.156373645710651, 0.128179004733465, 0.133922208832118, 
    0.0968325365246728, 0.112497378553166, 0.108787192266453, 
    0.110192954818258, 0.121005105680758, 0.0980394197157738, 
    0.138420857616108, 0.127789639429687, 0.128560390185466, 
    0.110549423439033, 0.108320566548023, 0.098918312107995, 
    0.0828284492044932, 0.104197889210497, 0.122413067260436, 
    0.100261893863431, 0.0938211089313908, 0.0950013179641027, 
    0.145680825059066, 0.139104408376977, 0.126037019624304, 
    0.126708418382696, 0.129821223842992, 0.136480998324424, 
    0.13593684872676, 0.139066913195263, 0.148222162331793, 0.1063086971118, 
    0.167178433353777, 0.0999504815546864, 0.159110219357191, 
    0.125081233896366, 0.163966026506179, 0.15029944955429, 0.116975580695436, 
    0.15276496804095, 0.155339014181045, 0.112171217970295, 0.120104234834245, 
    0.133373734309075, 0.175784287024805, 0.133626401899954, 
    0.140297143337283, 0.0863206151811713, 0.170070971923806, 
    0.152896880973888, 0.10553437562759, 0.124122727198564, 0.163571762302165, 
    0.151047108367937, 0.131416085292366, 0.152515440225195, 
    0.139308623745812, 0.146009754853497, 0.170825235429307, 
    0.147466868348918, 0.126623691613807, 0.147114348605148, 
    0.141084369853073, 0.153670399861141, 0.162948873362462, 
    0.131121302899353, 0.146421599771427, 0.135166111999851, 
    0.157495164357944, 0.126927329131488, 0.159831796004744, 
    0.146936913846553, 0.12183336770971, 0.136669798817364, 0.152333836640196, 
    0.138055091325892)), .Names = c("TestId", "mu.spline"), row.names = c("76", 
"77", "78", "79", "80", "81", "82", "83", "84", "85", "86", "87", 
"88", "89", "90", "91", "92", "93", "94", "95", "96", "97", "98", 
"99", "100", "101", "102", "103", "104", "105", "106", "107", 
"108", "109", "110", "111", "112", "113", "114", "115", "116", 
"117", "118", "119", "120", "121", "122", "123", "124", "125", 
"126", "127", "128", "129", "130", "131", "132", "133", "134", 
"135", "136", "137", "138", "139", "140", "141", "142", "143", 
"144", "145", "146", "147", "148", "149", "150"), class = "data.frame")

2 Answers:

Answer 0 (score: 0)

I've built an example list (lst) based on your question; do.call(rbind) turns it back into a single data.frame, from which the column you want is easy to extract.

str <- '
TestID    mu.spline   lambda.spline
COD1      0.02        3
COD1      0.03        4
COD1      0.01        1
COD2      0.1         8
COD2      0.25        10
COD2      0.01        3
COD3      0.12        1
COD3      0.32        8
COD3      0.22        3    '

file <- textConnection(str)

raw <- read.table(file, header = TRUE)

lst <- split(raw,raw$TestID)

> lst
$COD1
  TestID mu.spline lambda.spline
1   COD1      0.02             3
2   COD1      0.03             4
3   COD1      0.01             1

$COD2
  TestID mu.spline lambda.spline
4   COD2      0.10             8
5   COD2      0.25            10
6   COD2      0.01             3

$COD3
  TestID mu.spline lambda.spline
7   COD3      0.12             1
8   COD3      0.32             8
9   COD3      0.22             3

To get the column you want:

do.call(rbind, lst)$mu.spline

Or loop over the list:

for(l in names(lst))
{
  assign(l,lst[[l]]$mu.spline)
}
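A sketch of the same idea without assign(), rebuilding the example above with read.table(text = ...) instead of textConnection(). lapply() keeps one mu.spline vector per TestID in a named list; if separate variables are really needed, list2env() creates them from that list in one call. Note that COD1, COD2 and COD3 then hold the mu.spline vectors, not data frames.

```r
# Rebuild the example data from the answer
raw <- read.table(header = TRUE, text = "
TestID    mu.spline   lambda.spline
COD1      0.02        3
COD1      0.03        4
COD1      0.01        1
COD2      0.1         8
COD2      0.25        10
COD2      0.01        3
COD3      0.12        1
COD3      0.32        8
COD3      0.22        3
")
lst <- split(raw, raw$TestID)

# One mu.spline vector per TestID, kept together in a named list
mu_by_id <- lapply(lst, function(d) d$mu.spline)
mu_by_id$COD2  # 0.10 0.25 0.01

# Only if separate variables are required: one variable per list element
invisible(list2env(mu_by_id, envir = globalenv()))
COD3  # 0.12 0.32 0.22
```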

Answer 1 (score: 0)

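Using assign in R is generally discouraged: the function is available, but relying on it is not recommended. I believe the result you're after can be produced more simply with an lapply call, which does the same job as the for loop above.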

#out<- #your dataframe of data

#define a vector of string values
iter<-c("COD1", "COD2", "COD3")
#create a list of dataframes of the subsets
ans<-lapply(iter, function(x) {subset(out, TestId==x)})
#rename the list elements
names(ans)<-iter

#to access each subset, use any of these:
ans[[1]]
ans[["COD1"]]
ans$COD1
ans[[iter[1]]]
#(single brackets, e.g. ans["COD1"], return a one-element list, not the data frame)
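Continuing this approach, a sketch (with a small hypothetical out standing in for the real data) of pulling mu.spline out of every subset at once. split() builds an equivalent named list in a single call, but it returns one element for every distinct TestId, including any that are not in iter.

```r
# Hypothetical toy stand-in for `out`
out <- data.frame(
  TestId    = c("COD1", "COD2", "COD1", "COD3", "COD2"),
  mu.spline = c(0.02, 0.10, 0.03, 0.12, 0.25)
)
iter <- c("COD1", "COD2", "COD3")

ans <- lapply(iter, function(x) subset(out, TestId == x))
names(ans) <- iter

# Extract the mu.spline column from each data frame in the list
mu <- lapply(ans, `[[`, "mu.spline")
mu$COD1  # 0.02 0.03

# Same named list of vectors, built directly by grouping on TestId
by_id <- split(out$mu.spline, out$TestId)
```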