Question

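I am scraping some comments from Reddit using the Reddit JSON API and R. Since the data does not have a flat structure, extracting it is a bit tricky, but I have found a way. To give you an idea of what I have to do, here is a short example: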

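library(rjson) # assumption: fromJSON() is taken from rjson here; the post does not name its JSON package
x = "http://www.reddit.com/r/funny/comments/2eerfs/fifa_glitch_cosplay/.json" # example url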
rawdat   = readLines(x,warn=F) # reading in the data
rawdat   = fromJSON(rawdat) # formatting
dat_list = repl = rawdat[[2]][[2]][[2]] # this will be used later
sq       = seq(dat_list)[-1]-1 # number of comments
txt      = unlist(lapply(sq,function(x)dat_list[[x]][[2]][[14]])) # comments (not replies)

# loop time:

for(a in sq){
  repl  = tryCatch(repl[[a]][[2]][[5]][[2]][[2]],error=function(e) NULL) # getting all replies to comment a

  if(length(repl)>0){ # in case there are no replies
    sq  = seq(repl)[-1]-1 # number of replies
    txt    = c(txt,unlist(lapply(sq,function(x)repl[[x]][[2]][[14]]))) # this is what I want

    # next level down
    for(b in sq){
      repl  = tryCatch(repl[[b]][[2]][[5]][[2]][[2]],error=function(e) NULL) # getting all replies to reply b of comment a

      if(length(repl)>0){
        sq  = seq(repl)[-1]-1
        txt    = c(txt,unlist(lapply(sq,function(x)repl[[x]][[2]][[14]])))   
      }
    }
  }
}

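The example above collects all comments, the first-level replies to each comment, and the second-level replies (that is, the replies to each reply). Threads can go much deeper than that, though, so I am trying to work out an efficient way to handle this. To do it by hand, I would have to: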

(1) Copy the following code from the last loop:

for(b in sq){
  repl  = tryCatch(repl[[b]][[2]][[5]][[2]][[2]],error=function(e) NULL)

  if(length(repl)>0){
    sq  = seq(repl)[-1]-1
    txt    = c(txt,unlist(lapply(sq,function(x)repl[[x]][[2]][[14]])))   
  }
}

(2) Paste that code right after the line starting with txt = ..., and change b in the loop to c

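(3) Repeat this process about 20 times or so to make sure everything is captured, which, as you can imagine, creates one huge loop. I hope there is a way to somehow collapse this loop and make it more elegant...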
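
The copy-and-paste pattern in steps (1) to (3) is the kind of repetition that recursion usually replaces: one function handles a single level and calls itself on each comment's replies. Below is a minimal sketch, not a tested solution for the live API. It assumes the parsed JSON keeps its field names (kind, data, body, replies, children), that comments have kind "t1", and that a comment without replies carries an empty string in replies. The helper names collect_bodies, comment and listing are hypothetical, and the data is a hand-built mock rather than real API output.

```r
# Hypothetical helper: collect the body of every comment in a list of
# children, descending into replies at any depth.
collect_bodies <- function(children) {
  out <- character(0)
  for (child in children) {
    # Skip anything that is not a comment, e.g. a trailing "more" stub
    # (assumption about the listing layout).
    if (!identical(child$kind, "t1")) next
    out <- c(out, child$data$body)
    replies <- child$data$replies
    # Assumption: a comment without replies holds "" instead of a list.
    if (is.list(replies)) {
      out <- c(out, collect_bodies(replies$data$children))
    }
  }
  out
}

# Mock data shaped like a parsed comment listing (not real API output).
comment <- function(body, replies = "") {
  list(kind = "t1", data = list(body = body, replies = replies))
}
listing <- function(...) {
  list(kind = "Listing", data = list(children = list(...)))
}

mock_children <- list(
  comment("a", listing(
    comment("a1", listing(comment("a1x", listing(comment("a1x deep"))))),
    comment("a2")
  )),
  comment("b"),
  list(kind = "more", data = list(children = list()))
)

txt <- collect_bodies(mock_children)
print(txt)
```

Unlike the nested loops, this walks the tree depth-first, so each reply appears right after its parent rather than grouped by level. On the real data the call would presumably look like collect_bodies(rawdat[[2]]$data$children), provided the JSON parser in use returns named lists.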

If you have any ideas on how to improve this loop, I would really appreciate it if you could share them.

Thank you very much!

How to reduce conditional nested loops in R

0 answers: