How to recursively build a tree of unknown depth in R

Time: 2016-08-24 12:59:23

Tags: r recursion tree

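I am trying to store media comment data, scraped with RSelenium, XML and JSON, in a tree using Christoph Glur's data.tree package in R. The problem I am facing is that I do not know the depth, and hence the structure, of the tree in advance, because of the way comments get posted. The site uses Disqus: people may comment on the article, reply to other people's comments, and others may in turn reply to those replies. So the depth of the comments is unknown.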

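My data is held in a list of lists, where each list represents one Disqus comment. A comment contains 6 elements if it has no "child" comments, or 7 elements if it does. The seventh element is another list of one or more comments. The following structure gives an idea of the data: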

newtree <- Node$new("article_Name")
post <- newtree$AddChild("Post ID_1")$
AddChild("Date")$
AddSibling("Poster")$
AddSibling("Disqus Name")$
AddSibling("Message")$
AddSibling("Num Children")$
parent$
AddChildNode(Node$new("Child Post ID_1"))$
  AddChild("Date")$
  AddSibling("Poster")$
  AddSibling("Disqus Name")$
  AddSibling("Message")$
  AddSibling("Num Children")$
  AddChildNode(Node$new("Child-Child Post ID_1"))$
    AddChild("Date")$
    AddSibling("Poster")$
    AddSibling("Disqus Name")$
    AddSibling("Message")$
    AddSibling("Num Children")$
  parent$
  parent$
  parent$
AddChildNode(Node$new("Child Post ID_2"))$
root$
AddChild("Post ID 2")
print(newtree)

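I tried to create the above using loops, but obviously I cannot get beyond the first level of children, since I do not know whether the children's children may themselves have children.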

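I have tried looking for posts on recursion, but could not find anything for R, although there is plenty for other languages such as JavaScript. Below is my attempt at creating the tree; it really does look ugly with all the [[]] needed to access some of the deeper elements.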

commentTree <- Node$new("article_Name")
for (i in 1:length(appNodes)) {
i <- 1
post <- commentTree$AddChild(
    postData[[i]][1])$
    AddChild(postData[[i]][2])$
    AddSibling(postData[[i]][3])$
    AddSibling(postData[[i]][4])$
    AddSibling(postData[[i]][5])$
    AddSibling(postData[[i]][6])

while (postData[[i]][[6]] > 0) {
  for (j in 1 : length(postData[[i]][[7]])) {
     post$AddChildNode(Node$new(postData[[i]][[7]][[j]][1]))$
     AddChild(postData[[i]][[7]][[j]][2])$
     AddSibling(postData[[i]][[7]][[j]][3])$
     AddSibling(postData[[i]][[7]][[j]][4])$
     AddSibling(postData[[i]][[7]][[j]][5])$
     AddSibling(postData[[i]][[7]][[j]][6])
  }
}
print(commentTree)
}

Any help with writing a recursive function would be greatly appreciated. Thanks.

Edit - added sample data, as asked for in the comments, for clarity

[[1]]
[[1]]$postId
[1] "2794864846"

[[1]]$date
[1] "Thursday, July 21, 2016 9:28 AM"

[[1]]$poster
[1] "Lucienne"

[[1]]$disqusUname
[1] "disqus_AEt1ZsgK9N"

[[1]]$message
[1] "200 hundred pilots for 7 planes? Wow each of them must work very long hours. "

[[1]]$numChildren
[1] 1

[[1]]$child
[[1]]$child[[1]]
[[1]]$child[[1]]$postId
[1] "2795010796"

[[1]]$child[[1]]$date
[1] "Thursday, July 21, 2016 11:50 AM"

[[1]]$child[[1]]$poster
[1] "Jesmond Tedesco Triccas"

[[1]]$child[[1]]$disqusUname
[1] "jesmondtedescotriccas"

[[1]]$child[[1]]$message
[1] "My thoughts exactly"

[[1]]$child[[1]]$numChildren
[1] 0

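When I convert from the list to a tree using tmpTree <- as.Node(postData), I get the following. I can access my tree using tmpTree$'1'$poster, which gives "Lucienne", and tmpTree$'1'$child$'1'$poster, which gives "Jesmond Tedesco ......". Can the child node names somehow be set to the value in the postId field when doing the conversion?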

I am still stuck on reading the data for all the comments recursively.

     levelName
1 Root         
2  °--1        
3      °--child
4          °--1

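Edit - added reproducible code. This is a comment with child comments. I apologise for the length of the example. It is one where children have comments that themselves have children, and so on.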

    list(structure(list(postId = "2794968061", date = "Thursday, July 21, 2016 10:56 AM", 
    poster = "toni", disqusUname = "disqus_bujblK3zF5", message = "unbeleivable, to hear today's socialists condemning workers for trying to organise a strike, where are the likes of GWU and the Labour of old defending workers rights?", 
    numChildren = 1L, child = list(structure(list(postId = "2794971958", 
        date = "Thursday, July 21, 2016 11:01 AM", poster = "Glorfindel", 
        disqusUname = "disqus_daQLxWKMFy", message = "Workers rights yes, but these are not workers but wanna be millionaires in the making! They should do some research and see what a great life they have, then maybe drop these unrealistic demands! Shame on these pilots!", 
        numChildren = 2L, child = list(structure(list(postId = "2798727439", 
            date = "Saturday, July 23, 2016 9:14 AM", poster = "Christopher Hitch Borg", 
            disqusUname = "christopherhitchborg", message = "Pilots are workers.", 
            numChildren = 1L, child = list(structure(list(postId = "2798801249", 
                date = "Saturday, July 23, 2016 11:06 AM", poster = "Glorfindel", 
                disqusUname = "disqus_daQLxWKMFy", message = "Dream on. Sounds to me you are either very dumb or a capitalist trying to confuse issues.", 
                numChildren = 0), .Names = c("postId", "date", 
            "poster", "disqusUname", "message", "numChildren"
            )))), .Names = c("postId", "date", "poster", "disqusUname", 
        "message", "numChildren", "child")), structure(list(postId = "2794982098", 
            date = "Thursday, July 21, 2016 11:14 AM", poster = "toni", 
            disqusUname = "disqus_bujblK3zF5", message = "pilots all over the world have a good salary, to be were they are they had to make big sacrifices and pay  lots of money for the studies. The shame is on persons getting 13000 euros for absolutely nothing, or persons put in high places with no experience at all, shame is making an 18 year old a CEO, and I could go on for ever.We should thank all Air Malta pilots for doing a good job for all this time", 
            numChildren = 2L, child = list(structure(list(postId = "2795785527", 
                date = "Thursday, July 21, 2016 8:00 PM", poster = "Glorfindel", 
                disqusUname = "disqus_daQLxWKMFy", message = "Agan: when a company is fighting for survival, it is shameful, distasteful and counterproductive to demand a 30% salary increase!!! Especially that when compared to other pilots their perks are already better than most!The other stuff you mentio has nothing to do with this article. However two wrongs do not make a right. Simple as that.", 
                numChildren = 0), .Names = c("postId", "date", 
            "poster", "disqusUname", "message", "numChildren"
            )), structure(list(postId = "2795010275", date = "Thursday, July 21, 2016 11:50 AM", 
                poster = "Jesmond Tedesco Triccas", disqusUname = "jesmondtedescotriccas", 
                message = "Air Malta pilots have their training paid for by the company. And do they have definite or indefinite contracts?", 
                numChildren = 1L, child = list(structure(list(
                  postId = "2795206130", date = "Thursday, July 21, 2016 2:53 PM", 
                  poster = "toni", disqusUname = "disqus_bujblK3zF5", 
                  message = "I have my doubts about the company paying for training, because I know of persons who couldin't make it for the financial reasons, however whatever the situation one cannot deny that they have one of the most difficult and responsible jobs existing", 
                  numChildren = 0), .Names = c("postId", "date", 
                "poster", "disqusUname", "message", "numChildren"
                )))), .Names = c("postId", "date", "poster", 
            "disqusUname", "message", "numChildren", "child")))), .Names = c("postId", 
        "date", "poster", "disqusUname", "message", "numChildren", 
        "child")))), .Names = c("postId", "date", "poster", "disqusUname", 
    "message", "numChildren", "child")))), .Names = c("postId", 
"date", "poster", "disqusUname", "message", "numChildren", "child"
)))

1 answer:

Answer 0 (score: 1)

This is already implemented for you; you do not need to apply the recursion yourself.

Taking the data you posted above, and assuming it is called lol (for "list of lists", no pun intended), we can do this:

tree <- FromListExplicit(lol[[1]], nameName = "postId", childrenName = "child")
print(tree, "date", "poster", "disqusUname")

This will print as:

                   levelName                             date                  poster           disqusUname
1 2794968061                 Thursday, July 21, 2016 10:56 AM                    toni     disqus_bujblK3zF5
2  °--2794971958             Thursday, July 21, 2016 11:01 AM              Glorfindel     disqus_daQLxWKMFy
3      ¦--2798727439          Saturday, July 23, 2016 9:14 AM  Christopher Hitch Borg  christopherhitchborg
4      ¦   °--2798801249     Saturday, July 23, 2016 11:06 AM              Glorfindel     disqus_daQLxWKMFy
5      °--2794982098         Thursday, July 21, 2016 11:14 AM                    toni     disqus_bujblK3zF5
6          ¦--2795785527      Thursday, July 21, 2016 8:00 PM              Glorfindel     disqus_daQLxWKMFy
7          °--2795010275     Thursday, July 21, 2016 11:50 AM Jesmond Tedesco Triccas jesmondtedescotriccas
8              °--2795206130  Thursday, July 21, 2016 2:53 PM                    toni     disqus_bujblK3zF5

For details on FromListExplicit, see its help page (?FromListExplicit) or, easier to remember, the help for as.Node on a list (?as.Node.list).
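If you would still like to see the recursion written out by hand, the following is a minimal sketch in base R, independent of data.tree. It assumes each comment is a list with the field names from the posted data (postId, poster, and an optional child list). The helper name flatten_comment, the depth and parentId columns, and the tiny stand-in data are made up for illustration. The base case needs no explicit test: a comment without a child element yields NULL, and lapply() over NULL does nothing.

```r
# Hypothetical helper (not part of data.tree): walk one comment and all of
# its descendants, returning one data-frame row per comment.
flatten_comment <- function(comment, depth = 0L, parentId = NA_character_) {
  # Row for the current comment; field names follow the posted data.
  row <- data.frame(
    postId   = comment[["postId"]],
    parentId = parentId,
    depth    = depth,
    poster   = comment[["poster"]],
    stringsAsFactors = FALSE
  )
  # A leaf comment has no "child" element, so this is NULL and lapply()
  # returns an empty list: that is the base case of the recursion.
  children <- comment[["child"]]
  child_rows <- lapply(children, flatten_comment,
                       depth = depth + 1L,
                       parentId = comment[["postId"]])
  # Pre-order result: the comment itself, then each subtree in turn.
  do.call(rbind, c(list(row), child_rows))
}

# Tiny stand-in for lol, using the same field names as the posted data.
lol <- list(list(
  postId = "1", poster = "a", numChildren = 2L,
  child = list(
    list(postId = "2", poster = "b", numChildren = 1L,
         child = list(list(postId = "4", poster = "d", numChildren = 0))),
    list(postId = "3", poster = "c", numChildren = 0)
  )
))

# Top-level comments are handled the same way as any other child list.
comments <- do.call(rbind, lapply(lol, flatten_comment))
print(comments)
```

Each row records how deep the comment sits and which comment it replies to, so the nesting can go as deep as the data does. The same pattern (handle the current comment, then call the function again on each element of child) would also work for adding nodes to a tree one at a time, though for data.tree the built-in conversion is less work.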