R: Web scraping JSON, extracting values from nested objects

Date: 2017-05-25 17:01:09

Tags: json r web-scraping jsonlite

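I'm trying to use tidyjson to extract information from JSON, but I'm open to any R package that gets the job done. I've read the documentation and vignettes and found the complex example helpful. However, the information I want is nested inside an object whose keys are not fixed field names (they are app IDs), and I don't know how to access it. I'm interested in getting appid, name, developer, etc., but this information sits inside 570 and 730: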

{"570":{"appid":570,"name":"Dota 2","developer":"Valve","publisher":"Valve","score_rank":71,"owners":102151578,"owners_variance":259003,"players_forever":102151578,"players_forever_variance":259003,"players_2weeks":9436299,"players_2weeks_variance":89979,"average_forever":11727,"average_2weeks":1229,"median_forever":277,"median_2weeks":662,"ccu":811259,"price":"0","tags":{"Free to Play":22678,"MOBA":7808,"Strategy":7415,"Multiplayer":6757,"Team-Based":4848,"Action":4602,"e-sports":4089,"Online Co-Op":3669,"Competitive":3553,"PvP":2655,"RTS":2267,"Difficult":2129,"RPG":2114,"Fantasy":2044,"Tower Defense":2024,"Co-op":1898,"Character Customization":1514,"Replay Value":1487,"Action RPG":1397,"Simulation":1024}},

"730":{"appid":730,"name":"Counter-Strike: Global Offensive","developer":"Valve","publisher":"Valve","score_rank":78,"owners":29225079,"owners_variance":154335,"players_forever":28552354,"players_forever_variance":152685,"players_2weeks":9102348,"players_2weeks_variance":88410,"average_forever":17648,"average_2weeks":791,"median_forever":5030,"median_2weeks":358,"ccu":543626,"price":"1499","tags":{"FPS":17082,"Multiplayer":13744,"Shooter":12833,"Action":10881,"Team-Based":10369,"Competitive":9664,"Tactical":8529,"First-Person":7329,"e-sports":6716,"PvP":6383,"Online Co-Op":5714,"Military":4621,"Co-op":4435,"Strategy":4424,"War":4361,"Realistic":3196,"Trading":3191,"Difficult":3158,"Fast-Paced":3100,"Moddable":2496}}

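There are thousands of entries like these. Is there a way to skip the "top level" and look inside the nested objects? The JSON comes from http://steamspy.com/api.php?request=top100in2weeks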

1 answer:

Answer 0 (score: 1)

This may be what you need:

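library(jsonlite)

# The top level is a named list keyed by app ID ("570", "730", ...);
# each element is itself a named list describing one game.
data = fromJSON("http://steamspy.com/api.php?request=top100in2weeks")

# Iterate over the games (skipping the top-level keys) and pull one field from each
appid = lapply(data, function(x){x$appid})
name = lapply(data, function(x){x$name})

# unlist() keeps the app IDs as names, so they become the row names
df = data.frame(appid = unlist(appid),
                name = unlist(name),
                stringsAsFactors = FALSE)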
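A slightly stricter variant of the same idea, as a sketch: `vapply()` instead of `lapply()` plus `unlist()`, so a game missing a field raises an error rather than quietly producing a shorter column. The `data` list here is a hypothetical hand-built stand-in for the `fromJSON()` result (a few values copied from the question's sample) so the example runs offline; with the real API you would use the `data` returned by `fromJSON()` instead.

```r
# Hypothetical stand-in for jsonlite::fromJSON() output, built by hand from the
# question's sample so the sketch runs offline: a named list keyed by app ID,
# with one named list per game.
data <- list(
  "570" = list(appid = 570L, name = "Dota 2", developer = "Valve",
               price = "0",
               tags = list("Free to Play" = 22678L, MOBA = 7808L)),
  "730" = list(appid = 730L, name = "Counter-Strike: Global Offensive",
               developer = "Valve", price = "1499",
               tags = list(FPS = 17082L, Multiplayer = 13744L))
)

# vapply() requires each game to return exactly one value of the given type,
# so a missing field stops with an error instead of silently shortening the
# column the way unlist() would. numeric(1) also accepts integer values.
df <- data.frame(
  appid     = vapply(data, function(x) x$appid, numeric(1)),
  name      = vapply(data, function(x) x$name, character(1)),
  developer = vapply(data, function(x) x$developer, character(1)),
  stringsAsFactors = FALSE
)

print(df)
```

Because `vapply()` keeps the names of `data`, the app IDs again end up as row names.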

Result:

> head(df)
        appid                             name
570       570                           Dota 2
730       730 Counter-Strike: Global Offensive
578080 578080    PLAYERUNKNOWN'S BATTLEGROUNDS
440       440                  Team Fortress 2
271590 271590               Grand Theft Auto V
433850 433850           H1Z1: King of the Kill

I'll leave adding the rest of the fields to you.
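One way to add all the remaining scalar fields at once, sketched under the assumption that every game has the same set of fields (`rbind()` stops with an error if the column names differ): drop the nested `tags` object, turn each game into a one-row data frame, and stack the rows. The hand-built `data` list is a hypothetical stand-in for the `fromJSON()` result.

```r
# Hypothetical stand-in for jsonlite::fromJSON() output (values from the
# question's sample): a named list keyed by app ID, one named list per game.
data <- list(
  "570" = list(appid = 570L, name = "Dota 2", developer = "Valve",
               price = "0",
               tags = list("Free to Play" = 22678L, MOBA = 7808L)),
  "730" = list(appid = 730L, name = "Counter-Strike: Global Offensive",
               developer = "Valve", price = "1499",
               tags = list(FPS = 17082L, Multiplayer = 13744L))
)

# For each game, keep every field except the nested 'tags' object and turn
# the remaining scalars into a one-row data frame.
rows <- lapply(data, function(x) {
  as.data.frame(x[setdiff(names(x), "tags")], stringsAsFactors = FALSE)
})

# Stack the one-row data frames; the list names ("570", "730") become row names.
df_all <- do.call(rbind, rows)
print(df_all)
```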

Edit: adding arrays to the data frame

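You can also add each game's tag information to the data frame, including how many times each tag was applied. For each game, store the vector of tag names in one column and the corresponding tag counts in another.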

df的定义之后添加以下行:

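# Row k of df corresponds to data[[k]], since df was built from data in order.
# Each cell of these new columns holds a whole vector (a list column).
for(k in seq_len(nrow(df))){
    df$tags[k] = list(names(data[[k]]$tags))    # tag names
    df$tagsQ[k] = list(unlist(data[[k]]$tags))  # tag counts, named by tag
}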
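The same two list columns can be filled without a loop, as a sketch: `lapply()` over `data` returns one element per game in the same order as the rows of `df`. As before, `data` is a hypothetical hand-built stand-in for the `fromJSON()` result.

```r
# Hypothetical stand-in for jsonlite::fromJSON() output (values from the
# question's sample).
data <- list(
  "570" = list(appid = 570L, name = "Dota 2",
               tags = list("Free to Play" = 22678L, MOBA = 7808L)),
  "730" = list(appid = 730L, name = "Counter-Strike: Global Offensive",
               tags = list(FPS = 17082L, Multiplayer = 13744L))
)

df <- data.frame(
  appid = vapply(data, function(x) x$appid, numeric(1)),
  name  = vapply(data, function(x) x$name, character(1)),
  stringsAsFactors = FALSE
)

# Fill each list column in one assignment instead of row by row. Element k of
# each list lines up with row k of df because both come from iterating over
# 'data' in the same order.
df$tags  <- lapply(data, function(x) names(x$tags))   # tag names per game
df$tagsQ <- lapply(data, function(x) unlist(x$tags))  # named tag counts per game

df$tags[[1]]          # tag names for the first game ("570")
df$tagsQ[[2]]["FPS"]  # count for the "FPS" tag of the second game ("730")
```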

This gives you:

> df["570",]
    appid   name
570   570 Dota 2

tags
570 Free to Play, MOBA, Strategy, Multiplayer, Team-Based, Action, e-sports, Online Co-Op, Competitive, PvP, RTS, Difficult, RPG, Fantasy, Tower Defense, Co-op, Character Customization, Replay Value, Action RPG, Simulation

tagsQ
570 22686, 7810, 7420, 6759, 4850, 4603, 4092, 3672, 3555, 2657, 2267, 2130, 2116, 2045, 2024, 1898, 1514, 1487, 1397, 1023

In this case, the columns tags and tagsQ contain lists. To get the second tag of appid 570 and its count, do the following:

> df["570","tags"][[1]][2]
[1] "MOBA"

> df["570","tagsQ"][[1]][2]
MOBA 
7810
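If the nested indexing above feels awkward, a different representation (not the one used in this answer) is a "long" table with one row per game/tag pair, which works with ordinary logical subsetting, `merge()` and `aggregate()`. A minimal sketch, again with a hypothetical hand-built `data` list standing in for the `fromJSON()` result; games with no tags are skipped.

```r
# Hypothetical stand-in for jsonlite::fromJSON() output (values from the
# question's sample).
data <- list(
  "570" = list(appid = 570L, name = "Dota 2",
               tags = list("Free to Play" = 22678L, MOBA = 7808L)),
  "730" = list(appid = 730L, name = "Counter-Strike: Global Offensive",
               tags = list(FPS = 17082L, Multiplayer = 13744L))
)

# Build one small data frame per game (one row per tag), then stack them.
tag_rows <- lapply(names(data), function(id) {
  tags <- data[[id]]$tags
  if (length(tags) == 0) return(NULL)  # skip games without tags
  data.frame(appid = id,               # recycled to one value per tag
             tag   = names(tags),
             count = unlist(tags, use.names = FALSE),
             stringsAsFactors = FALSE)
})
tags_long <- do.call(rbind, tag_rows)  # NULL entries are ignored by rbind()

# Count for tag "MOBA" of app 570
tags_long$count[tags_long$appid == "570" & tags_long$tag == "MOBA"]  # → 7808
```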