Multiple JSON objects from one txt file into R

Time: 2018-02-23 11:17:40

Tags: json r twitter

I'm very new to JSON files. I scraped a txt file containing several million JSON objects, for example:

{
    "created_at":"Mon Oct 14 21:04:25 +0000 2013",
    "default_profile":true,
    "default_profile_image":true,
    "description":"...",
    "followers_count":5,
    "friends_count":560,
    "geo_enabled":true,
    "id":1961287134,
    "lang":"de",
    "name":"Peter Schmitz",
    "profile_background_color":"C0DEED",
    "profile_background_image_url":"http://abs.twimg.com/images/themes", 
    "utc_offset":-28800,
    ...
}
{
    "created_at":"Fri Oct 17 20:04:25 +0000 2015",
    ...
}

I would like to extract the fields into a data frame in R:

Variable          Value
created_at          X
default_profile     Y
…

In general, something similar to what is done in Python (multiple Json objects in one file extract by python). If anyone has ideas or suggestions, I would really appreciate the help! Thanks!

1 answer:

Answer 0 (score: 2)

Here is an example of how to handle this with two objects. I assume you are able to read the JSON from the file; otherwise see here.
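If you have not read the file yet, one simple option is to load the whole file into a single string, which is the form the code below expects in myjson. This is a minimal sketch: the helper name read_json_text and the file name "tweets.txt" are hypothetical, and it assumes the file is UTF-8 and small enough to fit in memory as one string (with several million objects that may not hold).

```r
# Read a whole text file into one string (lines joined with "\n"),
# so it can later be split into individual JSON objects.
# read_json_text is a hypothetical helper name, not part of any package.
read_json_text <- function(path) {
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  paste(lines, collapse = "\n")
}

# Usage (hypothetical file name):
# myjson <- read_json_text("tweets.txt")
```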

myjson = '{"created_at": "Mon Oct 14 21:04:25 +0000 2013", "default_profile": true, 
  "default_profile_image": true, "description": "...", "followers_count": 
    5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":  
    "de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",  
  "profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
{"created_at": "Mon Oct 15 21:04:25 +0000 2013", "default_profile": true, 
  "default_profile_image": true, "description": "...", "followers_count": 
    5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":  
    "de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",  
  "profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
'

library("rjson")

# Split the text into a list of JSON objects. The marker '!x!x!' is arbitrary; there may be better ways of keeping the braces while splitting.
my_json_objects = head(strsplit(gsub('\\}','\\}!x!x!', myjson),'!x!x!')[[1]],-1)
# read the text as JSON objects 
json_data <- lapply(my_json_objects, function(x) {fromJSON(x)})
# Transform to dataframes
json_data <- lapply(json_data, function(x) {data.frame(val=unlist(x))}) 
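Splitting on every "}" only works while each object is flat and no string value contains a "}". Real Twitter objects often contain nested objects, which would be cut in the middle. A more robust alternative (a different technique from the answer's gsub/strsplit trick) is to scan the text and track brace depth, ignoring braces inside JSON string literals. This is a sketch under those assumptions: split_json_objects is a hypothetical name, and the character-by-character loop is slow in plain R, so for millions of objects it is meant to illustrate the idea rather than be a tuned implementation.

```r
# Split a string of concatenated JSON objects into one string per top-level
# object by tracking brace depth. Braces inside "..." strings are ignored,
# and backslash escapes (e.g. \" inside a string) are respected.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  objects <- character(0)
  depth <- 0L
  in_string <- FALSE
  escaped <- FALSE
  start <- NA_integer_
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE          # this character was escaped, skip it
      } else if (ch == "\\") {
        escaped <- TRUE           # next character is escaped
      } else if (ch == "\"") {
        in_string <- FALSE        # closing quote of a string literal
      }
    } else if (ch == "\"") {
      in_string <- TRUE           # opening quote of a string literal
    } else if (ch == "{") {
      if (depth == 0L) start <- i # start of a new top-level object
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {          # end of a top-level object
        objects <- c(objects, paste(chars[start:i], collapse = ""))
      }
    }
  }
  objects
}
```

It can replace the first step of the answer, e.g. json_data <- lapply(split_json_objects(myjson), rjson::fromJSON).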

Output (printing json_data):

[[1]]
                                                            val
created_at                       Mon Oct 14 21:04:25 +0000 2013
default_profile                                            TRUE
default_profile_image                                      TRUE
description                                                 ...
followers_count                                               5
friends_count                                               560
geo_enabled                                                TRUE
id                                                   1961287134
lang                                                         de
name                                              Peter Schmitz
profile_background_color                                 C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset                                               -28800

[[2]]
                                                            val
created_at                       Mon Oct 15 21:04:25 +0000 2013
default_profile                                            TRUE
default_profile_image                                      TRUE
description                                                 ...
followers_count                                               5
friends_count                                               560
geo_enabled                                                TRUE
id                                                   1961287134
lang                                                         de
name                                              Peter Schmitz
profile_background_color                                 C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset                                               -28800
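The result above is a list with one data frame per object. If you want everything in a single data frame in the Variable/Value layout from the question, one option is to stack the parsed objects into a long table with an extra column identifying the source object. A minimal sketch, assuming the input is the list of parsed objects (named lists, as returned by fromJSON, i.e. before the data.frame step): to_long_df is a hypothetical helper name, unlist() flattens nested fields into dotted names such as "user.name" and drops fields whose value is NULL, and all values end up as character.

```r
# Stack a list of parsed JSON objects (named lists) into one long data frame
# with one row per (object, variable) pair.
to_long_df <- function(parsed) {
  rows <- lapply(seq_along(parsed), function(i) {
    v <- unlist(parsed[[i]])  # flattens nesting, coerces values to character
    data.frame(object = i,
               variable = names(v),
               value = unname(as.character(v)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

Usage with the answer's code would be long_df <- to_long_df(lapply(my_json_objects, fromJSON)).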

Hope this helps!