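I have a CSV file that has JSON inside it. I am trying to get Company, Type, and Driver into a data frame. I do not want to parse it, because the CSV below is only a sample: I have many more columns with all sorts of JSON keys/values (some missing, in no particular order, and a great many of them).
A sample of my sub-short_csvjson.csv CSV file: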
Married,Transportation,Color
YES,"{""Company"":""GTS"",""Type"":""Limo""}",White
,"{""Driver"":""John""}",Green
NO,"{""Type"":""Van"",""Driver"":""John""}",
What can I do (other than parsing) to create a data frame with:
my_data$Married
my_data$Transportation.Company
my_data$Transportation.Type
my_data$Transportation.Driver
my_data$Color
Thanks
Answer 0 (score: 1)
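Here is one solution I could come up with. It uses the jsonlite package and row-by-row processing to give you what you want.
Assume df, read with read.csv and stringsAsFactors = FALSE, looks like this: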
df
  Married                  Transportation Color
1     YES {"Company":"GTS","Type":"Limo"} White
2                       {"Driver":"John"} Green
3      NO  {"Type":"Van","Driver":"John"}
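You can do this:
library(jsonlite)
# Parse each JSON string into a named list
l <- lapply(df$Transportation, fromJSON)
# Collect every key that appears in any row
n <- unique(unlist(sapply(l, names)))
# One new column per key; rows lacking a key hold NULL
df[, n] <- lapply(n, function(x) sapply(l, function(y) y[[x]]))
to get this: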
df
  Married                  Transportation Color Company Type Driver
1     YES {"Company":"GTS","Type":"Limo"} White     GTS Limo   NULL
2                       {"Driver":"John"} Green    NULL NULL   John
3      NO  {"Type":"Van","Driver":"John"}          NULL  Van   John
I am not sure whether there is a more efficient way.
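If plain character columns with NA are easier to work with than list columns holding NULL, one possible variation is sketched below. It assumes every JSON value is a single scalar, and the Transportation. prefix on the new column names is only my reading of the naming asked for in the question; neither assumption comes from the code above.

```r
library(jsonlite)

# Same sample data as above, built inline so the sketch is self-contained
df <- data.frame(
  Married = c("YES", "", "NO"),
  Transportation = c('{"Company":"GTS","Type":"Limo"}',
                     '{"Driver":"John"}',
                     '{"Type":"Van","Driver":"John"}'),
  Color = c("White", "Green", ""),
  stringsAsFactors = FALSE
)

# Parse each JSON string into a named list and collect every key seen
l <- lapply(df$Transportation, fromJSON)
n <- unique(unlist(lapply(l, names)))

# One character column per key; a missing key becomes NA instead of NULL.
# Assumption: each value is a scalar, otherwise vapply() stops with an error.
for (key in n) {
  df[[paste0("Transportation.", key)]] <- vapply(
    l,
    function(y) if (is.null(y[[key]])) NA_character_ else as.character(y[[key]]),
    character(1)
  )
}

str(df)
```

Because vapply() enforces a length-one character result, a row whose value is an array or nested object fails loudly rather than producing a ragged column.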
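Edit, based on the added information about malformed JSON in the actual data:
If the Transportation column in the original data contains malformed JSON, you can work around it as follows.
The original data frame is: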
df <- read.table(text = 'Married,Transportation,Color
YES,"{""Company"":""GTS"",""Type"":""Limo""}",White
,"{""Driver"":""John""}",Green
NO,"{""Type"":""Van"",""Driver"":""John""}",',
header = TRUE, sep = ',', stringsAsFactors = FALSE)
Row-bind an extra row containing malformed JSON with an extra " character:
df <- rbind(df, data.frame(Married = 'NO',
Transportation = '{"Company": ""GTLS"}',
Color = 'Red'))
The new df looks like this (note the malformed JSON in row 4):
  Married                  Transportation Color
1     YES {"Company":"GTS","Type":"Limo"} White
2                       {"Driver":"John"} Green
3      NO  {"Type":"Van","Driver":"John"}
4      NO            {"Company": ""GTLS"}   Red
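Now use this to split all the nested JSON into separate columns:
# Parse each row; malformed JSON yields NA instead of stopping with an error
l <- lapply(df$Transportation, function(x) tryCatch({fromJSON(x)}, error = function(e) NA))
n <- unique(unlist(sapply(l, names)))
# Rows that failed to parse have no names, so they get NULL in every new column
df[, n] <- lapply(n, function(x)
  sapply(l, function(y)
    if (!is.null(names(y))) y[[x]]))
The output looks like this: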
  Married                  Transportation Color Company Type Driver
1     YES {"Company":"GTS","Type":"Limo"} White     GTS Limo   NULL
2                       {"Driver":"John"} Green    NULL NULL   John
3      NO  {"Type":"Van","Driver":"John"}          NULL  Van   John
4      NO            {"Company": ""GTLS"}   Red    NULL NULL   NULL
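Since the tryCatch approach quietly leaves NULL in the bad rows, it may help to flag which rows hold invalid JSON before parsing. A minimal sketch, assuming jsonlite's validate(), which checks one string at a time and is therefore applied element by element; the names transportation, is_valid, and bad_rows are illustrative only.

```r
library(jsonlite)

# The same four Transportation values as in the data frame above (row 4 is malformed)
transportation <- c('{"Company":"GTS","Type":"Limo"}',
                    '{"Driver":"John"}',
                    '{"Type":"Van","Driver":"John"}',
                    '{"Company": ""GTLS"}')

# validate() returns TRUE for valid JSON; isTRUE() drops the error
# attributes it attaches to a FALSE result
is_valid <- vapply(transportation,
                   function(x) isTRUE(validate(x)),
                   logical(1), USE.NAMES = FALSE)

# Row numbers whose JSON could not be parsed
bad_rows <- which(!is_valid)
print(bad_rows)
```

With the real data, df$Transportation would take the place of the inline vector, and the flagged rows could be inspected or repaired before splitting them into columns.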