将列表拆分为R中的多个列

时间:2017-12-16 22:47:12

标签: json r list

在我的数据集的WEBSITE列中,有列表(来自json文件)。以下是WEBSITE列的示例:

> dataset$WEBSITE[[1]])
[1] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"https://mywebsite.com/\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 1398, RefDomains = 452, AnalysisResUnitsCost = 1398, ACRank = 4, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 323, \n    RefSubNets = 273, RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-10-05\", LastCrawlResult = \"HTTP_404_NotFound\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"5\", OutLinksExternal = \"11\", OutLinksInternal = \"162\", OutLinksPages = \"1\", LastSeen = \"\"... <truncated>

> dataset$WEBSITE[[2]])
[2] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"http://www.website.uk\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 254, RefDomains = 76, AnalysisResUnitsCost = 254, ACRank = 9, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 75, RefSubNets = 56, \n    RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-12-14\", LastCrawlResult = \"DownloadedSuccessfully\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"2\", OutLinksExternal = \"2\", OutLinksInternal = \"19\", OutLinksPages = \"1\", LastSeen = \"\", Title = \"Dedic... <truncated>

 > dataset$WEBSITE[[3]])
[3] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"http://www.website.uk\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 254, RefDomains = 76, AnalysisResUnitsCost = 254, ACRank = 9, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 75, RefSubNets = 56, \n    RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-12-14\", LastCrawlResult = \"DownloadedSuccessfully\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"2\", OutLinksExternal = \"2\", OutLinksInternal = \"19\", OutLinksPages = \"1\", LastSeen = \"\", Title = \"Dedic... <truncated>

我的数据集如下所示:

COLOR       |    SIZE        |   WEBSITE
Blue        |    13456       |   list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10
Green       |    17487       |   list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10,
Red         |    65438       |   list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, To

我的目标是将每个json节点转换为专用列,使我的数据集看起来像这样:

COLOR       |    SIZE        | MaxTopicsRootDomain | MaxTopicsSubDomain | MaxTopicsURL
Blue        |    13456       | 30                  | 20                 | 10
Green       |    17487       | 30                  | 20                 | 10
Red         |    65438       | 30                  | 20                 | 10

我尝试了一种方法,但我不确定我是否正确...

dataset$WEBSITE <- as.character(dataset$WEBSITE) #character needed for a strsplit()
hello <- strsplit(dataset$WEBSITE, split = ",")
hello <- data.frame(COLOR = rep(dataset$Color, 
                            sapply(hello, length)), 
                            WEBSITE = unlist(hello))

非常感谢任何帮助!

1 个答案:

答案 0 :(得分:0)

我终于找到了anwser。

它可能并不完美,但它确实有效!

dataset_2 <- do.call(rbind, dataset$WEBSITE)
dataset_2 <- cbind(dataset[c("COLOR")], dataset_2)
dataset <- merge(dataset,dataset_2,by="COLOR")
dataset <- unique (dataset)