The WEBSITE column of my dataset contains lists (parsed from a JSON file). Here is a sample of the WEBSITE column:
> dataset$WEBSITE[[1]]
[1] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"https://mywebsite.com/\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 1398, RefDomains = 452, AnalysisResUnitsCost = 1398, ACRank = 4, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 323, \n RefSubNets = 273, RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-10-05\", LastCrawlResult = \"HTTP_404_NotFound\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"5\", OutLinksExternal = \"11\", OutLinksInternal = \"162\", OutLinksPages = \"1\", LastSeen = \"\"... <truncated>
> dataset$WEBSITE[[2]]
[2] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"http://www.website.uk\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 254, RefDomains = 76, AnalysisResUnitsCost = 254, ACRank = 9, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 75, RefSubNets = 56, \n RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-12-14\", LastCrawlResult = \"DownloadedSuccessfully\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"2\", OutLinksExternal = \"2\", OutLinksInternal = \"19\", OutLinksPages = \"1\", LastSeen = \"\", Title = \"Dedic... <truncated>
> dataset$WEBSITE[[3]]
[3] "list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, TopicsCount = 3), Data = list(ItemNum = 0, Item = \"http://www.website.uk\", ResultCode = \"OK\", Status = \"Found\", ExtBackLinks = 254, RefDomains = 76, AnalysisResUnitsCost = 254, ACRank = 9, ItemType = 3, IndexedURLs = 1, GetTopBackLinksAnalysisResUnitsCost = 5000, DownloadBacklinksAnalysisResUnitsCost = 25000, DownloadRefDomainBacklinksAnalysisResUnitsCost = 25000, RefIPs = 75, RefSubNets = 56, \n RefDomainsEDU = 0, ExtBackLinksEDU = 0, RefDomainsGOV = 0, ExtBackLinksGOV = 0, RefDomainsEDU_Exact = 0, ExtBackLinksEDU_Exact = 0, RefDomainsGOV_Exact = 0, ExtBackLinksGOV_Exact = 0, CrawledFlag = \"True\", LastCrawlDate = \"2017-12-14\", LastCrawlResult = \"DownloadedSuccessfully\", RedirectFlag = \"False\", FinalRedirectResult = \"\", OutDomainsExternal = \"2\", OutLinksExternal = \"2\", OutLinksInternal = \"19\", OutLinksPages = \"1\", LastSeen = \"\", Title = \"Dedic... <truncated>
My dataset looks like this:
COLOR | SIZE | WEBSITE
Blue | 13456 | list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10
Green | 17487 | list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10,
Red | 65438 | list(Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20, MaxTopicsURL = 10, To
My goal is to turn each JSON node into its own column, so that my dataset looks like this:
COLOR | SIZE | MaxTopicsRootDomain | MaxTopicsSubDomain | MaxTopicsURL
Blue | 13456 | 30 | 20 | 10
Green | 17487 | 30 | 20 | 10
Red | 65438 | 30 | 20 | 10
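I tried one approach, but I'm not sure I'm on the right track...
dataset$WEBSITE <- as.character(dataset$WEBSITE) # character needed for strsplit()
# Split each deparsed list string on commas
hello <- strsplit(dataset$WEBSITE, split = ",")
# Repeat COLOR once per fragment so every fragment keeps its row key
hello <- data.frame(COLOR = rep(dataset$COLOR, sapply(hello, length)),
                    WEBSITE = unlist(hello))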
Any help is much appreciated!
Answer 0 (score: 0)
I finally found the answer.
It may not be perfect, but it works!
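# Stack the per-row lists, one row per dataset row
dataset_2 <- do.call(rbind, dataset$WEBSITE)
# Put the COLOR key next to the extracted values
dataset_2 <- cbind(dataset[c("COLOR")], dataset_2)
# Join back onto the original data by COLOR, then drop duplicate rows
dataset <- merge(dataset, dataset_2, by = "COLOR")
dataset <- unique(dataset)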
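For anyone who wants a variant that avoids the merge and unique steps, here is a minimal base R sketch. It assumes WEBSITE is still a real list-column (one nested list per row, not yet converted with as.character()) and that only the Headers node is needed, as in the desired output from the question. The toy data and the helper name flatten_headers are made up for illustration.

```r
# Toy data standing in for the real dataset (an assumption for illustration).
dataset <- data.frame(
  COLOR = c("Blue", "Green", "Red"),
  SIZE  = c(13456, 17487, 65438),
  stringsAsFactors = FALSE
)

# One nested list per row, mimicking the structure shown in the question.
one_site <- list(
  Headers = list(MaxTopicsRootDomain = 30, MaxTopicsSubDomain = 20,
                 MaxTopicsURL = 10, TopicsCount = 3)
)
dataset$WEBSITE <- list(one_site, one_site, one_site)

# Hypothetical helper: turn the Headers node of one element into a one-row data frame.
flatten_headers <- function(x) {
  flat <- unlist(x$Headers)  # named numeric vector
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# Build one row per element, then bind the rows together.
headers <- do.call(rbind, lapply(dataset$WEBSITE, flatten_headers))

# Attach the new columns to the non-list columns; row order is preserved.
result <- cbind(dataset[c("COLOR", "SIZE")], headers)
print(result)
```

Because the rows are bound in their original order, no key is needed to line them up again, so this should also behave when COLOR values repeat. If the elements do not all share the same Headers fields, rbind() will fail and the fields would need aligning first.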