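I am trying to use an API that returns its data as nested XML, and I want to store that data in a data frame. My problem is that I don't know how to get values out of the nested XML. Here is an example: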
# Sample data
library(xml2)
url <- "https://clinicaltrials.gov/api/query/full_studies?expr=neuro&min_rnk=1&max_rnk=20&fmt=xml"
download.file(url, destfile = "xml_data.xml")
fil <- "xml_data.xml"
dat <- xml2::read_xml(fil)
This gives me a nested XML file, but I don't know how to work with this structure.
<FullStudiesResponse>
....
<FullStudyList>
<FullStudy Rank="1">
<Struct Name="Study">
<Struct Name="ProtocolSection">
<Struct Name="IdentificationModule">
<Field Name="NCTId">NCT01843582</Field>
I can access FullStudyList with the following command:
xml_find_all(x = dat, xpath = "//FullStudyList/FullStudy")
But how do I refer to, for example, all the NCTId or Rank values? So far I have tried
xml_find_all(x = dat, xpath = "//FullStudyList/FullStudy/NCTId")
xml_find_all(x = dat, xpath = "//FullStudyList/FullStudy/@NCTId")
xml_find_all(x = dat, xpath = "//FullStudyList/FullStudy//NCTId")
which obviously do not work. Or is there a better way to work with nested XML to get the data into a data frame?
Answer 0 (score: 2)
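The short answer is: don't use XML. The site's documentation says that you can specify the fmt you want, and it does not have to be XML. JSON is much easier to handle in R.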
Try this:
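library(httr)
library(jsonlite)
library(tibble)

url <- "https://clinicaltrials.gov/api/query/full_studies?expr=neuro&min_rnk=1&max_rnk=20&fmt=json"
# Request the body as text so that fromJSON() receives a JSON string rather than an already-parsed list
res <- fromJSON(content(GET(url), as = "text", encoding = "UTF-8"))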
The result is a nested list, but I assume you are interested in the data stored in FullStudies:
df <- as_tibble(res$FullStudiesResponse$FullStudies)
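Because fromJSON keeps the nesting as data-frame columns inside data-frame columns, a single field such as NCTId sits several levels deep. The following is a minimal sketch of two ways to reach it, assuming the response has the Rank / Study / ProtocolSection / IdentificationModule layout shown in the question. The inline JSON is a hand-written stand-in for the API response, and the second NCT id and both titles are made-up placeholders.

```r
library(jsonlite)

# Hand-written stand-in for the API response (placeholder values, not real data)
json <- '{
  "FullStudiesResponse": {
    "FullStudies": [
      {"Rank": 1, "Study": {"ProtocolSection": {"IdentificationModule":
        {"NCTId": "NCT01843582", "BriefTitle": "Example A"}}}},
      {"Rank": 2, "Study": {"ProtocolSection": {"IdentificationModule":
        {"NCTId": "NCT00000000", "BriefTitle": "Example B"}}}}
    ]
  }
}'

res <- fromJSON(json)
studies <- res$FullStudiesResponse$FullStudies

# Option 1: walk down the nested data frames with chained `$`
ids <- studies$Study$ProtocolSection$IdentificationModule$NCTId

# Option 2: flatten() turns the nested columns into ordinary columns
# whose names join the levels with "."
flat <- jsonlite::flatten(studies)
small <- flat[, c("Rank", "Study.ProtocolSection.IdentificationModule.NCTId")]
names(small) <- c("Rank", "NCTId")
```

Option 2 is probably the more convenient one when many fields are needed, since the flattened frame can be subset like any other data frame. List columns (fields that hold several values per study) stay as lists and would still need separate handling.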
Printing df gives:
# A tibble: 20 x 2
Rank Study$ProtocolS~ $$$OrgStudyIdIn~ $$$$OrgStudyIdT~ $$$$OrgStudyIdL~ $$$Organization~ $$$$OrgClass $$$BriefTitle $$$OfficialTitle $$$Acronym $$StatusModule$~
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 NCT02642055 NEURO+001 NA NA Neuro+ INDUSTRY Efficacy of ~ Efficacy of NEU~ NA May 2016
2 2 NCT01801813 RC12_0416 NA NA Nantes Universi~ OTHER Risk Factors~ Observational S~ Craniosco~ March 2016
3 3 NCT03813290 DSRB A/2018/006~ NA NA National Health~ OTHER_GOV A Neuro-Tech~ A Neuro-Technol~ NA February 2020
4 4 NCT03773926 2018-A00604-51 NA NA Zeta Technologi~ INDUSTRY Neuro-feedba~ Neuro-feedback ~ TNTA December 2018
5 5 NCT04189172 AAG-O-H-1630 NA NA Aesculap AG INDUSTRY MiDura-Study~ Multicenter, In~ MiDura May 2020
6 6 NCT03756337 PIC-20 NA NA Oticon Medical INDUSTRY Neuro 1 vs. ~ Comparison of A~ NA November 2018
7 7 NCT03484143 P17.03 NA NA Vielight Inc. INDUSTRY Neuro RX Gam~ Vielight Neuro ~ NA June 2020
8 8 NCT02138110 InVivo-100-101 NA NA InVivo Therapeu~ INDUSTRY The INSPIRE ~ The INSPIRE Stu~ NA December 2019
9 9 NCT03935724 A2017SCI03 NA NA Neuroplast INDUSTRY Clinical Stu~ A Multi-center,~ SCI2 September 2020
10 10 NCT03798002 RiphahI Maryam ~ NA NA Riphah Internat~ OTHER Neuro-muscul~ Effects of Neur~ NA August 2019
11 11 NCT03655262 R61MH113772 U.S. NIH Grant/~ https://project~ University of C~ OTHER Treating Pho~ Treating Phobia~ NA April 2019
12 12 NCT04418609 Neuro-COVID-19 NA NA University of Z~ OTHER Neuro-COVID-~ Neuro-COVID-19:~ Neuro-COV~ June 2020
13 13 NCT01174329 1234 NA NA Universidad Aut~ OTHER Treatment of~ Difference in S~ SALELECTR~ July 2010
14 14 NCT04205019 A2019SCI04 NA NA Neuroplast INDUSTRY Safety Stem ~ A 3 Months Open~ SSCiSCI September 2020
15 15 NCT02941627 PIC_07 NA NA Oticon Medical INDUSTRY The Neuro Zt~ The Neuro Zti C~ NA February 2017
16 16 NCT03328195 P17.02 NA NA Vielight Inc. INDUSTRY Vielight Neu~ A Pilot Study E~ NA September 2020
17 17 NCT02401841 Policlinico 12 NA NA Policlinico Hos~ OTHER Resolution o~ Resolution of N~ NA October 2015
18 18 NCT03882567 03/2015 NA NA Universidad Rey~ OTHER Effectivenes~ Effectiveness o~ SCENAR October 2019
19 19 NCT04583163 2019-0945 NA NA Hackensack Meri~ OTHER Variability ~ Inter- and Intr~ NA October 2020
20 20 NCT01845155 CMTR-TC-02 NA NA German Center f~ OTHER Neuro-Music-~ Neuro-Music-The~ NA February 2014
# ... with 103 more variables: $$$OverallStatus <chr>, $$$ExpandedAccessInfo$HasExpandedAccess <chr>, $$$StartDateStruct$StartDate <chr>, $$$$StartDateType <chr>,
# $$$PrimaryCompletionDateStruct$PrimaryCompletionDate <chr>, $$$$PrimaryCompletionDateType <chr>, $$$CompletionDateStruct$CompletionDate <chr>,
# $$$$CompletionDateType <chr>, $$$StudyFirstSubmitDate <chr>, $$$StudyFirstSubmitQCDate <chr>, $$$StudyFirstPostDateStruct$StudyFirstPostDate <chr>,
# $$$$StudyFirstPostDateType <chr>, $$$LastUpdateSubmitDate <chr>, $$$LastUpdatePostDateStruct$LastUpdatePostDate <chr>, $$$$LastUpdatePostDateType <chr>,
# $$$ResultsFirstSubmitDate <chr>, $$$ResultsFirstSubmitQCDate <chr>, $$$ResultsFirstPostDateStruct$ResultsFirstPostDate <chr>, $$$$ResultsFirstPostDateType <chr>,
# $$$LastKnownStatus <chr>, $$SponsorCollaboratorsModule$ResponsibleParty$ResponsiblePartyType <chr>, $$$$ResponsiblePartyInvestigatorFullName <chr>,
# $$$$ResponsiblePartyInvestigatorTitle <chr>, $$$$ResponsiblePartyInvestigatorAffiliation <chr>, $$$$ResponsiblePartyOldNameTitle <chr>,
# $$$$ResponsiblePartyOldOrganization <chr>, $$$LeadSponsor$LeadSponsorName <chr>, $$$$LeadSponsorClass <chr>, $$$CollaboratorList$Collaborator <list>,
# $$OversightModule$OversightHasDMC <chr>, $$$IsFDARegulatedDrug <chr>, $$$IsFDARegulatedDevice <chr>, $$$IsUnapprovedDevice <chr>, $$$IsUSExport <chr>,
# $$DescriptionModule$BriefSummary <chr>, $$$DetailedDescription <chr>, $$ConditionsModule$ConditionList$Condition <list>, $$$KeywordList$Keyword <list>,
# $$DesignModule$StudyType <chr>, $$$PhaseList$Phase <list>, $$$DesignInfo$DesignAllocation <chr>, $$$$DesignInterventionModel <chr>,
# $$$$DesignPrimaryPurpose <chr>, $$$$DesignMaskingInfo$DesignMasking <chr>, $$$$$DesignWhoMaskedList$DesignWhoMasked <list>, $$$$$DesignMaskingDescription <chr>,
# $$$$DesignObservationalModelList$DesignObservationalModel <list>, $$$$DesignTimePerspectiveList$DesignTimePerspective <list>,
# $$$$DesignInterventionModelDescription <chr>, $$$EnrollmentInfo$EnrollmentCount <chr>, $$$$EnrollmentType <chr>, $$$PatientRegistry <chr>,
# $$$TargetDuration <chr>, $$ArmsInterventionsModule$ArmGroupList$ArmGroup <list>, $$$InterventionList$Intervention <list>,
# $$OutcomesModule$PrimaryOutcomeList$PrimaryOutcome <list>, $$$SecondaryOutcomeList$SecondaryOutcome <list>, $$$OtherOutcomeList$OtherOutcome <list>,
# $$EligibilityModule$EligibilityCriteria <chr>, $$$HealthyVolunteers <chr>, $$$Gender <chr>, $$$MinimumAge <chr>, $$$MaximumAge <chr>, $$$StdAgeList$StdAge <list>,
# $$$StudyPopulation <chr>, $$$SamplingMethod <chr>, $$ContactsLocationsModule$OverallOfficialList$OverallOfficial <list>, $$$LocationList$Location <list>,
# $$$CentralContactList$CentralContact <list>, $$IPDSharingStatementModule$IPDSharing <chr>, $$ReferencesModule$ReferenceList$Reference <list>,
# $$$SeeAlsoLinkList$SeeAlsoLink <list>, $DerivedSection$MiscInfoModule$VersionHolder <chr>, $$$RemovedCountryList$RemovedCountry <list>,
# $$ConditionBrowseModule$ConditionMeshList$ConditionMesh <list>, $$$ConditionAncestorList$ConditionAncestor <list>,
# $$$ConditionBrowseLeafList$ConditionBrowseLeaf <list>, $$$ConditionBrowseBranchList$ConditionBrowseBranch <list>,
# $$InterventionBrowseModule$InterventionBrowseLeafList$InterventionBrowseLeaf <list>, $$$InterventionBrowseBranchList$InterventionBrowseBranch <list>,
# $ResultsSection$ParticipantFlowModule$FlowGroupList$FlowGroup <list>, $$$FlowPeriodList$FlowPeriod <list>, $$$FlowPreAssignmentDetails <chr>,
# $$$FlowRecruitmentDetails <chr>, $$BaselineCharacteristicsModule$BaselinePopulationDescription <chr>, $$$BaselineGroupList$BaselineGroup <list>,
# $$$BaselineDenomList$BaselineDenom <list>, $$$BaselineMeasureList$BaselineMeasure <list>, $$OutcomeMeasuresModule$OutcomeMeasureList$OutcomeMeasure <list>,
# $$AdverseEventsModule$EventsFrequencyThreshold <chr>, $$$EventsTimeFrame <chr>, $$$EventGroupList$EventGroup <list>, $$$SeriousEventList$SeriousEvent <list>,
# $$$OtherEventList$OtherEvent <list>, $$MoreInfoModule$CertainAgreement$AgreementPISponsorEmployee <chr>, $$$$AgreementRestrictiveAgreement <chr>,
# $$$PointOfContact$PointOfContactTitle <chr>, $$$$PointOfContactOrganization <chr>, $$$$PointOfContactEMail <chr>, $$$$PointOfContactPhone <chr>, ...
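If XML has to be used anyway, the XPath attempts in the question appear to fail because NCTId is not an element or attribute name: it is the value of the Name attribute on a Field element, while Rank is an attribute of FullStudy. Below is a minimal sketch under that reading, using a hand-written fragment modelled on the sample in the question. The second study and its id are made-up placeholders, and the sketch assumes the document declares no default namespace (otherwise something like xml_ns_strip() may be needed first).

```r
library(xml2)

# Hand-written fragment modelled on the structure shown in the question
xml_txt <- '<FullStudiesResponse>
  <FullStudyList>
    <FullStudy Rank="1">
      <Struct Name="Study">
        <Struct Name="ProtocolSection">
          <Struct Name="IdentificationModule">
            <Field Name="NCTId">NCT01843582</Field>
          </Struct>
        </Struct>
      </Struct>
    </FullStudy>
    <FullStudy Rank="2">
      <Struct Name="Study">
        <Struct Name="ProtocolSection">
          <Struct Name="IdentificationModule">
            <Field Name="NCTId">NCT00000000</Field>
          </Struct>
        </Struct>
      </Struct>
    </FullStudy>
  </FullStudyList>
</FullStudiesResponse>'

doc <- read_xml(xml_txt)
studies <- xml_find_all(doc, "//FullStudyList/FullStudy")

out <- data.frame(
  # Rank is an attribute of each FullStudy node
  Rank = as.integer(xml_attr(studies, "Rank")),
  # NCTId is selected by the value of the Name attribute;
  # xml_find_first() returns one result per study, so the rows stay aligned
  NCTId = xml_text(xml_find_first(studies, ".//Field[@Name='NCTId']")),
  stringsAsFactors = FALSE
)
```

Searching relative to each FullStudy node (the leading "." in the XPath) keeps one row per study even when a field is missing, in which case xml_text() yields NA for that study.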