How to parse an XSD containing namespaces with BeautifulSoup

Time: 2018-02-27 01:32:44

Tags: python xml python-3.x xsd beautifulsoup

Given an XML Schema, I want to build an array containing the names of all XML elements. My problem is that the result comes back empty.

In [31]: import requests
    ...: from bs4 import BeautifulSoup
    ...: 
    ...: result = requests.get('http://www.ddialliance.org/Version2-1.xsd')
    ...: content = result.content
    ...: 
    ...: soup = BeautifulSoup(content, 'lxml')
    ...: 

In [32]: print(soup.html.body.next_element.name)
xs:schema

In [34]: schema = soup.html.body.next_element

In [35]: all_xml_element_tags = schema.find_all("xs:element", {"minOccurs" : "0"})
    ...: 

In [36]: all_xml_element_tags
Out[36]: []

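What I want is the names of all elements in the schema.

The result set should contain values, but the same query on the whole soup is also empty: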

In [39]: soup.find_all("xs:element", {"minOccurs" : "0"})
Out[39]: []

So how should I write this query?

1 Answer:

Answer 0 (score: 2)

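Note that switching the parser from lxml to xml is essential. With 'lxml', BeautifulSoup parses the document as HTML (which is why the schema ends up under soup.html.body) and lowercases attribute names, so minOccurs is stored as minoccurs and the case-sensitive filter on "minOccurs" never matches. The 'xml' parser keeps names exactly as written.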
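The effect of the parser choice can be checked on a small snippet without downloading anything. This is only a sketch: SNIPPET is a hypothetical one-element schema standing in for the real XSD, and it assumes both bs4 and lxml are installed (the 'xml' parser depends on lxml).

```python
from bs4 import BeautifulSoup

# Hypothetical one-element schema standing in for the downloaded XSD.
SNIPPET = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element minOccurs="0" name="Link" type="LinkType"/>'
    '</xs:schema>'
)

html_soup = BeautifulSoup(SNIPPET, "lxml")  # HTML parser, as in the question
xml_soup = BeautifulSoup(SNIPPET, "xml")    # XML parser, as in the answer

# Same tag name, three attribute filters:
html_miss = html_soup.find_all("xs:element", {"minOccurs": "0"})  # original case
html_hit = html_soup.find_all("xs:element", {"minoccurs": "0"})   # lowercased
xml_hit = xml_soup.find_all("xs:element", {"minOccurs": "0"})     # original case

print(len(html_miss), len(html_hit), len(xml_hit))
```

If the HTML parser is the culprit, only the lowercased filter should match on html_soup, while the original spelling matches on xml_soup.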

Code:

import requests
from bs4 import BeautifulSoup

result = requests.get('http://www.ddialliance.org/Version2-1.xsd')
content = result.text

# 'xml' selects the XML parser (it requires lxml to be installed)
soup = BeautifulSoup(content, 'xml')

# Equivalent form: soup.find_all("xs:element", {"minOccurs": "0"})
all_xml_element_tags = soup.find_all("xs:element", minOccurs="0")

print(all_xml_element_tags)

Output:

[<xs:element maxOccurs="unbounded" minOccurs="0" name="Link" type="LinkType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="ExtLink" type="ExtLinkType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="respRate" type="respRateType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="EstSmpErr" type="EstSmpErrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataAppr" type="dataApprType"/>, <xs:element minOccurs="0" name="catValu" type="catValuType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="catStat" type="catStatType"/>, <xs:element minOccurs="0" name="mrow" type="mrowType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="catStat" type="catStatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, <xs:element minOccurs="0" name="rspStmt" type="rspStmtType"/>, <xs:element minOccurs="0" name="prodStmt" type="prodStmtType"/>, <xs:element minOccurs="0" name="distStmt" type="distStmtType"/>, <xs:element minOccurs="0" name="serStmt" type="serStmtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="verStmt" type="verStmtType"/>, <xs:element minOccurs="0" name="biblCit" type="biblCitType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="holdings" type="holdingsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="docDscr" type="docDscrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="fileDscr" type="fileDscrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataDscr" type="dataDscrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="otherMat" type="otherMatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" 
name="range" type="rangeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="setAvail" type="setAvailType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="useStmt" type="useStmtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="timeMeth" type="timeMethType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataCollector" type="dataCollectorType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="frequenc" type="frequencType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="sampProc" type="sampProcType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="deviat" type="deviatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="collMode" type="collModeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="resInstru" type="resInstruType"/>, <xs:element minOccurs="0" name="sources" type="sourcesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="collSitu" type="collSituType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="actMin" type="actMinType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="ConOps" type="ConOpsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="weight" type="weightType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="cleanOps" type="cleanOpsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="varGrp" type="varGrpType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="nCubeGrp" type="nCubeGrpType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="var" type="varType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="nCube" type="nCubeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="CubeCoord" type="CubeCoordType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="physLoc" type="physLocType"/>, <xs:element minOccurs="0" 
name="drvdesc" type="drvdescType"/>, <xs:element minOccurs="0" name="drvcmd" type="drvcmdType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="caseQnty" type="caseQntyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="varQnty" type="varQntyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="logRecL" type="logRecLType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="recPrCas" type="recPrCasType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="recNumTot" type="recNumTotType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="distrbtr" type="distrbtrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="contact" type="contactType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="depositr" type="depositrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="depDate" type="depDateType"/>, <xs:element minOccurs="0" name="distDate" type="distDateType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="cohort" type="cohortType"/>, <xs:element minOccurs="0" name="citation" type="citationType"/>, <xs:element minOccurs="0" name="guide" type="guideType"/>, <xs:element minOccurs="0" name="docStatus" type="docStatusType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="docSrc" type="docSrcType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element minOccurs="0" name="rspStmt" type="rspStmtType"/>, <xs:element minOccurs="0" name="prodStmt" type="prodStmtType"/>, <xs:element minOccurs="0" name="distStmt" type="distStmtType"/>, <xs:element minOccurs="0" name="serStmt" type="serStmtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="verStmt" type="verStmtType"/>, <xs:element minOccurs="0" name="biblCit" type="biblCitType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="holdings" type="holdingsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" 
name="fileTxt" type="fileTxtType"/>, <xs:element minOccurs="0" name="locMap" type="locMapType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="recGrp" type="recGrpType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element minOccurs="0" name="fileName" type="fileNameType"/>, <xs:element minOccurs="0" name="fileCont" type="fileContType"/>, <xs:element minOccurs="0" name="fileStrc" type="fileStrcType"/>, <xs:element minOccurs="0" name="dimensns" type="dimensnsType"/>, <xs:element minOccurs="0" name="fileType" type="fileTypeType"/>, <xs:element minOccurs="0" name="format" type="formatType"/>, <xs:element minOccurs="0" name="filePlac" type="filePlacType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataChck" type="dataChckType"/>, <xs:element minOccurs="0" name="ProcStat" type="ProcStatType"/>, <xs:element minOccurs="0" name="dataMsng" type="dataMsngType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="software" type="softwareType"/>, <xs:element minOccurs="0" name="verStmt" type="verStmtType"/>, <xs:element minOccurs="0" name="key" type="keyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataItem" type="dataItemType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataColl" type="dataCollType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element minOccurs="0" name="anlyInfo" type="anlyInfoType"/>, <xs:element minOccurs="0" name="stdyClas" type="stdyClasType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="mi" type="miType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="location" type="locationType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, 
<xs:element maxOccurs="unbounded" minOccurs="0" name="universe" type="universeType"/>, <xs:element minOccurs="0" name="imputation" type="imputationType"/>, <xs:element minOccurs="0" name="security" type="securityType"/>, <xs:element minOccurs="0" name="embargo" type="embargoType"/>, <xs:element minOccurs="0" name="respUnit" type="respUnitType"/>, <xs:element minOccurs="0" name="anlysUnit" type="anlysUnitType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="verStmt" type="verStmtType"/>, <xs:element minOccurs="0" name="purpose" type="purposeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dmns" type="dmnsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="measure" type="measureType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="concept" type="conceptType"/>, <xs:element minOccurs="0" name="defntn" type="defntnType"/>, <xs:element minOccurs="0" name="universe" type="universeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element minOccurs="0" name="txt" type="txtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="table" type="tableType"/>, <xs:element minOccurs="0" name="citation" type="citationType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="otherMat" type="otherMatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="relMat" type="relMatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="relStdy" type="relStdyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="relPubl" type="relPublType"/>, <xs:element maxOccurs="unbounded" 
minOccurs="0" name="othRefs" type="othRefsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="producer" type="producerType"/>, <xs:element minOccurs="0" name="copyright" type="copyrightType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="prodDate" type="prodDateType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="prodPlac" type="prodPlacType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="software" type="softwareType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="fundAg" type="fundAgType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="grantNo" type="grantNoType"/>, <xs:element minOccurs="0" name="varQnty" type="varQntyType"/>, <xs:element minOccurs="0" name="caseQnty" type="caseQntyType"/>, <xs:element minOccurs="0" name="logRecL" type="logRecLType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element minOccurs="0" name="recDimnsn" type="recDimnsnType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="AuthEnty" type="AuthEntyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="othId" type="othIdType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="serName" type="serNameType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="serInfo" type="serInfoType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="accsPlac" type="accsPlacType"/>, <xs:element minOccurs="0" name="origArch" type="origArchType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="avlStatus" type="avlStatusType"/>, <xs:element minOccurs="0" name="collSize" type="collSizeType"/>, <xs:element minOccurs="0" name="complete" type="completeType"/>, <xs:element minOccurs="0" name="fileQnty" type="fileQntyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataSrc" type="dataSrcType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="srcOrig" type="srcOrigType"/>, 
<xs:element maxOccurs="unbounded" minOccurs="0" name="srcChar" type="srcCharType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="srcDocu" type="srcDocuType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="sources" type="sourcesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="stdyInfo" type="stdyInfoType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="method" type="methodType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataAccs" type="dataAccsType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="othrStdyMat" type="othrStdyMatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element minOccurs="0" name="subject" type="subjectType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="abstract" type="abstractType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="sumDscr" type="sumDscrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="keyword" type="keywordType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="topcClas" type="topcClasType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="timePrd" type="timePrdType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="collDate" type="collDateType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="nation" type="nationType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="geogCover" type="geogCoverType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="geogUnit" type="geogUnitType"/>, <xs:element minOccurs="0" name="geoBndBox" type="geoBndBoxType"/>, <xs:element minOccurs="0" name="boundPoly" type="boundPolyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="anlyUnit" type="anlyUnitType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="universe" type="universeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="dataKind" type="dataKindType"/>, 
<xs:element minOccurs="0" name="titl" type="titlType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="colspec" type="colspecType"/>, <xs:element minOccurs="0" name="thead" type="theadType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="subTitl" type="subTitlType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="altTitl" type="altTitlType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="parTitl" type="parTitlType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="IDNo" type="IDNoType"/>, <xs:element minOccurs="0" name="confDec" type="confDecType"/>, <xs:element minOccurs="0" name="specPerm" type="specPermType"/>, <xs:element minOccurs="0" name="restrctn" type="restrctnType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="contact" type="contactType"/>, <xs:element minOccurs="0" name="citReq" type="citReqType"/>, <xs:element minOccurs="0" name="deposReq" type="deposReqType"/>, <xs:element minOccurs="0" name="conditions" type="conditionsType"/>, <xs:element minOccurs="0" name="disclaimer" type="disclaimerType"/>, <xs:element minOccurs="0" name="key" type="keyType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="location" type="locationType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element minOccurs="0" name="imputation" type="imputationType"/>, <xs:element minOccurs="0" name="security" type="securityType"/>, <xs:element minOccurs="0" name="embargo" type="embargoType"/>, <xs:element minOccurs="0" name="respUnit" type="respUnitType"/>, <xs:element minOccurs="0" name="anlysUnit" type="anlysUnitType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="qstn" type="qstnType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="valrng" type="valrngType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="invalrng" type="invalrngType"/>, <xs:element maxOccurs="unbounded" 
minOccurs="0" name="undocCod" type="undocCodType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="universe" type="universeType"/>, <xs:element minOccurs="0" name="TotlResp" type="TotlRespType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="sumStat" type="sumStatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="stdCatgry" type="stdCatgryType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="catgryGrp" type="catgryGrpType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="catgry" type="catgryType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="codInstr" type="codInstrType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="verStmt" type="verStmtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="concept" type="conceptType"/>, <xs:element minOccurs="0" name="derivation" type="derivationType"/>, <xs:element minOccurs="0" name="varFormat" type="varFormatType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="geoMap" type="geoMapType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="catLevel" type="catLevelType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="labl" type="lablType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="txt" type="txtType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="concept" type="conceptType"/>, <xs:element minOccurs="0" name="defntn" type="defntnType"/>, <xs:element minOccurs="0" name="universe" type="universeType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>, <xs:element minOccurs="0" name="version" type="versionType"/>, <xs:element minOccurs="0" name="verResp" type="verRespType"/>, <xs:element maxOccurs="unbounded" minOccurs="0" name="notes" type="notesType"/>]
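The output above is a list of Tag objects, whereas the question asks for an array of element names. One possible way to get there is to read each tag's name attribute, sketched below on a hypothetical inline sample (SAMPLE_XSD) rather than the downloaded schema. Two assumptions are worth flagging: elements without a name attribute (for example ones that only carry ref) are skipped, and duplicates are removed while keeping first-seen order, since names such as labl and notes repeat in the output.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; with the real schema this would be requests.get(...).text
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element maxOccurs="unbounded" minOccurs="0" name="Link" type="LinkType"/>
  <xs:element minOccurs="0" name="catValu" type="catValuType"/>
  <xs:element maxOccurs="unbounded" minOccurs="0" name="Link" type="LinkType"/>
  <xs:element minOccurs="0" ref="notes"/>
</xs:schema>"""

soup = BeautifulSoup(SAMPLE_XSD, "xml")

# The attribute filter goes through attrs=... because the keyword `name`
# in find_all() already means the tag name. True means "attribute is present".
tags = soup.find_all("xs:element", attrs={"minOccurs": "0", "name": True})

# Collect the schema element names in document order.
names = [tag["name"] for tag in tags]

# Drop repeats while preserving the order of first appearance.
unique_names = list(dict.fromkeys(names))

print(names)
print(unique_names)
```

Dropping the "minOccurs" entry from attrs would collect the names of all named xs:element declarations instead of only the optional ones.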