Exclude all XML tags from a file in Java

Time: 2018-06-13 15:39:21

Tags: java xml parsing xml-parsing nlp

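I have a file that I parsed in Java, with each word of the file stored as a String. The file looks like the sample below. I want to exclude every word that contains an XML tag; for example, I want to remove all of the tag words. I tried checking whether a word startsWith("<") and endsWith(">"), but it does not work. Is there any way to exclude all the tags from this file? Or is there a simpler way to extract specific fields from an XML file?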
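A likely reason the per-token startsWith("<")/endsWith(">") check fails is visible in the sample below: tags are glued to text (<title>1 starts with "<" but does not end with ">", and INDICATIONSANDUSAGE</title> ends with ">" but does not start with "<"), and a single tag can be split across two tokens (<effectiveTime followed by value="20170630"/><excerpt>). One workaround, sketched here under the assumption that the words are held in a List<String>, is to join the tokens back together, strip everything from "<" to the next ">" with a regular expression, and split again. TagStripper and stripTags are hypothetical names. This is a text-level hack rather than an XML parser: it does not decode entities such as &amp; and would misfire on a ">" inside an attribute value or comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TagStripper {

    // Matches everything from '<' up to the next '>'. [^>] also matches spaces and
    // newlines, so a tag split across tokens (e.g. "<effectiveTime" + "value=...\"/>")
    // is still removed once the tokens are joined.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Hypothetical helper: joins tokens, removes tags, and re-splits into words. */
    public static List<String> stripTags(List<String> tokens) {
        String joined = String.join(" ", tokens);
        // Replace each tag with a space so text on either side stays separated.
        String text = TAG.matcher(joined).replaceAll(" ");
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) { // split on an empty string yields [""]
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "<title>1", "INDICATIONSANDUSAGE</title>",
                "<effectiveTime", "value=\"20170630\"/><excerpt>",
                "<paragraph>LIPITOR", "isanHMG-CoAreductaseinhibitor</paragraph>");
        System.out.println(stripTags(tokens));
    }
}
```

Running main prints [1, INDICATIONSANDUSAGE, LIPITOR, isanHMG-CoAreductaseinhibitor]: the tag-only pieces vanish and the text that was glued to tags survives as separate words.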

<title>1
    INDICATIONSANDUSAGE</title>
    <text>
    <paragraph>Therapy
    withlipid-alteringagentsshouldbeonlyonecomponentofmultipleriskfactorinterventioninindividualsatsignificantlyincreasedriskforatheroscleroticvasculardiseaseduetohypercholesterolemia.Drugtherapyisrecommendedasanadjuncttodietwhentheresponsetoadietrestrictedinsaturatedfatandcholesterolandothernonpharmacologicmeasuresalonehasbeeninadequate.InpatientswithCHDormultipleriskfactorsforCHD,LIPITORcanbestartedsimultaneouslywithdiet.</paragraph>
    </text>
    <effectiveTime
    value="20170630"/><excerpt>
    <highlight>
    <text>
    <paragraph>LIPITOR
    isanHMG-CoAreductaseinhibitorindicatedasanadjuncttherapytodietto:</paragraph>
    <list
    listType="unordered"styleCode="disc"><item>Reduce
    theriskofMI,stroke,revascularizationprocedures,andanginainadultpatientswithoutCHD,butwithmultipleriskfactors(<linkHtml
    href="#S1.1">1.1</linkHtml>).</item>
    <item>Reduce
    theriskofMIandstrokeinadultpatientswithtype2diabeteswithoutCHD,butwithmultipleriskfactors(<linkHtml
    href="#S1.1">1.1</linkHtml>).</item>
    <item>Reduce
    theriskofnon-fatalMI,fatalandnon-fatalstroke,revascularizationprocedures,hospitalizationforCHF,andanginainadultpatientswithCHD(<linkHtml
    href="#S1.1">1.1</linkHtml>).</item>
    <item>Reduce
    elevatedtotal-C,LDL-C,apoB,andTGlevelsandincreaseHDL-Cinadultpatientswithprimaryhyperlipidemia(heterozygousfamilialandnonfamilial)andmixeddyslipidemia(<linkHtml
    href="#S1.2">1.2</linkHtml>).</item>
    <item>
    ReduceelevatedTGinadultpatientswithhypertriglyceridemiaandprimarydysbetalipoproteinemia(<linkHtml
    href="#S1.2">1.2</linkHtml>).</item>
    <item>Reduce
    total-CandLDL-Cinpatientswithhomozygousfamilialhypercholesterolemia(HoFH)(<linkHtml
    href="#S1.2">1.2</linkHtml>).</item>
    <item>Reduce
    elevatedtotal-C,LDL-C,andapoBlevelsinpediatricpatients,10yearsto17yearsofage,withheterozygousfamilialhypercholesterolemia(HeFH)afterfailinganadequatetrialofdiettherapy(<linkHtml
    href="#S1.2">1.2</linkHtml>).</item>
    </list>
    <paragraph>
    <content
    styleCode="underline">LimitationsofUse</content>:</paragraph>
    <paragraph>LIPITOR
    hasnotbeenstudiedin<content
    styleCode="italics">Fredrickson</content>Types
    IandVdyslipidemias(<linkHtml
    href="#S1.3">1.3</linkHtml>).</paragraph>
    </text>
    </highlight>
    </excerpt>
    <component>
    <section
    ID="S1.1"><id
    root="3a10e3ca-e81c-43c9-9262-e15f334eedfc"/><code
    code="42229-5"codeSystem="2.16.840.1.113883.6.1"displayName="SPLUNCLASSIFIEDSECTION"/><title>1.1
    PreventionofCardiovascularDiseaseinAdults</title>
    <text>
    <paragraph>In
    adultpatientswithoutclinicallyevidentcoronaryheartdisease,butwithmultipleriskfactorsforcoronaryheartdiseasesuchasage,smoking,hypertension,lowHDL-C,orafamilyhistoryofearlycoronaryheartdisease,LIPITORisindicatedto:</paragraph>
    <list
    listType="unordered"styleCode="disc"><item>Reduce
    theriskofmyocardialinfarction</item>
    <item>Reduce
    theriskofstroke</item>
    <item>Reduce
    theriskforrevascularizationproceduresandangina</item>
    </list>
    <paragraph>In
    adultpatientswithtype2diabetes,andwithoutclinicallyevidentcoronaryheartdisease,butwithmultipleriskfactorsforcoronaryheartdiseasesuchasretinopathy,albuminuria,smoking,orhypertension,LIPITORisindicatedto:</paragraph>
    <list
    listType="unordered"styleCode="disc"><item>Reduce
    theriskofmyocardialinfarction</item>
    <item>Reduce
    theriskofstroke</item>
    </list>
    <paragraph>In
    adultpatientswithclinicallyevidentcoronaryheartdisease,LIPITORisindicatedto:</paragraph>
    <list
    listType="unordered"styleCode="disc"><item>Reduce
    theriskofnon-fatalmyocardialinfarction</item>
    <item>Reduce
    theriskoffatalandnon-fatalstroke</item>
    <item>Reduce
    theriskforrevascularizationprocedures</item>
    <item>Reduce
    theriskofhospitalizationforCHF</item>
    <item>Reduce
    theriskofangina</item>
    </list>
    </text>
    <effectiveTime
    value="20170630"/></section>
    </component>
    <component>
    <section
    ID="S1.2"><id
    root="d9003937-e6d4-453b-a767-40f6077a351a"/><code
    code="42229-5"codeSystem="2.16.840.1.113883.6.1"displayName="SPLUNCLASSIFIEDSECTION"/><title>1.2
    Hyperlipidemia</title>
    <text>
    <paragraph>LIPITOR
    isindicated:</paragraph>
    <list
    listType="unordered"styleCode="square"><item>As
    anadjuncttodiettoreduceelevatedtotal-C,LDL-C,apoB,andTGlevelsandtoincreaseHDL-Cinadultpatientswithprimaryhypercholesterolemia(heterozygousfamilialandnonfamilial)andmixeddyslipidemia(<content
    styleCode="italics">Fredrickson</content>Types
    IIaandIIb);</item>
    <item>As
    anadjuncttodietforthetreatmentofadultpatientswithelevatedserumTGlevels(<content
    styleCode="italics">Fredrickson</content>Type
    IV);</item>
    <item>For
    thetreatmentofadultpatientswithprimarydysbetalipoproteinemia(<content
    styleCode="italics">Fredrickson</content>Type
    III)whodonotrespondadequatelytodiet;</item>
    <item>To
    reducetotal-CandLDL-Cinpatientswithhomozygousfamilialhypercholesterolemia(HoFH)asanadjuncttootherlipid-loweringtreatments(e.g.,LDLapheresis)orifsuchtreatmentsareunavailable;</item>
    <item>As
    anadjuncttodiettoreducetotal-C,LDL-C,andapoBlevelsinpediatricpatients,10yearsto17yearsofage,withheterozygousfamilialhypercholesterolemia(HeFH)ifafteranadequatetrialofdiettherapythefollowingfindingsarepresent:<list
    listType="ordered"styleCode="LittleAlpha"><item>LDL-C
    remains≥190mg/dLor</item>
    <item>LDL-C
    remains≥160mg/dLand:<list
    listType="unordered"styleCode="disc"><item>there
    isapositivefamilyhistoryofprematurecardiovasculardiseaseor</item>
    <item>two
    ormoreotherCVDriskfactorsarepresentinthepediatricpatient</item>
    </list>
    </item>
    </list>
    </item>
    </list>
    </text>
    <effectiveTime
    value="20170630"/></section>
    </component>
    <component>
    <section
    ID="S1.3"><id
    root="b9d715d4-0a9e-4ada-9fd8-574fd627290a"/><code
    code="42229-5"codeSystem="2.16.840.1.113883.6.1"displayName="SPLUNCLASSIFIEDSECTION"/><title>1.3
    LimitationsofUse</title>
    <text>
    <paragraph>LIPITOR
    hasnotbeenstudiedinconditionswherethemajorlipoproteinabnormalityiselevationofchylomicrons(<content
    styleCode="italics">Fredrickson</content>Types
    IandV).</paragraph>
    </text>
    <effectiveTime
    value="20170630"/></section>
    </component>
    </section>
    </component>
    <component>
    <section
    ID="S2"><id
    root="be8db708-c4d3-4fba-934c-6e372b862de6"/><code
    code="34068-7"

The file above is part of a complete XML file obtained from dailymed.nlm.nih.gov.
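For the second question, parsing the complete file with the JDK's built-in DOM parser (javax.xml.parsers) is usually simpler than filtering words, because Node.getTextContent() returns an element's text with all nested tags removed. The sketch below, in which SplFieldExtractor, parseFile, parseString and extract are hypothetical names, collects the text of every element with a given tag name, such as title or paragraph. It assumes the full DailyMed file is well-formed (the excerpt above on its own is not, since it starts and ends mid-document) and that its elements are unprefixed as in the excerpt. The factory is left non-namespace-aware, which is the default, so getElementsByTagName matches on the plain tag name.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SplFieldExtractor {

    // Default factory settings: not namespace-aware, so tag names are matched literally.
    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Parses an XML file from disk, e.g. a full SPL file downloaded from DailyMed. */
    public static Document parseFile(File file)
            throws ParserConfigurationException, SAXException, IOException {
        return newBuilder().parse(file);
    }

    /** Parses XML held in a String; checked exceptions are wrapped for convenience. */
    public static Document parseString(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the tag-free text of every element named tagName, whitespace collapsed. */
    public static List<String> extract(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates all descendant text nodes, so nested
            // markup such as <content> or <linkHtml> disappears automatically.
            String text = nodes.item(i).getTextContent();
            result.add(text.trim().replaceAll("\\s+", " "));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, read that file; otherwise use a tiny excerpt.
        Document doc = args.length > 0
                ? parseFile(new File(args[0]))
                : parseString("<section ID=\"S1.3\"><title>1.3 LimitationsofUse</title>"
                        + "<text><paragraph>LIPITOR hasnotbeenstudiedin<content styleCode=\"italics\">"
                        + "Fredrickson</content>Types IandVdyslipidemias.</paragraph></text></section>");
        System.out.println(extract(doc, "title"));
        System.out.println(extract(doc, "paragraph"));
    }
}
```

Run without arguments, this prints [1.3 LimitationsofUse] followed by [LIPITOR hasnotbeenstudiedinFredricksonTypes IandVdyslipidemias.]. Working at the element level also lets you pick out specific fields (for example only the title elements) instead of removing tags word by word.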

0 answers