Getting the text between tags in a file

Time: 2018-06-14 15:59:01

Tags: java xml oop parsing file-handling

I have a file that looks like this:

<title>1 INDICATIONS AND USAGE</title>
    <text>
        <paragraph>Therapy with lipid-altering agents should be only one component of multiple risk factor intervention in individuals at significantly increased risk for atherosclerotic vascular disease due to hypercholesterolemia. Drug therapy is recommended as an adjunct to diet when the response to a diet restricted in saturated fat and cholesterol and other nonpharmacologic measures alone has been inadequate. In patients with CHD or multiple risk factors for CHD, LIPITOR can be started simultaneously with diet.</paragraph>
    </text>
    <effectiveTime value="20170630"/>
    <excerpt>
        <highlight>
            <text>
                <paragraph>LIPITOR is an HMG-CoA reductase inhibitor indicated as an adjunct therapy to diet to:</paragraph>
                <list listType="unordered" styleCode="disc"> <item>Reduce the risk of MI, stroke, revascularization procedures, and angina in adult patients without CHD, but with multiple risk factors (<linkHtml href="#S1.1">1.1</linkHtml>).</item>
                    <item>Reduce the risk of MI and stroke in adult patients with type 2 diabetes without CHD, but with multiple risk factors (<linkHtml href="#S1.1">1.1</linkHtml>).</item>
                    <item>Reduce the risk of non-fatal MI, fatal and non-fatal stroke, revascularization procedures, hospitalization for CHF, and angina in adult patients with CHD (<linkHtml href="#S1.1">1.1</linkHtml>).</item>
                    <item>Reduce elevated total-C, LDL-C, apo B, and TG levels and increase HDL-C in adult patients with primary hyperlipidemia (heterozygous familial and nonfamilial) and mixed dyslipidemia (<linkHtml href="#S1.2">1.2</linkHtml>).</item>
                    <item> Reduce elevated TG in adult patients with hypertriglyceridemia and primary dysbetalipoproteinemia (<linkHtml href="#S1.2">1.2</linkHtml>).</item>
                    <item>Reduce total-C and LDL-C in patients with homozygous familial hypercholesterolemia (HoFH) (<linkHtml href="#S1.2">1.2</linkHtml>).</item>
                    <item>Reduce elevated total-C, LDL-C, and apo B levels in pediatric patients, 10 years to 17 years of age, with heterozygous familial hypercholesterolemia (HeFH) after failing an adequate trial of diet therapy (<linkHtml href="#S1.2">1.2</linkHtml>).</item>
                </list>
                <paragraph><content styleCode="underline">Limitations of Use</content>:</paragraph>
                <paragraph>LIPITOR has not been studied in <content styleCode="italics">Fredrickson </content>Types I and V dyslipidemias (<linkHtml href="#S1.3">1.3</linkHtml>).</paragraph>
            </text>
        </highlight>
    </excerpt>
    <component>
        <section ID="S1.1">
            <id root="3a10e3ca-e81c-43c9-9262-e15f334eedfc"/>
            <code code="42229-5" codeSystem="2.16.840.1.113883.6.1" displayName="SPL UNCLASSIFIED SECTION"/>
            <title>1.1 Prevention of Cardiovascular Disease in Adults</title>
            <text>
                <paragraph>In adult patients without clinically evident coronary heart disease, but with multiple risk factors for coronary heart disease such as age, smoking, hypertension, low HDL-C, or a family history of early coronary heart disease, LIPITOR is indicated to:</paragraph>
                <list listType="unordered" styleCode="disc">
                    <item>Reduce the risk of myocardial infarction</item>
                    <item>Reduce the risk of stroke</item>
                    <item>Reduce the risk for revascularization procedures and angina</item>
                </list>
                <paragraph>In adult patients with type 2 diabetes, and without clinically evident coronary heart disease, but with multiple risk factors for coronary heart disease such as retinopathy, albuminuria, smoking, or hypertension, LIPITOR is indicated to:</paragraph>
                <list listType="unordered" styleCode="disc"> <item>Reduce the risk of myocardial infarction</item>
                    <item>Reduce the risk of stroke</item>
                </list>
                <paragraph>In adult patients with clinically evident coronary heart disease, LIPITOR is indicated to:</paragraph>
                    <list listType="unordered" styleCode="disc">
                        <item>Reduce the risk of non-fatal myocardial infarction</item>
                        <item>Reduce the risk of fatal and non-fatal stroke</item>
                        <item>Reduce the risk for revascularization procedures</item>
                        <item>Reduce the risk of hospitalization for CHF</item>
                        <item>Reduce the risk of angina</item>
                    </list>
                </text>
            <effectiveTime value="20170630"/>
        </section>
    </component>
    <component> 
        <section ID="S1.2">
            <id root="d9003937-e6d4-453b-a767-40f6077a351a"/>
            <code code="42229-5" codeSystem="2.16.840.1.113883.6.1" displayName="SPL UNCLASSIFIED SECTION"/>
            <title>1.2 Hyperlipidemia</title>
            <text>
                <paragraph>LIPITOR is indicated:</paragraph>
                <list listType="unordered" styleCode="square"> <item>As an adjunct to diet to reduce elevated total-C, LDL-C, apo B, and TG levels and to increase HDL-C in adult patients with primary hypercholesterolemia (heterozygous familial and nonfamilial) and mixed dyslipidemia (<content styleCode="italics">Fredrickson </content>Types IIa and IIb);</item>
                    <item>As an adjunct to diet for the treatment of adult patients with elevated serum TG levels (<content styleCode="italics">Fredrickson </content>Type IV);</item>
                    <item>For the treatment of adult patients with primary dysbetalipoproteinemia (<content styleCode="italics">Fredrickson </content>Type III) who do not respond adequately to diet;</item>
                    <item>To reduce total-C and LDL-C in patients with homozygous familial hypercholesterolemia (HoFH) as an adjunct to other lipid-lowering treatments (e.g., LDL apheresis) or if such treatments are unavailable;</item>
                    <item>As an adjunct to diet to reduce total-C, LDL-C, and apo B levels in pediatric patients, 10 years to 17 years of age, with heterozygous familial hypercholesterolemia (HeFH) if after an adequate trial of diet therapy the following findings are present:<list listType="ordered" styleCode="LittleAlpha"> <item>LDL-C remains ≥ 190 mg/dL or</item> <item>LDL-C remains ≥ 160 mg/dL and:<list listType="unordered" styleCode="disc"> <item>there is a positive family history of premature cardiovascular disease or</item> <item>two or more other CVD risk factors are present in the pediatric patient</item> </list> </item> </list> </item>
                </list>
            </text>
            <effectiveTime value="20170630"/>
        </section>
    </component>
    <component>
        <section ID="S1.3">
            <id root="b9d715d4-0a9e-4ada-9fd8-574fd627290a"/>
            <code code="42229-5" codeSystem="2.16.840.1.113883.6.1" displayName="SPL UNCLASSIFIED SECTION"/>
            <title>1.3 Limitations of Use</title>
                <text>
                    <paragraph>LIPITOR has not been studied in conditions where the major lipoprotein abnormality is elevation of chylomicrons (<content styleCode="italics">Fredrickson </content>Types I and V).</paragraph>
                </text>
                <effectiveTime value="20170630"/>
            </section>
        </component>
    </section>
</component>
<component> 
    <section ID="S2">
        <id root="be8db708-c4d3-4fba-934c-6e372b862de6"/>
        <code code="34068-7" 

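I want to parse the file and get all the text between the tags.

So I wrote some code that takes the substring between '>' and '</'.

How can I do this for all the tags in the whole file?

Code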

try {
    File file = new File("A:/OneDrive - PharmaCompany, Inc/Diksha Work/usage1.txt");
    FileReader fileReader = new FileReader(file);
    BufferedReader bufferedReader = new BufferedReader(fileReader);
    StringBuffer stringBuffer = new StringBuffer();
    String line,l;
    while ((line = bufferedReader.readLine()) != null) {                                

        int start =(line.indexOf(">"));
        int end=(line.indexOf("</"));                       
        String name = line.substring(start+1,end);
        System.out.println(name);
    }
    fileReader.close();
} catch (IOException e) {
    e.printStackTrace();
}

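I only get the text between the first two tags:

Actual output

1 INDICATIONS AND USAGE

Expected output

1 INDICATIONS AND USAGE Therapy with lipid-altering agents should be only one component of multiple risk factor intervention in individuals at significantly increased risk for atherosclerotic vascular disease due to hypercholesterolemia. Drug therapy is recommended as an adjunct to diet when the response to a diet restricted in saturated fat and cholesterol and other nonpharmacologic measures alone has been inadequate. In patients with CHD or multiple risk factors for CHD, LIPITOR can be started simultaneously with diet. LIPITOR is an HMG-CoA reductase inhibitor indicated as an adjunct therapy to diet to: ... (and so on, all the text between the tags until the end of the file)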

2 Answers:

Answer 0 (score: 1)

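To extract every non-empty run of text between the > and < characters in a text file, you can use the regular expression ">([^<>]+)<".

        // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
        String text = "fljjl;<xmlelement>value</xmlelement> <jjjj> <tag>kkk< ><'l'l'";
        // Group 1 captures one or more characters that are neither '<' nor '>'.
        final Matcher matcher = Pattern.compile(">([^<>]+)<").matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }

For example, the code above extracts "value" and "kkk" from the text string, along with the single spaces that sit between neighbouring tags.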
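To apply the same idea to a whole file rather than one line at a time, something like the following might work. This is only a sketch: the class name TagTextExtractor and the method extract are made up for illustration, the file path is taken from the command line, and UTF-8 encoding is an assumption. It reads the file into one string, so text that spans several lines is still found, and it drops matches that are only whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TagTextExtractor {
    // One or more characters that are neither '<' nor '>', found between a '>' and a '<'.
    // The class [^<>] also matches line breaks, so multi-line text is captured.
    private static final Pattern TEXT_BETWEEN_TAGS = Pattern.compile(">([^<>]+)<");

    // Returns every non-blank piece of text found between tags, in document order.
    static List<String> extract(String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TEXT_BETWEEN_TAGS.matcher(content);
        while (matcher.find()) {
            String text = matcher.group(1).trim();
            if (!text.isEmpty()) {          // skip the indentation between tags
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the input file (supply your own); UTF-8 is assumed.
        String content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String text : extract(content)) {
            System.out.println(text);
        }
    }
}
```

Keep in mind that this treats the file as plain text: entities such as &amp; are not decoded, and a '<' or '>' inside an attribute value would confuse it.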

Answer 1 (score: 0)

Maybe this is what you are looking for:

String[] String.split(String regex)

You can then take every second element, for example:

String a = "aatotoaabbisinthebbaakitchenaa";
String [] output = a.split("aa");

output will contain ["", "toto", "bbisinthebb", "kitchen"], so the elements that were enclosed by "aa" end up at the odd positions of the array (toto => 1 and kitchen => 3).
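Picking out those odd positions could look roughly like this. The names SplitDemo and oddElements are invented for the example, and the delimiter is quoted so that it is treated as literal text rather than as a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, only meant to illustrate the odd-index idea.
class SplitDemo {
    // Splits on the literal delimiter and keeps the elements at odd indices (1, 3, 5, ...).
    static List<String> oddElements(String input, String delimiter) {
        String[] parts = input.split(Pattern.quote(delimiter));
        List<String> result = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [toto, kitchen]
        System.out.println(oddElements("aatotoaabbisinthebbaakitchenaa", "aa"));
    }
}
```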

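If your goal is to parse XML, though, don't do it this way! There are libraries such as JAXB that do this for you. Eclipse provides all the tools needed to generate classes from the XML's XSD, which will be much easier to work with.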
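JAXB needs generated or annotated classes, so as a smaller illustration of "use a real parser", here is a sketch that uses the DOM parser shipped with the JDK instead of JAXB. It assumes the input is a well-formed XML document with a single root element, which the fragment shown in the question is not as posted. XmlTextDemo and allText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; uses the JDK's DOM parser, not JAXB.
class XmlTextDemo {
    // Returns all text in the document, with runs of whitespace collapsed to one space.
    static String allText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates the text of every descendant of the root element.
            String raw = doc.getDocumentElement().getTextContent();
            return raw.replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            // Malformed input (or a parser setup problem) ends up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<section>\n"
                + "  <title>1 INDICATIONS AND USAGE</title>\n"
                + "  <text><paragraph>LIPITOR is indicated:</paragraph></text>\n"
                + "</section>";
        System.out.println(allText(xml));
    }
}
```

Unlike the string-based approaches, the parser decodes entities and ignores attribute values, but it rejects input that is not well-formed.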

Edit:

In your case you won't search for "aa" but for something matching this pattern:

(?i)(<(/?)title.*?>)(.+?)()

So just replace "title" in the pattern with the tag you are searching for.