How to remove newline characters from a List<String>

Date: 2015-08-20 03:19:15

Tags: java regex xml data-structures stax

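I have a method that returns a Map from an XML file. I convert that map into two separate lists, one of keys and one of values.

However, I noticed that the values list contains newline characters. How can I strip the newlines and either replace them with spaces or leave the values empty?

Code: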

@Test
public void testGetXMLModelData() throws Exception {
    File f = new File("xmlDir/example.xml");
    Model m = getXMLModelData(f);

    logger.debug("Models Keys: "+m.getInputs());
    logger.debug("Models Values: "+m.getValues());
}

public Model getXMLModelData(File f) throws Exception { 

    Model model = new Model();

    Map<String,String> map = p(f);
    List<String> listKeys = new ArrayList<String>(map.keySet());
    List<String> listValues = new ArrayList<String>(map.values());

    model.setInputs(listKeys);
    model.setValues(listValues); 

    return model;
}


public Map<String, String> p(File file) throws Exception {

    Map<String, String> map = new HashMap<String,String>();
    XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream(file));

    while(xr.hasNext()) {

        int e = xr.next();
        if (e == XMLStreamReader.START_ELEMENT) {
            String name = xr.getLocalName();
            xr.next();
            String value = null;
            try {
                value = xr.getText();
            } catch (IllegalStateException exep) {
                exep.printStackTrace();
            }
            map.put(name, value);
        } 
    }
    return map;
}

Output:

2015-08-19 20:13:52,327 : Models Keys: [IRS1095A, MonthlyPlanPremiumAmtPP, WagesSalariesAndTipsAmt, MonthlyAdvancedPTCAmtPP, MonthCdPP, ReturnData, IndividualReturnFilingStatusCd, PrimaryResidentStatesInfoGrpPP, MonthlyPTCInformationGrpPP, IRS1040, ResidentStateInfoPP, SelfSelectPINGrp, MonthlyPremiumSLCSPAmtPP, Filer, ResidentStateAbbreviationCdPP, PrimaryBirthDt, Return, ReturnHeader, TotalExemptionsCnt, AdjustedGrossIncomeAmt, PrimarySSN]
2015-08-19 20:13:52,328 : Models Values: [
      , 136, 22000, 125, SEPTEMBER, 
    , 1, 
        , 
        , 
      , 
          , 
      , 250, 
      , CA, 1970-01-01, 
  , 
    , 1, 22000, 555-11-2222]

Any help is greatly appreciated. Thanks in advance.

Edit:

XML file

<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS1095A uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
      <MonthlyPTCInformationGrpPP uuid="69dc9dd5-5415-4ee4-a199-19b2dbb701be">
        <MonthlyPlanPremiumAmtPP>136</MonthlyPlanPremiumAmtPP>
        <MonthlyAdvancedPTCAmtPP>125</MonthlyAdvancedPTCAmtPP>
        <MonthCdPP>SEPTEMBER</MonthCdPP>
        <MonthlyPremiumSLCSPAmtPP>250</MonthlyPremiumSLCSPAmtPP>
      </MonthlyPTCInformationGrpPP>
    </IRS1095A>
    <IRS1040>
      <IndividualReturnFilingStatusCd>1</IndividualReturnFilingStatusCd>
      <WagesSalariesAndTipsAmt>22000</WagesSalariesAndTipsAmt>
      <TotalExemptionsCnt>1</TotalExemptionsCnt>
      <AdjustedGrossIncomeAmt>22000</AdjustedGrossIncomeAmt>
    </IRS1040>
  </ReturnData>
  <ReturnHeader>
    <SelfSelectPINGrp>
      <PrimaryBirthDt>1970-01-01</PrimaryBirthDt>
    </SelfSelectPINGrp>
    <Filer>
      <PrimarySSN>555-11-2222</PrimarySSN>
      <PrimaryResidentStatesInfoGrpPP>
        <ResidentStateInfoPP uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
          <ResidentStateAbbreviationCdPP>CA</ResidentStateAbbreviationCdPP>
        </ResidentStateInfoPP>
      </PrimaryResidentStatesInfoGrpPP>
    </Filer>
  </ReturnHeader>
</Return>

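2 answers:

Answer 0 (score: 2)

Set value = xr.getText().trim(). This trims leading and trailing whitespace, including the newlines, from the value.

To keep empty values from being added, wrap map.put(name, value) in

if (value != null && !value.isEmpty())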
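
A minimal sketch of this suggestion, under some assumptions that are not in the answer: the class name TrimmedValues is hypothetical, the XML is passed as a String rather than a File so the example is self-contained, a LinkedHashMap is used only to keep the output readable, and parse errors are rethrown unchecked. It keeps the question's one-event look-ahead, so it relies on every start tag being followed by text or indentation, as in the posted file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original question or answer.
class TrimmedValues {

    // Returns element name -> trimmed text, skipping elements whose text is only whitespace.
    static Map<String, String> parse(String xml) {
        // Assumption: LinkedHashMap instead of the question's HashMap, to keep document order.
        Map<String, String> map = new LinkedHashMap<>();
        try {
            XMLStreamReader xr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (xr.hasNext()) {
                    if (xr.next() == XMLStreamConstants.START_ELEMENT) {
                        String name = xr.getLocalName();
                        // Same look-ahead as the question: inspect the event right after the start tag.
                        if (xr.next() == XMLStreamConstants.CHARACTERS) {
                            String value = xr.getText().trim();
                            if (!value.isEmpty()) {
                                map.put(name, value);
                            }
                        }
                    }
                }
            } finally {
                xr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return map;
    }

    public static void main(String[] args) {
        String xml = "<IRS1040>\n  <TotalExemptionsCnt>1</TotalExemptionsCnt>\n</IRS1040>";
        System.out.println(parse(xml));
    }
}
```

With this check in place, container elements such as ReturnData, whose only direct text is indentation, no longer contribute whitespace-only entries to the values list.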

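Answer 1 (score: 1)

Your code extracts the element name and the text immediately following each start element, and ignores any text that follows an end element.

So it collects: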

Return = <newline><space><space>
ReturnData = <newline><space><space><space><space>
IRS1095A = <newline><space><space><space><space><space><space>
MonthlyPTCInformationGrpPP = <newline><space><space><space><space><space><space><space><space>
MonthlyPlanPremiumAmtPP = 136
...

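You then add them to a HashMap, which does not preserve insertion order, so the key/value pairs come out in an arbitrary order. That makes it hard to see what is happening.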
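
If the ordering is what makes the output hard to read, one option (an assumption here, not something the answer proposes) is java.util.LinkedHashMap, which iterates in insertion order. The sketch below uses a hypothetical OrderDemo class and inserts a few of the pairs listed above into both map types so the key order can be compared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class OrderDemo {

    // Inserts a few pairs in document order and returns the key order the map reports.
    static List<String> keyOrder(Map<String, String> map) {
        map.put("Return", "\n  ");
        map.put("ReturnData", "\n    ");
        map.put("IRS1095A", "\n      ");
        map.put("MonthlyPlanPremiumAmtPP", "136");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap makes no ordering guarantee; LinkedHashMap iterates in insertion order.
        System.out.println("HashMap:       " + keyOrder(new HashMap<>()));
        System.out.println("LinkedHashMap: " + keyOrder(new LinkedHashMap<>()));
    }
}
```

The keys and values lists built from a LinkedHashMap line up with the order of the elements in the file, which makes the whitespace-only entries easier to trace back to their container elements.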

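UPDATE

I won't write the code for you, but if you want the "value elements", then you need to:

  1. Remember the start element when you see one
  2. Collect any text, concatenating it with text already collected, e.g. when you see <text><cdata><text>
  3. When you see a start element while a start element is remembered, verify that the text is empty or all whitespace, then discard the text
  4. When you see an end element:
    1. If a start element is remembered, add elementName / text to the result, then forget the start element and discard the text. Note: do not use a map if the same element name can appear more than once.
    2. If no start element is remembered (it was forgotten), verify that the text is empty or all whitespace, then discard the text
  5. This collects only leaf elements and ignores any "layout".

    Code exactly as described above

    OK, I did add the missing resource cleanup.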

    Map<String, String> map = new HashMap<>();
    try (FileInputStream in = new FileInputStream(file)) {
        XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            String elementName = null;
            StringBuilder textBuf = new StringBuilder();
            while (xr.hasNext()) {
                switch (xr.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // 3. When seeing a start element and a start element is remembered
                        if (elementName != null) {
                            // verify text is empty or all whitespace
                            if (! textBuf.toString().trim().isEmpty())
                                throw new IllegalArgumentException("Found text mixed with elements");
                            // then discard text
                            textBuf.setLength(0);
                        }
                        // 1. Remember start element when seen
                        elementName = xr.getLocalName();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        // 2. Collect any text
                        textBuf.append(xr.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: // 4. When seeing an end element
                        if (elementName != null) { // 1. if start element is remembered
                            // add elementName/text to result
                            map.put(elementName, textBuf.toString());
                            // then forget start element
                            elementName = null;
                            // and discard text
                            textBuf.setLength(0);
                        } else { // 2. if start element is not remembered (was forgotten)
                            // verify text is empty or all whitespace
                            if (! textBuf.toString().trim().isEmpty())
                                throw new IllegalArgumentException("Found text mixed with elements");
                            // then discard text
                            textBuf.setLength(0);
                        }
                        break;
                    default:
                        // ignore
                } 
            }
        } finally {
            xr.close();
        }
    }
    return map;
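
The note in step 4.1 warns against a map when an element name can repeat. A minimal sketch of that variant, under stated assumptions: the class name LeafCollector is hypothetical, the XML is passed as a String to keep the example self-contained, values are trimmed, parse errors are rethrown unchecked, and the "text mixed with elements" checks from the code above are dropped, so stray text is silently discarded instead of rejected.

```java
import java.io.StringReader;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class LeafCollector {

    // Returns one (name, trimmed text) entry per leaf element, in document order.
    static List<Map.Entry<String, String>> leaves(String xml) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        try {
            XMLStreamReader xr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                String elementName = null;
                StringBuilder textBuf = new StringBuilder();
                while (xr.hasNext()) {
                    switch (xr.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            // Text seen before a nested start tag is treated as layout and dropped
                            // (simplification: no check that it really is whitespace).
                            textBuf.setLength(0);
                            elementName = xr.getLocalName();
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                        case XMLStreamConstants.SPACE:
                            textBuf.append(xr.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            if (elementName != null) {
                                // The end tag directly closes the remembered start tag: a leaf element.
                                result.add(new SimpleEntry<>(elementName, textBuf.toString().trim()));
                                elementName = null;
                            }
                            textBuf.setLength(0);
                            break;
                        default:
                            break; // ignore comments, processing instructions, etc.
                    }
                }
            } finally {
                xr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Grp>\n  <MonthCdPP>SEPTEMBER</MonthCdPP>\n  <MonthCdPP>OCTOBER</MonthCdPP>\n</Grp>";
        System.out.println(leaves(xml));
    }
}
```

A list of entries keeps every occurrence of a repeated element, whereas map.put would overwrite the earlier value with the later one.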