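Reading data from a text file when the text file is poorly written

Time: 2012-01-25 11:37:53

Tags: java file parsing variables text

I'm writing a program that gets data from the lines of a text file. The problem is that it isn't the best-written text file, and trying to write a parser for it has caused a lot of confusion.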

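Here are two such lines. For both I can get the address, latitude and longitude variables, but for the second one I can't get the price or the size. The error I keep getting is a string index out of range by -41 (severe).
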
|12091805|,|0|,|DETAILS|,||,||,|Latitude:54.593406, Longitude:-5.934344 <b >Unit 8 Great Northern Mall Great Victoria Street Belfast Down<//b><p><p><p>Price : 150,000<p>Size: 2,411 Sq Feet  ()<p>Rent : 50,500 Per Annum<p><p>Text<p><p>|,||,||

|15961081|,|0|,|DETAILS|,||,||,|<p>Latitude:54.593406, Longitude:-5.934344   <b>3-5 Market Street Lurgan BT66</b> </p>  <p> </p>  <p> </p>  <p>   Price : &pound;250,000 </p>  <p>   Size: 0.173 acres (0.07ha) </p>  <p> </p>  <p>   Text </p>  <p> </p>  <p>  Text </p>  <p> </p>  <p>   Text </p>  <p> </p>  <p> </p>|,||,||

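The real lines are much longer, but for now I have replaced the paragraphs with just the word "Text".

And no, I cannot rewrite the text file. Any pointers would be much appreciated.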

if (s.contains("Price"))
{
    int pstart = 0;
    int pend = 0;

    if (s.contains("<p>Size"))
    {

        //if has pound symbol
        if (s.contains("&pound;"))
        {
            String[] str = s.split("&pound;");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }

            pstart = bs.indexOf("Price") + 8;
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }

        String sp = s.substring(pstart, pend);

        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
    else
    {
        if (s.contains("&pound;"))
        {
            String[] str = s.split("&pound;");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }

            pstart = bs.indexOf("Price : ");
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }

        String sp = s.substring(pstart, pend);

        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
}

// if has size property
if (s.contains("Size"))
{
    //if in acres
    if (s.contains("acres"))
    {
        int sstart = s.indexOf("Size:") + 6;
        int send = s.indexOf("acres") - 1;

        String sp = s.substring(sstart, send);
        double d = Double.parseDouble(sp);

        p.setSized(d);

    }

    if (s.contains("()"))
    {
        int sstart = s.indexOf("Size:") + 6;

        int send = s.indexOf("Sq") - 2;

        String sp = s.substring(sstart, send);

        if (sp.contains("-") && sp.contains(","))
        {
            String[] spl = sp.split("-|,");

            StringBuilder str = new StringBuilder();
            str.append(spl[0] + spl[1]);

            StringBuilder str2 = new StringBuilder(0);
            str2.append(spl[2] + spl[3]);

            String s1 = str.toString();
            int i = Integer.parseInt(s1);
            p.setSize(i);

            String s2 = str2.toString();
            i = Integer.parseInt(s2);
            p.setSize2(i);
        }

        if (sp.contains("-"))
        {
            String[] spl = sp.split("-");

            int one = Integer.parseInt(spl[0]);

            p.setSize(one);

            int two = Integer.parseInt(spl[1]);

            p.setSize2(two);

        }
        else if (!(sp.contains("-")))
        {
            if (sp.contains(","))
            {
                String[] spl = sp.split(",");
                StringBuilder build = new StringBuilder();
                for (String st : spl)
                {
                    build.append(st);
                    f = build.toString();
                }
                in = Integer.parseInt(f);
                p.setSize(in);
            }
            else
            {
                p.setSize(Integer.parseInt(sp));
            }

        }

    }

}
v.add(p);
p = new Property();
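
One possible reading of the failure, offered as a sketch rather than a confirmed diagnosis: in the second sample line the check for "<p>Size" fails because of the whitespace after the tag, so the "&pound;" branch runs and takes pend from the first "</p>" in the line, which sits before "Price". The end index is then smaller than the start index and substring throws. The class name and the shortened sample line below are hypothetical.

```java
// Hypothetical reproduction; SubstringRangeDemo and the shortened line are illustrative only.
class SubstringRangeDemo {

    // Mirrors the index arithmetic of the "&pound;" branch in the question's code.
    static int[] priceBounds(String s) {
        String stripped = s.replace("&pound;", "");
        int pstart = stripped.indexOf("Price : ");
        // indexOf returns the FIRST "</p>", which here closes the address paragraph.
        int pend = stripped.indexOf("</p>") - 1;
        return new int[] {pstart, pend};
    }

    public static void main(String[] args) {
        String s = "<p>Latitude:54.593406, Longitude:-5.934344   "
                 + "<b>3-5 Market Street Lurgan BT66</b> </p>  <p> </p>  "
                 + "<p>   Price : &pound;250,000 </p>  <p>   Size: 0.173 acres (0.07ha) </p>";
        int[] bounds = priceBounds(s);
        System.out.println("pstart=" + bounds[0] + ", pend=" + bounds[1]);
        try {
            s.substring(bounds[0], bounds[1]);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text differs between Java versions.
            System.out.println("substring failed: " + e.getMessage());
        }
    }
}
```

Note also that the question's code computes the indices on the StringBuilder with "&pound;" removed but then calls substring on the original string s, so the offsets would be shifted even if the end index were found correctly.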

2 answers:

Answer 0: (score: 1)

I would use regular expressions. The following should point you in the right direction:

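import java.io.BufferedReader;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 is the optional "&pound;" prefix, group 2 the number itself.
Pattern pricePattern = Pattern.compile("Price\\s*:\\s*(&pound;)?([0-9,.]+)");
Pattern sqFeetPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*Sq");
Pattern acresPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*acres\\s*\\(([0-9,.]+)ha\\)");

// Pin the locale so "," is always read as the grouping separator and "." as the decimal point.
NumberFormat nf = NumberFormat.getNumberInstance(Locale.UK);
nf.setGroupingUsed(true);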

BufferedReader r = new BufferedReader(inputFileReader);
String line;
while ((line = r.readLine()) != null) {
    Matcher m = pricePattern.matcher(line);
    if (m.find()) {
        int price = nf.parse(m.group(2)).intValue();
        System.out.println("Price: " + price);
    }
    m = sqFeetPattern.matcher(line);
    if (m.find()) {
        int sqFeet = nf.parse(m.group(1)).intValue();
        System.out.println("Sq Feet: " + sqFeet);
    }
    m = acresPattern.matcher(line);
    if (m.find()) {
        float acres = nf.parse(m.group(1)).floatValue();
        float ha = nf.parse(m.group(2)).floatValue();
        System.out.println("Acres: " + acres + " ha: " + ha);
    }
}

N.B. inputFileReader would be defined as a FileReader or similar to get at your file.
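
For experimenting without a file, the same idea can be packed into a small self-contained class and run against shortened versions of the two sample lines. This is only a sketch: the class name ListingFieldExtractor and the helper extract are hypothetical, the acres pattern is simplified to capture only the acre figure, and Locale.UK is an assumption about how the numbers are formatted.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original answer.
class ListingFieldExtractor {
    static final Pattern PRICE = Pattern.compile("Price\\s*:\\s*(?:&pound;)?([0-9,.]+)");
    static final Pattern SQ_FEET = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*Sq");
    static final Pattern ACRES = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*acres");

    // Returns the first number captured by the pattern, or null when it does not match.
    static Number extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return null;
        }
        try {
            // Assumes UK-style numbers: "," groups thousands, "." marks decimals.
            return NumberFormat.getNumberInstance(Locale.UK).parse(m.group(1));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + m.group(1), e);
        }
    }

    public static void main(String[] args) {
        String first = "<p>Price : 150,000<p>Size: 2,411 Sq Feet  ()<p>Rent : 50,500 Per Annum";
        String second = "<p>   Price : &pound;250,000 </p>  <p>   Size: 0.173 acres (0.07ha) </p>";

        System.out.println("Price: " + extract(PRICE, first));
        System.out.println("Sq Feet: " + extract(SQ_FEET, first));
        System.out.println("Price: " + extract(PRICE, second));
        System.out.println("Acres: " + extract(ACRES, second));
    }
}
```

Because the patterns tolerate any whitespace around the colon and an optional "&pound;", they do not depend on where the surrounding <p> tags fall, which is what trips up the index-based approach.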

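Answer 1: (score: 0)

The approach I would take is:

  1. Read a line of text.
  2. Decode the line of text: it looks like HTML markup, so convert escaped characters (e.g. &pound;) to their equivalent text characters and filter out the HTML tags (<p> etc.).
  3. Use regular expressions.
  4. Perform the data extraction on the cleaned data.
  5. Process the data.
  6. Move on to the next line, or finish.

For step 2, I was thinking of something like the following, so that all HTML tags are removed from the string before it is split on the field delimiter (|):

    Remove HTML tags from a String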
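
A minimal sketch of that cleaning step, under stated assumptions: it is not the method from the linked post, the class name ListingLineCleaner is hypothetical, "&pound;" is the only entity handled because it is the only one in the sample data, and tags are stripped with a crude regex that a real HTML parser would handle more robustly. Since the values themselves contain commas, the sketch splits on the three-character sequence |,| rather than on a bare comma.

```java
// Hypothetical helper class; names and the regex-based tag stripping are illustrative.
class ListingLineCleaner {

    // Decodes the one entity seen in the sample data and drops anything that looks like a tag.
    static String clean(String line) {
        return line
                .replace("&pound;", "\u00a3")   // pound sign; other entities are not handled
                .replaceAll("<[^>]*>", " ")     // crude tag removal, also catches "<b >" and "<//b>"
                .replaceAll("\\s+", " ")        // collapse the whitespace left behind
                .trim();
    }

    // Splits a cleaned line of the form |a|,|b|,|c| into its trimmed fields.
    static String[] fields(String line) {
        String cleaned = clean(line);
        if (cleaned.startsWith("|")) {
            cleaned = cleaned.substring(1);
        }
        if (cleaned.endsWith("|")) {
            cleaned = cleaned.substring(0, cleaned.length() - 1);
        }
        // Limit -1 keeps trailing empty fields such as the final "||".
        String[] parts = cleaned.split("\\|,\\|", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "|15961081|,|0|,|DETAILS|,||,||,|<p>Latitude:54.593406, Longitude:-5.934344   "
                + "<b>3-5 Market Street Lurgan BT66</b> </p>  <p>   Price : &pound;250,000 </p>  "
                + "<p>   Size: 0.173 acres (0.07ha) </p>|,||,||";
        String[] parts = fields(line);
        for (int i = 0; i < parts.length; i++) {
            System.out.println(i + ": [" + parts[i] + "]");
        }
    }
}
```

The regular expressions from step 3 can then be run against the cleaned details field alone, where "Price" and "Size" are separated by plain spaces.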