Random errors in a regular expression

Time: 2013-08-09 06:09:05

Tags: c# html regex

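I use the code below to search a web page for data and load the results into a DataGridView.

When I run it on a page that contains many rows (for example, 100), it sometimes returns a wrong value, like this: CaucaiaCE

when it should be just Caucaia.

Why does this happen in only 2 of the 100 rows?

This is the HTML I am searching: http://pastie.org/8220836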

{
    int i = 0;
    Match matchLogradouro = Regex.Match(pagina, "<td width=\"268\" style=\"padding: 2px\">(.*)</td>");
    Match matchBairroCidade = Regex.Match(pagina, "<td width=\"140\" style=\"padding: 2px\">(.*)</td>");
    Match matchEstado = Regex.Match(pagina, "<td width=\"25\" style=\"padding: 2px\">([A-Z]{2})</td>");
    Match matchCep = Regex.Match(pagina, "<td width=\"65\" style=\"padding: 2px\">(.*)</td>");
    int z = Regex.Matches(pagina, "detalharCep").Count;
    while (z > i -1)
    {    
        dataGridView1.Rows.Add(matchLogradouro.Groups[1].Value);
        matchLogradouro = matchLogradouro.NextMatch();
        dataGridView1.Rows[i].Cells[1].Value = matchBairroCidade.Groups[1].Value;
        matchBairroCidade = matchBairroCidade.NextMatch();
        dataGridView1.Rows[i].Cells[2].Value = matchBairroCidade.Groups[1].Value;
        matchBairroCidade = matchBairroCidade.NextMatch();
        dataGridView1.Rows[i].Cells[3].Value = matchEstado.Groups[1].Value;
        matchEstado = matchEstado.NextMatch();

        dataGridView1.Rows[i].Cells[4].Value = matchCep.Groups[1].Value;
        matchCep = matchCep.NextMatch();
        i++;
    }
}

1 answer:

Answer 0: (score: 7)

Create a class (sorry, I don't know Portuguese, so I can't tell what kinds of data your class should hold):

public class Foo // I believe it should be something like Address
{
    public string Logradouro { get; set; }
    public string BairroCidade1 { get; set; }
    public string BairroCidade2 { get; set; }
    public string Estado { get; set; } // this should be State
    public string Cep { get; set; }
}

Then use HtmlAgilityPack to parse your HTML document:

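// Requires the HtmlAgilityPack NuGet package.
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(html_file_name); // or doc.LoadHtml(html_string)

// Select every <tr> that contains at least one <td>.
// Note: SelectNodes returns null (not an empty list) when nothing matches.
var foos = from row in doc.DocumentNode.SelectNodes("//tr[td]")
           let cells = row.SelectNodes("td").Select(td => td.InnerText).ToArray()
           where cells.Length > 4 // skip rows with fewer than five cells
           select new Foo {
               Logradouro = cells[0],
               BairroCidade1 = cells[1],
               BairroCidade2 = cells[2],
               Estado = cells[3],
               Cep = cells[4]
           };

// One way to show the result in the grid (with auto-generated columns):
// dataGridView1.DataSource = foos.ToList();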
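
As for why only a few rows come out wrong: the answer above does not say, but one plausible cause is the greedy `(.*)` in the question's patterns. By default `.` does not match a newline, so when every cell sits on its own line the capture stops at that line's `</td>`. If two cells happen to share a line, `(.*)` runs on to the last `</td>` on that line and swallows the next cell as well. The sketch below reproduces this on a made-up row; the `Row` string and the `GreedyDemo` class are illustrative assumptions, not taken from the linked page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical markup: two cells that happen to share one line.
    public const string Row =
        "<td width=\"140\" style=\"padding: 2px\">Caucaia</td>" +
        "<td width=\"25\" style=\"padding: 2px\">CE</td>";

    // Pattern from the question: (.*) is greedy and extends to the LAST </td> on the line.
    public const string Greedy = "<td width=\"140\" style=\"padding: 2px\">(.*)</td>";

    // Variant: [^<]* cannot cross a tag, so the capture ends at the first </td>.
    public const string Bounded = "<td width=\"140\" style=\"padding: 2px\">([^<]*)</td>";

    // Returns the first capture group of the first match (empty string if no match).
    public static string Capture(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        // prints: Caucaia</td><td width="25" style="padding: 2px">CE
        Console.WriteLine(Capture(Greedy, Row));
        // prints: Caucaia
        Console.WriteLine(Capture(Bounded, Row));
    }
}
```

If this is indeed the cause, replacing `(.*)` with `([^<]*)` or the lazy `(.*?)` would fix the regex version, although parsing with HtmlAgilityPack as shown above avoids the problem altogether.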