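I use the code below to scrape data from a web page and load it into a DataGridView.
When I run it against a page with many rows (for example, 100), some rows occasionally come back wrong, like this: CaucaiaCE
when it should be just Caucaia.
Why does this happen in only 2 of the 100 rows?
This is the HTML I am searching: http://pastie.org/8220836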
{
    int i = 0;
    Match matchLogradouro = Regex.Match(pagina, "<td width=\"268\" style=\"padding: 2px\">(.*)</td>");
    Match matchBairroCidade = Regex.Match(pagina, "<td width=\"140\" style=\"padding: 2px\">(.*)</td>");
    Match matchEstado = Regex.Match(pagina, "<td width=\"25\" style=\"padding: 2px\">([A-Z]{2})</td>");
    Match matchCep = Regex.Match(pagina, "<td width=\"65\" style=\"padding: 2px\">(.*)</td>");
    int z = Regex.Matches(pagina, "detalharCep").Count;
    while (z > i - 1)
    {
        dataGridView1.Rows.Add(matchLogradouro.Groups[1].Value);
        matchLogradouro = matchLogradouro.NextMatch();
        dataGridView1.Rows[i].Cells[1].Value = matchBairroCidade.Groups[1].Value;
        matchBairroCidade = matchBairroCidade.NextMatch();
        dataGridView1.Rows[i].Cells[2].Value = matchBairroCidade.Groups[1].Value;
        matchBairroCidade = matchBairroCidade.NextMatch();
        dataGridView1.Rows[i].Cells[3].Value = matchEstado.Groups[1].Value;
        matchEstado = matchEstado.NextMatch();
        dataGridView1.Rows[i].Cells[4].Value = matchCep.Groups[1].Value;
        matchCep = matchCep.NextMatch();
        i++;
    }
}
Answer 0 (score: 7)
Create a class (sorry, I don't know Portuguese, so I can't tell what kind of data your class should hold):
public class Foo // I believe it should be something like Address
{
    public string Logradouro { get; set; }
    public string BairroCidade1 { get; set; }
    public string BairroCidade2 { get; set; }
    public string Estado { get; set; } // this should be State
    public string Cep { get; set; }
}
Then use HtmlAgilityPack to parse your HTML document:
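// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(html_file_name); // or doc.LoadHtml(html_string)

// Select every table row that has at least one <td> cell.
// Note: SelectNodes returns null (not an empty list) when nothing matches.
var foos = from row in doc.DocumentNode.SelectNodes("//tr[td]")
           let cells = row.SelectNodes("td").Select(td => td.InnerText).ToArray()
           where cells.Length > 4 // skip rows that do not have all five columns
           select new Foo {
               Logradouro = cells[0],
               BairroCidade1 = cells[1],
               BairroCidade2 = cells[2],
               Estado = cells[3],
               Cep = cells[4]
           };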
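
As for why only a couple of rows go wrong with the regex approach, one plausible cause (I can't confirm it against your page) is the greedy `(.*)`. In .NET, `.` does not cross a newline by default, so the pattern behaves as long as each cell sits on its own line. If two cells happen to share a line, `(.*)` runs on to the last `</td>` on that line and swallows the next cell as well. Below is a minimal sketch of that effect. The sample row and the names `GreedyDemo`, `Row`, and `FirstCell` are made up for illustration and are not taken from your HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical input: two cells on ONE line. This is an assumption about
    // what the failing rows look like, not a copy of the real page.
    public const string Row =
        "<td width=\"140\" style=\"padding: 2px\">Caucaia</td>" +
        "<td width=\"25\" style=\"padding: 2px\">CE</td>";

    // Greedy: (.*) takes as much as it can before the LAST </td> on the line.
    public const string Greedy = "<td width=\"140\" style=\"padding: 2px\">(.*)</td>";

    // Lazy: (.*?) stops at the FIRST </td>.
    public const string Lazy = "<td width=\"140\" style=\"padding: 2px\">(.*?)</td>";

    // Returns capture group 1 of the first match of pattern in input.
    public static string FirstCell(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }
}
```

With this input, `GreedyDemo.FirstCell(GreedyDemo.Row, GreedyDemo.Greedy)` returns `Caucaia</td><td width="25" style="padding: 2px">CE`, which reads as "CaucaiaCE" once the tags are stripped, while the `Lazy` pattern returns just `Caucaia`. Switching to `(.*?)` may be enough as a quick fix, but an HTML parser remains the more robust choice because it does not depend on how the markup is laid out across lines.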