Question

I am trying to parse this HTML document to extract the flight, time, origin, and date, and display the results.

<div id="FlightInfo_FlightInfoUpdatePanel">

<table cellspacing="0" cellpadding="0">
<tbody>
    <tr class="">
    <td class="airline"><img src="/images/airline logos/US.gif" title="US AIRWAYS. " alt="US AIRWAYS. " /></td>
    <td class="flight">US5316</td>
    <td class="codeshare">NZ46</td>
    <td class="origin">Rarotonga</td>
    <td class="date">02 Sep</td>
    <td class="time">10:30</td>
    <td class="est">21:30</td>
    <td class="status">CHECK IN CLOSING</td>
    </tr>

I am using this code, based on the HTML Agility Pack for Windows Phone 7, to find and output the content of <td class="flight">US5316</td>:

void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var html = e.Result;

    var doc = new HtmlDocument();
    doc.LoadHtml(html);


    var node = doc.DocumentNode.Descendants("div")
        .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
        .Element("table")
        .Element("tbody")
        .Elements("tr")
        .Where(tr => tr.GetAttributeValue("td", "").Contains("class"))
        .SelectMany(tr => tr.Descendants("flight"))
        .ToArray();

    this.scrollViewer1.Content = node;  

   //Added below

   listBox1.itemSource = node;
}

I get no results in either the ScrollViewer or the ListBox. Is the LINQ query I am using correct for the HTML I provided?

Answer 1

What are you trying to do with this line?

.Where(tr => tr.GetAttributeValue("td", "").Contains("class"))

GetAttributeValue(name, def) looks up the attribute named name on the node and returns its value if the attribute is found. Otherwise, it returns the default value def.

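So what actually happens here is that the <tr> has no attribute named td, so the call returns the default value (an empty string). An empty string does not contain the substring "class", so every <tr> node is filtered out.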
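If the goal is just the flight cells, the test belongs on the <td> nodes and on their class attribute. Note also that Descendants("flight") in the question looks for elements named flight, not for a class. The following is a minimal sketch, assuming the HtmlAgilityPack LINQ API used in the question; the FlightCells class and GetFlightNumbers method are made-up names for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class FlightCells
{
    // Returns the text of every <td class="flight"> cell in the given HTML.
    public static string[] GetFlightNumbers(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("td")
            // "class" is the attribute name; "" is the default when it is absent.
            .Where(td => td.GetAttributeValue("class", "") == "flight")
            .Select(td => td.InnerText.Trim())
            .ToArray();
    }
}
```

Because it searches all <td> descendants, this version does not depend on the exact div/table/tbody nesting.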

Edit: this returns an array in which each entry is an array of 8 strings, one for the content of each td:

var node = doc.DocumentNode.Descendants("div")
    .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
    .Element("table")
    .Element("tbody")
    .Elements("tr")
    .Select(tr => tr.Elements("td").Select(td => td.InnerText).ToArray())
    .ToArray();

Example:

node[0][1] == "US5316"
node[0][3] == "Rarotonga"
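Indexing cells by position works, but it breaks if a column is added or reordered. As an alternative sketch (not something the original answer proposes), each row can be keyed by the td's class name instead. FlightTable and ParseRows are hypothetical names; the code also returns an empty list rather than throwing when the div is missing, and it assumes each class name appears at most once per row (a repeated class overwrites the earlier value).

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class FlightTable
{
    // Maps each table row to a dictionary of td class name -> cell text.
    public static List<Dictionary<string, string>> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<Dictionary<string, string>>();

        var panel = doc.DocumentNode.Descendants("div")
            .FirstOrDefault(d => d.Id == "FlightInfo_FlightInfoUpdatePanel");
        if (panel == null)
            return rows; // div not found: no rows instead of a NullReferenceException

        foreach (var tr in panel.Descendants("tr"))
        {
            var row = new Dictionary<string, string>();
            foreach (var td in tr.Elements("td"))
            {
                var key = td.GetAttributeValue("class", "");
                if (key != "")
                    row[key] = td.InnerText.Trim(); // last value wins on duplicates
            }
            if (row.Count > 0)
                rows.Add(row);
        }
        return rows;
    }
}
```

With the HTML from the question, the flight number would then be read as rows[0]["flight"] and the origin as rows[0]["origin"].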

Answer 2

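You are trying to set the ScrollViewer's content to a string[] (an array). So I will repeat myself and say that you should spend some time learning basic C# before continuing with this effort.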

What you need to do is use a ListBox instead of a ScrollViewer, and then set ListBox.ItemsSource to the node string array.
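If node is the array of string arrays from the first answer, each ListBox item would be a string[] rather than readable text. One way to handle that, sketched here under that assumption, is to join every row into a single display line first. FlightDisplay and ToDisplayLines are hypothetical names, and the " | " separator is an arbitrary choice.

```csharp
using System.Linq;

// Hypothetical helper for preparing ListBox items.
public static class FlightDisplay
{
    // Turns each row of cell texts into one display line, skipping empty cells.
    public static string[] ToDisplayLines(string[][] rows)
    {
        return rows
            .Select(cells => string.Join(" | ",
                cells.Select(c => c.Trim())
                     .Where(c => c.Length > 0)
                     .ToArray()))
            .ToArray();
    }
}
```

In the download handler this would be used as listBox1.ItemsSource = FlightDisplay.ToDisplayLines(node); where listBox1 is the ListBox from the question.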

HTML parse returns no results
