VB.NET: scraping text from HTML

Time: 2011-08-14 05:48:10

Tags: html vb.net

OK, here is the HTML:

    <li class="comment "
      data-author-viewing="False"
      data-id="FqTXOQTcyYaaGaT51z1St1pYFZW5ycutLsrLpoFIJow"
      data-score="0"
      data-author="pervychika666"> 

      <div class="comment-body"> 

  <div class="content-container"> 
    <div class="content"> 
      <div class="author "> 
        <a href="/user/pervychika666" title="pervychika666">pervychika666</a> 
      </div> 


        <div class="comment-text" dir="ltr"> 
          <p>Look at the female audience, they were all giggling and excited, a lot of women got excited with gay erotica, I admit I&#39;m one of them.</p> 

        </div> 
    </div> 


    <div class="metadata"> 
<span class="comment-actions comment-extra-actions"><a class="comment-action" data-action="flag">Flag</a><span class="comment-action-block"><span class="comment-metadata-separator">&bull;</span><a class="comment-action" data-action="block">Block User</a></span><span class="comment-action-unblock"><span class="comment-metadata-separator">&bull;</span><a class="comment-action" data-action="unblock">Unblock User</a></span><span class="comment-action-remove"><span class="comment-metadata-separator">&bull;</span><a class="comment-action" data-action="remove">Remove</a></span></span> 
      <span class="time"> 
        2 days ago
      </span> 

<span class="comment-actions"><span class="comment-action-vote-up"><a class="comment-action" data-action="vote-up">Like</a><span class="comment-metadata-separator">&bull;</span></span><span class="comment-action-vote-down"><a class="comment-action" data-action="vote-down">Dislike</a><span class="comment-metadata-separator">&bull;</span></span><a class="comment-action-reply comment-action" data-action="reply">Reply</a></span> 
    </div> 
  </div> 

      </div> 
  </li> 




  <li class="comment "
      data-author-viewing="False"
      data-id="FqTXOQTcyYagrOGji01HrGJn0tzIJeY4w1rxok5jrp0"
      data-score="0"
      data-author="mykellluvs"> 

      <div class="comment-body"> 

  <div class="content-container"> 
    <div class="content"> 
      <div class="author "> 
        <a href="/user/mykellluvs" title="mykellluvs">mykellluvs</a> 
      </div> 


        <div class="comment-text" dir="ltr"> 
          <p>I love their faces when they pull away from the kiss ;D</p> 

        </div> 
    </div> 

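Now, where it says data-author="mykellluvs">, I want to grab the name mykellluvs. I need to do this for every occurrence, because the page contains multiple data-author="..." attributes, and then put the names into a text box.

How do I do this?

1 answer:

Answer 0: (score: 0)

Are you generating the HTML yourself, or are you trying to parse HTML that comes from somewhere else? If you generate this HTML yourself, you probably want to pull this information from your data source instead. If you are getting it from elsewhere, XPath is probably your best option. I use HtmlAgilityPack to parse HTML with XPath, like this: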

    // Assumes `document` is an HtmlAgilityPack HtmlDocument that has already been loaded.
    // Targets a specific node
    HtmlNode someNode = document.DocumentNode.SelectSingleNode("//ul[@class='posts']");
    // If there is no node with that class, someNode will be null
    if (someNode != null)
    {
        foreach (var item in someNode.Descendants())
        {
            if (item.Attributes["data-author"] != null)
            {
                // .Value is the attribute's text, e.g. "mykellluvs"
                string myval = item.Attributes["data-author"].Value;
                // You can now check the value in myval to see if it's what you want
            }
        }
    }

Sorry that the code is in C#, but it is easy to convert to VB.NET. You can find more information in the HtmlAgilityPack documentation.
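
For reference, a rough VB.NET version of the same idea might look like the sketch below. It is not the answerer's code. It assumes the HtmlAgilityPack NuGet package is referenced and that the page's HTML is already available as a string. The module name `DataAuthorSketch` and the function name `GetAuthors` are made up for illustration. Instead of walking the descendants of a `ul`, it asks XPath directly for every `li` that has a `data-author` attribute.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack   ' NuGet package, not part of the .NET base class library

Module DataAuthorSketch   ' hypothetical name, for illustration only

    ' Returns the data-author value of every <li> element that carries the attribute.
    Function GetAuthors(html As String) As List(Of String)
        Dim authors As New List(Of String)()

        Dim document As New HtmlDocument()
        document.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection = document.DocumentNode.SelectNodes("//li[@data-author]")
        If nodes Is Nothing Then
            Return authors
        End If

        For Each node As HtmlNode In nodes
            ' The second argument is the fallback used if the attribute is missing.
            authors.Add(node.GetAttributeValue("data-author", String.Empty))
        Next

        Return authors
    End Function

End Module
```

To show the names in a text box, something like `TextBox1.Text = String.Join(Environment.NewLine, GetAuthors(pageHtml).ToArray())` should work, assuming a multiline text box named `TextBox1` and a string `pageHtml` holding the downloaded page (both names are placeholders).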