How do I parse this data with HtmlAgilityPack in VB?

Date: 2015-11-07 01:33:19

Tags: vb.net

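I'm new to parsing. I need to get the CSRF token from a website so I can check whether a username is available. I know the CSRF token is stored within roughly the first 20 lines of the site's HTML source: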

<head>
<title>Website</title>
<link href="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/application-afcd9b96896e2ce19d68b2974eb4eb13.css" media="screen" rel="stylesheet">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="name check, username, domain, check username" name="keywords">
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport">
<meta content="yes" name="mobile-web-app-capable">
<meta content="yes" name="apple-mobile-web-app-capable">
<meta content="black" name="apple-mobile-web-app-status-bar-style">
<meta content="Namechk | Username &amp; Domain Availability Search" property="og:title">
<meta content="https://namechk.com/" property="og:url">
<meta content="website" property="og:type">
<meta content="Use Namechk to search for an available username or domain and secure your brand across the internet." property="og:description">
<meta content="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/logo-full-61eada359058051842c4209ccb16acba.png" property="og:image">
<meta content="en_US" property="og:locale">
<meta content="authenticity_token" name="csrf-param">
<meta content="hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=" name="csrf-token">
<link href="https://use.fonticons.com/kits/4e70153b/4e70153b.css" media="all" rel="stylesheet">
<link href="https://use.fonticons.com/kits/48e45036/48e45036.css" media="all" rel="stylesheet">
<script type="text/javascript" src="https://wd-edge.sharethis.com/button/getAllAppDefault.esi?cb=stLight.allDefault&amp;app=all&amp;publisher=8e46a0ce-9473-4683-b2db-c97461495d29&amp;domain=namechk.com"></script>
<style>
    .adsbygoogle,
    .top-ad {
        display: none !important;
    }
</style>
<link rel="stylesheet" type="text/css" href="//sd.sharethis.com/disc/css/hoverbuttons.6eab8de2ee93b309873157b6d3f977fe.css">
<script type="text/javascript" src="//sd.sharethis.com/disc/js/hoverbuttons.035267d71d894482eb413e5bea488ff5.js"></script>
<link rel="stylesheet" type="text/css" href="https://ws.sharethis.com/button/css/buttons-secure.css">
<script type="text/javascript" src="https://ssl.google-analytics.com/ga.js"></script>

What I need to extract is the CSRF token, which in the snippet above is "hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=". I'd like to use the HtmlAgilityPack library to do this.

1 Answer

Answer (score: 0)

Let's assume the HTML file is saved on your local drive. First, load the HTML file:

Dim doc = New HtmlDocument()
doc.Load("HTMLPage1.htm") ' assume it's in the executable folder
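If you already have the HTML as a string (for example, because you downloaded it yourself instead of saving it to a file), HtmlDocument.LoadHtml parses it directly, and every query below works the same way afterwards. This is a minimal sketch under that assumption; the module name, the LoadFromString helper, and the sample markup (trimmed from the question) are illustrative only.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module LoadFromStringSketch

    ' Hypothetical helper: parses markup that is already in memory.
    Function LoadFromString(html As String) As HtmlDocument
        Dim doc As New HtmlDocument()
        ' LoadHtml parses a string; Load (used above) reads from a file path or stream
        doc.LoadHtml(html)
        Return doc
    End Function

    Sub Main()
        ' Hypothetical sample: two <meta> tags copied from the question's snippet
        Dim html As String =
            "<head>" &
            "<meta content=""authenticity_token"" name=""csrf-param"">" &
            "<meta content=""hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI="" name=""csrf-token"">" &
            "</head>"

        Dim doc = LoadFromString(html)
        ' Descendants("meta") walks the whole tree, like the query in this answer
        Console.WriteLine(doc.DocumentNode.Descendants("meta").Count())
    End Sub

End Module
```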

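Then you can query the document with LINQ. This is not LINQ to XML: HtmlAgilityPack's own Descendants method returns an IEnumerable(Of HtmlNode), so standard LINQ operators such as FirstOrDefault apply. Descendants("meta") returns every node named meta. For each one, check whether it has a name attribute and, if it does, whether its value is csrf-token: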

Dim node = doc. _
    DocumentNode. _
    Descendants("meta"). _
    FirstOrDefault(Function(x)
                       Return _
                           x.Attributes.Contains("name") _
                           AndAlso x.Attributes("name").Value = "csrf-token"
                   End Function)

Then you can read the value of that node's content attribute. I'm using a console application, so I just print it to the screen:

If node IsNot Nothing Then
    Console.WriteLine(node.Attributes("content").Value)
Else
    Console.WriteLine("Not found!")
End If
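As an alternative to the LINQ query, HtmlAgilityPack also supports XPath through SelectSingleNode. The sketch below combines that with GetAttributeValue, which returns a fallback value instead of throwing when the content attribute is missing (node.Attributes("content") returns Nothing in that case, so .Value would raise a NullReferenceException). GetCsrfToken and XPathSketch are hypothetical names, and HTMLPage1.htm is the same assumed file as above.

```vbnet
Imports System
Imports HtmlAgilityPack

Module XPathSketch

    ' Returns the content of <meta name="csrf-token">, or an empty string if it is missing.
    Function GetCsrfToken(doc As HtmlDocument) As String
        ' XPath: the first <meta> element anywhere whose name attribute is "csrf-token"
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode("//meta[@name='csrf-token']")
        If node Is Nothing Then
            Return String.Empty
        End If
        ' GetAttributeValue returns the fallback instead of throwing when "content" is absent
        Return node.GetAttributeValue("content", String.Empty)
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.Load("HTMLPage1.htm") ' same assumed file as in the answer

        Dim token = GetCsrfToken(doc)
        If String.IsNullOrEmpty(token) Then
            Console.WriteLine("Not found!")
        Else
            Console.WriteLine(token)
        End If
    End Sub

End Module
```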

The complete source code:

Imports System.Linq
Imports HtmlAgilityPack

Module Module1

    Sub Main()

        ' load the html
        Dim doc = New HtmlDocument()
        doc.Load("HTMLPage1.htm")

        ' query the html
        Dim node = doc. _
            DocumentNode. _
            Descendants("meta"). _
            FirstOrDefault(Function(x)
                               Return _
                                   x.Attributes.Contains("name") _
                                   AndAlso x.Attributes("name").Value = "csrf-token"
                           End Function)

        ' print result
        If node IsNot Nothing Then
            Console.WriteLine(node.Attributes("content").Value)
        Else
            Console.WriteLine("Not found!")
        End If

        Console.ReadKey(True)

    End Sub

End Module

If the HTML file is online, create an instance of the HtmlWeb class and use it to load the HTML from the server:

Dim web = New HtmlWeb()
doc = web.Load("https://www.somewebsite.com/somefile.html") ' HtmlWeb needs an absolute URL, including http:// or https://
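Putting the online variant together, here is a minimal sketch that downloads the page with HtmlWeb and runs the same kind of LINQ query. It assumes the page still serves a meta tag named csrf-token as it did when the question was asked (2015), which may no longer be true. OnlineSketch and FindCsrfToken are illustrative names, and https://namechk.com/ is taken from the og:url tag in the question.

```vbnet
Imports System
Imports System.Linq
Imports System.Net
Imports HtmlAgilityPack

Module OnlineSketch

    ' Same query as in the answer. GetAttributeValue returns the fallback when an
    ' attribute is missing, so no separate Contains check is needed and a <meta>
    ' tag without a content attribute does not throw.
    Function FindCsrfToken(doc As HtmlDocument) As String
        Dim node = doc.DocumentNode.Descendants("meta").FirstOrDefault(
            Function(x) x.GetAttributeValue("name", "") = "csrf-token")
        If node Is Nothing Then Return String.Empty
        Return node.GetAttributeValue("content", String.Empty)
    End Function

    Sub Main()
        Dim web As New HtmlWeb()
        ' Absolute URL with scheme; a bare "www..." string is not a valid absolute URI
        Dim doc = web.Load("https://namechk.com/")

        ' StatusCode holds the HTTP status of the last request made by this HtmlWeb
        If web.StatusCode <> HttpStatusCode.OK Then
            Console.WriteLine("Request failed: " & web.StatusCode.ToString())
            Return
        End If

        Dim token = FindCsrfToken(doc)
        Console.WriteLine(If(token = "", "Not found!", token))
    End Sub

End Module
```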