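I'm new to parsing, and I need to get a CSRF token from a website in order to check whether a username is available. I know the CSRF token is stored in the site's HTML source, within roughly the first 20 lines.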
<head>
<title>Website</title>
<link href="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/application-afcd9b96896e2ce19d68b2974eb4eb13.css" media="screen" rel="stylesheet">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="name check, username, domain, check username" name="keywords">
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport">
<meta content="yes" name="mobile-web-app-capable">
<meta content="yes" name="apple-mobile-web-app-capable">
<meta content="black" name="apple-mobile-web-app-status-bar-style">
<meta content="Namechk | Username &amp; Domain Availability Search" property="og:title">
<meta content="https://namechk.com/" property="og:url">
<meta content="website" property="og:type">
<meta content="Use Namechk to search for an available username or domain and secure your brand across the internet." property="og:description">
<meta content="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/logo-full-61eada359058051842c4209ccb16acba.png" property="og:image">
<meta content="en_US" property="og:locale">
<meta content="authenticity_token" name="csrf-param">
<meta content="hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=" name="csrf-token">
<link href="https://use.fonticons.com/kits/4e70153b/4e70153b.css" media="all" rel="stylesheet">
<link href="https://use.fonticons.com/kits/48e45036/48e45036.css" media="all" rel="stylesheet">
<script type="text/javascript" src="https://wd-edge.sharethis.com/button/getAllAppDefault.esi?cb=stLight.allDefault&app=all&publisher=8e46a0ce-9473-4683-b2db-c97461495d29&domain=namechk.com"></script>
<style>
.adsbygoogle,
.top-ad {
display: none !important;
}
</style>
<link rel="stylesheet" type="text/css" href="//sd.sharethis.com/disc/css/hoverbuttons.6eab8de2ee93b309873157b6d3f977fe.css">
<script type="text/javascript" src="//sd.sharethis.com/disc/js/hoverbuttons.035267d71d894482eb413e5bea488ff5.js"></script>
<link rel="stylesheet" type="text/css" href="https://ws.sharethis.com/button/css/buttons-secure.css">
<script type="text/javascript" src="https://ssl.google-analytics.com/ga.js"></script>
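What I need to parse out is the CSRF token, which in the snippet above is "hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=". I'd like to use the HTMLAgilityPack library to do this.
Answer 0 (score: 0)
Let's assume the HTML file is stored on your drive. First, load the HTML file.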
Dim doc = New HtmlDocument()
doc.Load("HTMLPage1.htm") ' assume it's in the executable folder
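Then you can query the HTML document with LINQ, in the same style as LINQ to XML. Descendants("meta") returns all nodes named meta. For each one, check whether the node has a name attribute, and if it does, check whether that attribute's value is csrf-token.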
Dim node = doc. _
    DocumentNode. _
    Descendants("meta"). _
    FirstOrDefault(Function(x)
                       Return _
                           x.Attributes.Contains("name") _
                           AndAlso x.Attributes("name").Value = "csrf-token"
                   End Function)
You can then read the value of that node's content attribute. I'm using a console application, so I simply print it to the screen.
If node IsNot Nothing Then
    Console.WriteLine(node.Attributes("content").Value)
Else
    Console.WriteLine("Not found!")
End If
Here is the complete source code.
Imports HtmlAgilityPack

Module Module1
    Sub Main()
        ' load the html
        Dim doc = New HtmlDocument()
        doc.Load("HTMLPage1.htm")

        ' query the html: first <meta> node whose name attribute is "csrf-token"
        Dim node = doc. _
            DocumentNode. _
            Descendants("meta"). _
            FirstOrDefault(Function(x)
                               Return _
                                   x.Attributes.Contains("name") _
                                   AndAlso x.Attributes("name").Value = "csrf-token"
                           End Function)

        ' print result
        If node IsNot Nothing Then
            Console.WriteLine(node.Attributes("content").Value)
        Else
            Console.WriteLine("Not found!")
        End If

        ' keep the console window open until a key is pressed
        Console.ReadKey(True)
    End Sub
End Module
If the HTML file is online, instantiate the HtmlWeb class instead and use it to load the HTML from the server.
Dim web = New HtmlWeb()
' HtmlWeb.Load expects an absolute URL, so include the scheme
Dim doc = web.Load("http://www.somewebsite.com/somefile.html")
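Putting the two pieces together, a minimal sketch for the online case might look like the following. It assumes the page is served at https://namechk.com/ (the og:url value in the question's markup), and it swaps the Attributes.Contains check for GetAttributeValue with an empty-string default, which is only one of several ways to write the test. The module name CsrfTokenDemo is a placeholder.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module CsrfTokenDemo
    Sub Main()
        ' Assumption: the address comes from the og:url meta tag in the question.
        Dim web = New HtmlWeb()
        Dim doc = web.Load("https://namechk.com/")

        ' GetAttributeValue returns the supplied default ("") when the attribute
        ' is missing, so no separate Contains check is needed.
        Dim node = doc.DocumentNode.
            Descendants("meta").
            FirstOrDefault(Function(x) x.GetAttributeValue("name", "") = "csrf-token")

        If node IsNot Nothing Then
            ' Print the token stored in the content attribute.
            Console.WriteLine(node.GetAttributeValue("content", ""))
        Else
            Console.WriteLine("Not found!")
        End If
    End Sub
End Module
```

The token printed will differ from the one in the question, since the server issues a new value with each page it serves.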