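I want to crawl the pages behind a set of URLs and collect them in a dictionary. I created a class that holds the dictionary, but I can't seem to add elements to it.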
    type crawler =
        new()= {}
        member this.urls = new Dictionary<string,string>()
        member this.start (url : string)=
            let hw = new HtmlWeb()
            let doc = hw.Load(url)
            let docNode = doc.DocumentNode
            let links = docNode.SelectNodes(".//a")
            for aLink in links do
                let href = aLink.GetAttributeValue("href"," ")
                if href.StartsWith("http://") && href.EndsWith(".html") then
                    this.urls.Add(href, href)
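Why is the urls dictionary empty?

Answer 0 (score: 5)

Because urls here is a property that returns a new dictionary every time it is accessed. Each this.urls.Add call therefore writes to a fresh dictionary that is immediately discarded. Bind the dictionary once with let and expose that instance instead: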
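    open System.Collections.Generic
    open HtmlAgilityPack

    type Crawler() =
        // Created once per instance, so every access sees the same dictionary.
        let urls = new Dictionary<string,string>()
        member this.Urls = urls
        member this.Start (url : string)=
            let hw = new HtmlWeb()
            let doc = hw.Load(url)
            let docNode = doc.DocumentNode
            let links = docNode.SelectNodes(".//a")
            // SelectNodes returns null when nothing matches.
            if not (isNull links) then
                for aLink in links do
                    let href = aLink.GetAttributeValue("href"," ")
                    if href.StartsWith("http://") && href.EndsWith(".html") then
                        // Indexer assignment tolerates a link that appears more than once;
                        // Add would throw on a duplicate key.
                        urls.[href] <- href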
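
To see the difference in isolation, here is a minimal sketch without HtmlAgilityPack. The type names Broken and Good and the example.com URL are made up for illustration; only the property-versus-let distinction comes from the answer above.

```fsharp
open System.Collections.Generic

// Hypothetical type: the property body is re-evaluated on every access,
// so each read of Urls produces a brand-new, empty dictionary.
type Broken() =
    member this.Urls = Dictionary<string, string>()

// Hypothetical type: the let binding runs once per instance,
// so Urls always hands back the same dictionary.
type Good() =
    let urls = Dictionary<string, string>()
    member this.Urls = urls

let broken = Broken()
broken.Urls.Add("http://example.com/a.html", "http://example.com/a.html")
printfn "%d" broken.Urls.Count   // prints 0: the entry went into a discarded dictionary

let good = Good()
good.Urls.Add("http://example.com/a.html", "http://example.com/a.html")
printfn "%d" good.Urls.Count     // prints 1
```

An auto-property, member val Urls = Dictionary<string,string>() with get, should behave like Good here, since its initializer is evaluated once when the object is constructed.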

Answer 1 (score: 3)

This isn't what you asked, but if you're interested in a more functional approach, here is one way to do it:
    type Crawler =
        { Urls : Set<string> }

    [<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
    module Crawler =

        [<CompiledName("Start")>]
        let start crawler (url:string) =
            let { Urls = oldUrls } = crawler
            let newUrls =
                HtmlWeb().Load(url).DocumentNode.SelectNodes(".//a")
                |> Seq.cast<HtmlNode>
                |> Seq.choose (fun link ->
                    match link.GetAttributeValue("href"," ") with
                    | href when href.StartsWith("http://") && href.EndsWith(".html") -> Some href
                    | _ -> None)
                |> Set.ofSeq
                |> Set.union oldUrls
            { crawler with Urls = newUrls }
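
Your data and behavior are now separate. Crawler is an immutable record type. The start function takes a Crawler and returns a new one containing the updated set of URLs. I replaced the Dictionary with a Set, since the keys and values were identical; eliminated the unused let bindings; and snuck in some pattern matching. This should also present a relatively friendly interface to C#.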
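
As a rough sketch of how such a function might be used, the snippet below folds start over a list of pages. To keep it runnable without HtmlAgilityPack or a network connection, the page fetch is replaced by a hypothetical fetchLinks stub that returns canned hrefs; that stub, the example.com URLs, and the name initial are assumptions for illustration and are not part of the answer.

```fsharp
type Crawler = { Urls : Set<string> }

// Hypothetical stand-in for loading a page and reading its <a href> values.
let fetchLinks (url : string) : string list =
    match url with
    | "http://example.com/index.html" ->
        [ "http://example.com/a.html"; "mailto:me@example.com"; "http://example.com/b.html" ]
    | "http://example.com/a.html" ->
        [ "http://example.com/b.html"; "http://example.com/c.html"; "https://example.com/d.html" ]
    | _ -> []

// Same shape as the answer's start: take a Crawler, return a new one.
let start (crawler : Crawler) (url : string) =
    let newUrls =
        fetchLinks url
        |> List.filter (fun href -> href.StartsWith("http://") && href.EndsWith(".html"))
        |> Set.ofList
        |> Set.union crawler.Urls
    { crawler with Urls = newUrls }

let initial = { Urls = Set.empty }

// Thread the immutable Crawler through each page in turn.
let result =
    [ "http://example.com/index.html"; "http://example.com/a.html" ]
    |> List.fold start initial

printfn "%d" result.Urls.Count    // prints 3: a.html, b.html, c.html
printfn "%d" initial.Urls.Count   // prints 0: the original record is unchanged
```

Because start has the shape state -> item -> state, it can be passed to List.fold directly, and the duplicate b.html link is absorbed by the Set without any extra bookkeeping.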