如果将代码编译为控制台程序或以 fsi --use:Program.fs --exec --quiet 运行,则某些线程将在完成之前终止。有没有办法等待所有线程结束?
此问题可以描述为“当存在多个MailboxProcessers时程序退出问题”。
输出示例
(注意最后一行被截断,最后一个输出函数(printfn "[Main] after crawl"
)永远不会被执行。)
[Main] before crawl [Crawl] before return result http://news.google.com crawled by agent 1. [supervisor] reached limit Agent 5 is done. http://www.gstatic.com/news/img/favicon.ico crawled by agent 1. [supervisor] reached limit Agent 1 is done. http://www.google.com/imghp?hl=en&tab=ni crawled by agent 4. [supervisor] reached limit Agent 4 is done. http://www.google.com/webhp?hl=en&tab=nw crawled by agent 2. [supervisor] reached limit Agent 2 is done. http://news.google.
码
编辑:添加了多个System.Threading.Thread.CurrentThread.IsBackground <- false
。
open System
open System.Collections.Concurrent
open System.Collections.Generic
open System.IO
open System.Net
open System.Text.RegularExpressions
module Helpers =
type Message =
| Done
| Mailbox of MailboxProcessor<Message>
| Stop
| Url of string option
| Start of AsyncReplyChannel<unit>
// Gates the number of crawling agents.
[<Literal>]
let Gate = 5
// Extracts links from HTML.
let extractLinks html =
let pattern1 = "(?i)href\\s*=\\s*(\"|\')/?((?!#.*|/\B|" +
"mailto:|location\.|javascript:)[^\"\']+)(\"|\')"
let pattern2 = "(?i)^https?"
let links =
[
for x in Regex(pattern1).Matches(html) do
yield x.Groups.[2].Value
]
|> List.filter (fun x -> Regex(pattern2).IsMatch(x))
links
// Fetches a Web page.
let fetch (url : string) =
try
let req = WebRequest.Create(url) :?> HttpWebRequest
req.UserAgent <- "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
req.Timeout <- 5000
use resp = req.GetResponse()
let content = resp.ContentType
let isHtml = Regex("html").IsMatch(content)
match isHtml with
| true -> use stream = resp.GetResponseStream()
use reader = new StreamReader(stream)
let html = reader.ReadToEnd()
Some html
| false -> None
with
| _ -> None
let collectLinks url =
let html = fetch url
match html with
| Some x -> extractLinks x
| None -> []
open Helpers
// Creates a mailbox that synchronizes printing to the console (so
// that two calls to 'printfn' do not interleave when printing)
let printer =
MailboxProcessor.Start(fun x -> async {
while true do
let! str = x.Receive()
System.Threading.Thread.CurrentThread.IsBackground <- false
printfn "%s" str })
// Hides standard 'printfn' function (formats the string using
// 'kprintf' and then posts the result to the printer agent.
let printfn fmt =
Printf.kprintf printer.Post fmt
let crawl url limit =
// Concurrent queue for saving collected urls.
let q = ConcurrentQueue<string>()
// Holds crawled URLs.
let set = HashSet<string>()
let supervisor =
MailboxProcessor.Start(fun x -> async {
System.Threading.Thread.CurrentThread.IsBackground <- false
// The agent expects to receive 'Start' message first - the message
// carries a reply channel that is used to notify the caller
// when the agent completes crawling.
let! start = x.Receive()
let repl =
match start with
| Start repl -> repl
| _ -> failwith "Expected Start message!"
let rec loop run =
async {
let! msg = x.Receive()
match msg with
| Mailbox(mailbox) ->
let count = set.Count
if count < limit - 1 && run then
let url = q.TryDequeue()
match url with
| true, str -> if not (set.Contains str) then
let set'= set.Add str
mailbox.Post <| Url(Some str)
return! loop run
else
mailbox.Post <| Url None
return! loop run
| _ -> mailbox.Post <| Url None
return! loop run
else
printfn "[supervisor] reached limit"
// Wait for finishing
mailbox.Post Stop
return! loop run
| Stop -> printfn "[Supervisor] stop"; return! loop false
| Start _ -> failwith "Unexpected start message!"
| Url _ -> failwith "Unexpected URL message!"
| Done -> printfn "[Supervisor] Supervisor is done."
(x :> IDisposable).Dispose()
// Notify the caller that the agent has completed
repl.Reply(())
}
do! loop true })
let urlCollector =
MailboxProcessor.Start(fun y ->
let rec loop count =
async {
System.Threading.Thread.CurrentThread.IsBackground <- false
let! msg = y.TryReceive(6000)
match msg with
| Some message ->
match message with
| Url u ->
match u with
| Some url -> q.Enqueue url
return! loop count
| None -> return! loop count
| _ ->
match count with
| Gate -> (y :> IDisposable).Dispose()
printfn "[urlCollector] URL collector is done."
supervisor.Post Done
| _ -> return! loop (count + 1)
| None -> supervisor.Post Stop
return! loop count
}
loop 1)
/// Initializes a crawling agent.
let crawler id =
MailboxProcessor.Start(fun inbox ->
let rec loop() =
async {
System.Threading.Thread.CurrentThread.IsBackground <- false
let! msg = inbox.Receive()
match msg with
| Url x ->
match x with
| Some url ->
let links = collectLinks url
printfn "%s crawled by agent %d." url id
for link in links do
urlCollector.Post <| Url (Some link)
supervisor.Post(Mailbox(inbox))
return! loop()
| None -> supervisor.Post(Mailbox(inbox))
return! loop()
| _ -> printfn "Agent %d is done." id
urlCollector.Post Done
(inbox :> IDisposable).Dispose()
}
loop())
// Send 'Start' message to the main agent. The result
// is asynchronous workflow that will complete when the
// agent crawling completes
let result = supervisor.PostAndAsyncReply(Start)
// Spawn the crawlers.
let crawlers =
[
for i in 1 .. Gate do
yield crawler i
]
// Post the first messages.
crawlers.Head.Post <| Url (Some url)
crawlers.Tail |> List.iter (fun ag -> ag.Post <| Url None)
printfn "[Crawl] before return result"
result
// Example:
printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"
答案 0 :(得分:3)
如果我正确识别代码,则它基于your previous question(和我的回答)。
程序等待主管代理完成(通过发送Start
消息,然后使用RunSynchronously
等待回复)。这应该保证主代理以及所有爬虫在应用程序退出之前完成。
问题是它不会等到printer
代理完成!因此,对(重新定义的)printfn
函数的最后一次调用会向代理发送一条消息,然后应用程序完成,而不会等到打印代理完成。
据我所知,在代理完成处理当前队列中的所有消息之前,没有“标准模式”等待。您可以尝试的一些想法是:
您可以检查CurrentQueueLength
属性(等到它为0),但这并不意味着代理已完成处理所有消息。
您可以通过添加新类型的消息并等待代理回复该消息来使代理更复杂(就像您当前正在等待对Start
消息的回复一样)。
答案 1 :(得分:0)
警告我知道零F#,但通常你会使用Thread.Join等待所有感兴趣的线程。在我看来,在你的情况下,你需要等待通过致电.Start
而开始的任何兴趣。
您还可以考虑使用任务并行库,它可以为原始托管线程提供更高级别(更简单)的抽象。等待任务完成的示例here。
答案 2 :(得分:0)
.NET线程具有属性Thread.IsBackground,当此设置为true时,线程不会阻止进程退出。设置为false时,将阻止进程退出。请参阅:http://msdn.microsoft.com/en-us/library/system.threading.thread.isbackground.aspx
运行代理的线程来自线程池,因此默认情况下将Thread.IsBackground设置为false。
每次阅读邮件时,您都可以尝试将线程的IsBackground设置为false。您可以添加一个函数来为您执行此操作以使方法更清晰。每次使用let时,它可能不是解决问题的最佳方案!您可以更改线程,因此需要仔细实施才能正常工作。我只是想提一下回答具体问题
等待所有线程结束的任何方式?
并帮助人们理解为什么某些线程会阻止程序退出,而其他线程则没有。
答案 3 :(得分:0)
我想我已经解决了这个问题:在打印机代理中的System.Threading.Thread.CurrentThread.IsBackground <- false
之后添加let!
。
但是,我尝试在所有System.Threading.Thread.CurrentThread.IsBackground <- false
之后添加let!
来修改原始代码(Tomas'AsyncChannel修复之前的第一个版本),但仍然无效。不知道。
感谢大家的帮助。我终于可以开始我的第一个F#应用程序进行批处理。我认为MailboxProcessor默认情况下应该将IsBackground设置为false。无论如何要求微软改变它。
[更新] 刚刚发现编译后的程序集运行良好。但是fsi --user:Program --exec --quiet
仍然是一样的。这似乎是fsi的错误?