我正在尝试使用Akka HTTP和Akka Streams来运行一个刮刀。我从一堆索引页面开始,解析链接,然后获取每个链接并解析该页面,返回一堆单独的链接。所以,像这样:
fetch-top-level-page -> list-of-links-to-child-pages -> fetch-child-page -> list-of-links-in-child-page
我的问题是我甚至无法获取单个页面。我尝试获取的每个顶级网址都会导致死信,并且没有任何内容可以使其进一步深入管道。
在这个示例代码中,我要做的就是将HttpRequest
发送到池中以转换为HttpResponse
,并通过将内容打印到屏幕来证明它有效。
implicit val system = ActorSystem("scraper")
implicit val ec = system.dispatcher
implicit val settings = system.settings
implicit val materializer = ActorMaterializer()
val requests = List(HttpRequest(...), HttpRequest(...))
val poolClientFlow = Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10))
Source(requests)
.map (req => { println("-", req); req}) // this part runs fine
.via(poolClientFlow)
.map(resp => {println("|", resp); resp}) // this never runs
.toMat(Sink.foreach { p => println(p) })(Keep.both)
.run()
这是我得到的:
(-,(HttpRequest(...),Future(<not completed>)))
(-,(HttpRequest(...),Future(<not completed>)))
[INFO] [03/07/2018 15:10:03.681] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [03/07/2018 15:10:03.683] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
我是Akka的新手并且显然犯了一些基本错误,因为这似乎是Akka,Akka Streams和Akka HTTP的确切用例。
有什么想法吗?
答案 0 :(得分:1)
无法使用Akka HTTP 10.1.0-RC2和Akka Streams 2.5.11进行复制。以下作品:
orders[] // is already filled with data
let materials = [] // create empty array
for (let scheme in schemes) { // loop constant object
materials.push(schemes[scheme].typeID) // take id and add it to materials array
}
materials = Array.from(new Set(materials)) // filter out duplicates
console.log(materials) // everything looks fine here, 83 unique ids
for (let order in orders) { // now loop the main array
if (!materials.includes(orders[order].type_id)) { // if order's containment id is not found in materials array
orders.splice(order,1) // remove it from array
order -= 1 // and one step back to compensate removed index (do I need it or is it processed normally even after .splice?)
}
}
// and send filtered orders into browser when certain url is requested
可能更好的方法就是这样(注意对val requests = List((HttpRequest(uri = "http://akka.io"), Promise[HttpResponse]()),
(HttpRequest(uri = "http://www.yahoo.com"), Promise[HttpResponse]()))
val poolClientFlow =
Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10)
Source(requests)
.map { req => println("-", req); req }
.via(poolClientFlow)
.map { resp => println("|", resp); resp }
.toMat(Sink.foreach(println))(Keep.both)
.run()
// The following is printed:
// (-,(HttpRequest(HttpMethod(GET),http://akka.io,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (-,(HttpRequest(HttpMethod(GET),http://www.yahoo.com,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future())
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future())
// [WARN] [03/08/2018 14:46:31.003] [scraper-akka.actor.default-dispatcher-4] [scraper/Pool(shared->http://akka.io:80)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call `discardBytes()` on it. GET / Empty -> 301 Moved Permanently Chunked
的调用):
discardEntityBytes()