Question

I am trying to use Erlang's httpc module for highly concurrent requests.

My code, which issues many requests from spawned processes, does not work:

-module(t).
-compile(export_all).

start() ->
  ssl:start(),
  inets:start( httpc, [{profile, default}] ),
  httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("c:\urls.txt", read),
  read_each_line(Device).

read_each_line(Device) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1)),
      read_each_line(Device)
  end.

go(Url)->
  spawn(t,geturl, [Url] ).

geturl(Url)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format(UrlHTTP),io:format("~n"),

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  receive
    {http, {RequestId, {_HttpOk, _ResponseHeaders, Body}}} -> io:format("ok"),ok
  end.

The httpc request never delivers the HTML Body when I use spawn in:

go(Url)->
      spawn(t,geturl, [Url] ).

http://erlang.org/doc/man/httpc.html

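Note

If possible, the client keeps its connections alive and uses persistent connections, with or without pipelining, depending on configuration and current circumstances. The HTTP/1.1 specification does not provide a guideline for how many requests are ideal to send on a persistent connection; this depends heavily on the application. Note that a very long queue of requests can cause a user-perceived delay, because earlier requests can take a long time to complete. The HTTP/1.1 specification does suggest a limit of 2 persistent connections per server, which is the default value of the max_sessions option.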

urls.txt contains different URLs, for example:

google.com
amazon.com
alibaba.com
...

What is wrong?

Answer 1

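For me, if I start inets without arguments, inets:start(), the httpc service is started by default. Since I don't have an example of urls.txt, I did not run the whole code, but in the shell I do get answers to HTTP requests.

When I try to start inets with inets:start( httpc, [{profile, default}] ), I get the return value {error,inets_not_started}.

You should check the return values of the application starts to track down potential problems: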

...
ok = ssl:start(),
ok = inets:start(),
...

Or, in case the applications are already started, use a function like this:

...
ok = ensure_start(ssl),
ok = ensure_start(inets),
...
ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error,{already_started,M}} -> ok;
        Other -> Other
    end.
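
As an alternative to a hand-written helper, OTP's application:ensure_all_started/1 also tolerates an application that is already running, and it starts the application's dependencies first. A minimal sketch; the module name start_demo and the function start_deps/0 are made up for illustration:

```erlang
-module(start_demo).
-export([start_deps/0]).

%% Start ssl and inets together with whatever they depend on.
%% application:ensure_all_started/1 returns {ok, StartedApps}, and it
%% also succeeds when the application is already running, so calling
%% start_deps/0 twice is harmless.
start_deps() ->
    {ok, _} = application:ensure_all_started(ssl),
    {ok, _} = application:ensure_all_started(inets),
    ok.
```

The pattern matches on {ok, _} keep the "check the return value" idea: a failed start crashes right there with a badmatch instead of surfacing later as a missing reply.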

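[Edit 2 - small code enhancement]

I tested this code and it works on my PC. Note that you use '\' in the string used for file access; that starts an escape sequence, which makes this line fail.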

-module(t).
-compile(export_all).

start() -> start(2000).

% To is the timeout (in milliseconds) passed to geturl/2.
% Vary it to see the effect of the request queue and how widely
% site response times differ. The default is 2 seconds.
start(To) ->
  ok = ensure_start(ssl),
  ok = ensure_start(inets),
  ok = httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("D:/urls.txt", read),
  read_each_line(Device,To).

read_each_line(Device,To) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1),To),
      read_each_line(Device,To)
  end.

go(Url,To)->
  spawn(t,geturl, [Url,To] ).

geturl(Url,To)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format("~s~n", [UrlHTTP]),  % never use the URL itself as a format string

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  M = receive
        {http, {RequestId, {_HttpOk, _ResponseHeaders, _Body}}} -> ok
      after To ->
        not_ok
      end,
  io:format("httprequest to ~p: ~p~n",[UrlHTTP,M]).

ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error,{already_started,M}} -> ok;
        Other -> Other
    end.

And in the console:

1> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
2> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
3>

Note that, thanks to ensure_start, you can start the application twice.

I also tested a bad URL, and it was detected.
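
With the code above, a bad URL can only show up as not_ok once the timeout expires. According to the httpc documentation, a failed asynchronous request is reported as {http, {RequestId, {error, Reason}}}, so the receive can tell the cases apart. A minimal sketch, assuming that message shape; the module reply_demo and the function await/2 are hypothetical names:

```erlang
-module(reply_demo).
-export([await/2]).

%% Wait for the asynchronous reply that belongs to RequestId.
%% Returns {ok, StatusCode}, {error, Reason} or {error, timeout}.
await(RequestId, Timeout) ->
    receive
        %% The request failed (for example the host could not be resolved).
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        %% A complete response: status line, headers and body.
        {http, {RequestId, {{_Version, Status, _Phrase}, _Headers, _Body}}} ->
            {ok, Status}
    after Timeout ->
        %% Nothing arrived for this request within Timeout milliseconds.
        {error, timeout}
    end.
```

In geturl/2 this would be called as await(RequestId, To) right after httpc:request/4 returns {ok, RequestId}.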

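My test contained only 3 URLs. I expect that with many URLs the time to get a response will increase, because the loop that spawns processes runs faster than the requests themselves complete. You should therefore expect some timeout problems at some point. The HTTP client may also have limits of its own; I did not check the documentation on this particular point.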
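
One way to keep the spawning loop from outrunning the requests is to cap the number of workers alive at any moment. The following is only a sketch of that idea, not something from the tested code above; the module bounded and the function pmap/3 are hypothetical names:

```erlang
-module(bounded).
-export([pmap/3]).

%% Run Fun on every item, with at most Max worker processes alive at once.
%% Returns a list of {Item, Result} tuples in completion order.
pmap(Fun, Items, Max) when is_function(Fun, 1), is_integer(Max), Max > 0 ->
    loop(Fun, Items, Max, 0, make_ref(), []).

%% Nothing left to start and nothing still running: done.
loop(_Fun, [], _Max, 0, _Tag, Acc) ->
    lists:reverse(Acc);
%% There is room for another worker: start one for the next item.
loop(Fun, [Item | Rest], Max, Running, Tag, Acc) when Running < Max ->
    Parent = self(),
    spawn(fun() ->
                  %% Turn a crash into a result so the parent never waits forever.
                  Result = try Fun(Item)
                           catch Class:Reason -> {error, {Class, Reason}}
                           end,
                  Parent ! {Tag, Item, Result}
          end),
    loop(Fun, Rest, Max, Running + 1, Tag, Acc);
%% Either the cap is reached or all items are started: wait for one worker.
loop(Fun, Items, Max, Running, Tag, Acc) ->
    receive
        {Tag, Item, Result} ->
            loop(Fun, Items, Max, Running - 1, Tag, [{Item, Result} | Acc])
    end.
```

Fun could be a synchronous request such as fun(Url) -> httpc:request(get, {Url, []}, [{timeout, 5000}], []) end, so that each worker blocks until its own request finishes or times out. The 5000 ms timeout and the cap passed as Max are placeholder values to tune.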

httpc + spawn: many requests do not work
