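As some of you may have heard, there are some charity drives going on at the moment, notably in r/atheism. To help and encourage the fundraising, I started writing a small web utility that provides live information about these donations (basically combining data from Reddit with data from FirstGiving). You can see what I have so far here. It only shows the total and average figures for each subreddit, which is very preliminary (and not very pretty).
The feature I would like to add is one that FirstGiving does not seem to offer: the ability to search for, or link to, a specific donation. Over the last week there have been many posts from people offering donation matching and the like, but also many fake/troll posts, and there is no good way to verify that someone actually "delivered" (we all know screenshots are easy to fake). I plan to cache the data from FirstGiving to allow someone to link to a specific donation.
Having inspected the FirstGiving page, there appears to be an undocumented JSON API call (used to show more donations when you scroll to the bottom of the page) that returns a list of donation amounts, messages and nicknames as an HTML table. According to Opera Dragonfly, this is what it looks like when made from my browser (Opera):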
URL: http://www.firstgiving.com/ProfileWebApi/Donations
Method: POST
Status: 200 OK
Duration: 1220 ms
Request details
POST /ProfileWebApi/Donations HTTP/1.1
User-Agent: Opera/9.80 (Windows NT 6.1; U; Edition United Kingdom Local; en) Presto/2.10.229 Version/11.60
Host: www.firstgiving.com
Accept-Language: en-GB,en;q=0.9
Accept-Encoding: gzip, deflate
Referer: http://www.firstgiving.com/fundraiser/r-atheism/ratheism
Cookie: ASP.NET_SessionId=rmsl4b45jdxwykanpoqkb255
Connection: Keep-Alive
Content-Length: 111
Content-Type: application/json;
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
Content-Transfer-Encoding: binary
Request body
{"EventGivingGroupId":1476950,"TotalRaised":"190776.020000","PageIsExpired":false,"PageNumber":4,"PageSize":50}
Response details
HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 62979
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNetMvc-Version: 2.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Tue, 13 Dec 2011 19:13:28 GMT
Body
{"Data":"\u0009\u000d\u000a\u0009\u0009\u0009\u0009\u000d\u000a <table class=\"donationTable collapsed\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\" style='height:0px; overflow:hidden;' >\u000d\u000a <thead class=\"visuallyhidden\">\u000d\u000a\u0009\u0009 <tr>\u000d\u000a <th scope=\"col\">Comment<\/th>\u000d\u000a <th scope=\"col\" class=\"amount\">Donation<\/th>\u000d\u000a <\/tr>\u000d\u000a <\/thead>\u000d\u000a\u0009\u0009\u0009 \u000d\u000a <tr> \u000d\u000a <td class=\"comment\">\u000d\u000a \u000d\u000a <strong>Dear Regan Layman<\/strong>\u000d\u000a Happy holidays :)<br \/>\u000d\u000a \u000d\u000a <time datetime=\"2011-12-10T21:55:35.0000000\">\u000d\u000a 12\/10\/2011\u000d\u000a <\/time>\u000d\u000a \u000d\u000a <\/td>\u000d\u000a \u000d\u000a <td class=\"amount\">\u000d\u000a $20.00<sup style=\"font-size:10px;\" title=\"Offline donation\"><\/sup> \u000d\u000a \u000d\u000a <\/td>\u000d\u000a <\/tr>\u000d\u000a\u0009 \u000d\u000a <tr> \u000d\u000a <td class=\"comment\">\u000d\u000a \u000d\u000a <strong>Frodo Baggins<\/strong>\u000d\u000a Due to the fact that doctors heal people, not God!<br \/>\u000d\u000a \u000d\u000a <time datetime=\"2011-12-10T21:52:11.0000000\">\u000d\u000a 12\/10\/2011\u000d\u000a <\/time>\u000d\u000a \u000d\u000a <\/td>\u000d\u000a \u000d\u000a <td class=\"amount\">\u000d\u000a $4.64<sup style=\"font-size:10px;\" title=\"Offline donation\"><\/sup> \u000d\u000a \u000d\u000a <\/td>\u000d\u000a <\/tr>\u000d\u000a\u0009 \u000d\u000a
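(I snipped the rest of the response body. Also, there are normally more cookies, but I manually removed everything except the ASP.NET session ID and the request still worked, so they appear to be irrelevant to anything other than analytics and the like.)
However, when I try to do the same thing from a Perl script, I do not get this useful output. Here is my script: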
#!/usr/bin/perl -w
use LWP::Simple;
use JSON;
use HTTP::Cookies;
use LWP::UserAgent;
use Data::Dumper;
my $cookie_jar = HTTP::Cookies->new;
my $ua = LWP::UserAgent->new(cookie_jar => $cookie_jar);
#push @{ $ua->requests_redirectable }, 'POST';
$ua->get('http://www.firstgiving.com/fundraiser/r-atheism/ratheism');
print Dumper $cookie_jar;
my $req = HTTP::Request->new(
'POST',
'http://www.firstgiving.com/ProfileWebApi/Donations');
$req->header('Accept-Encoding' => 'gzip, deflate');
$req->header('Referer' => 'http://www.firstgiving.com/fundraiser/r-atheism/ratheism');
$req->header('X-Requested-With' => 'XMLHttpRequest');
$req->header('Content-Transfer-Encoding' => 'binary');
$req->header('Content-type:' => 'application/json');
$req->header('User-Agent' => 'Opera/9.80 (Windows NT 6.1; U; Edition United Kingdom Local; en) Presto/2.10.229 Version/11.60');
$req->content('{"EventGivingGroupId":1476950,"TotalRaised":"190776.020000","PageIsExpired":true,"PageNumber":2,"PageSize":50}');
#$req->content('{"EventGivingGroupId":1476950,"PageNumber":1,"PageSize":50}');
my $post_request = $ua->request($req);
print Dumper( ($post_request) );
Here is the output:
$VAR1 = bless( {
'COOKIES' => {
'www.firstgiving.com' => {
'/' => {
'ASP.NET_SessionId' => [
0,
'yynhqi2udtz4y055fakdvjiu',
undef,
1,
undef,
undef,
1,
{
'HttpOnly' => undef
}
]
}
}
}
}, 'HTTP::Cookies' );
$VAR1 = bless( {
'_protocol' => 'HTTP/1.1',
'_content' => '<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="%2ferror%2f404">here</a>.</h2>
</body></html>
',
'_rc' => '302',
'_headers' => bless( {
'x-powered-by' => 'ASP.NET',
'client-response-num' => 1,
'location' => '/error/404',
'cache-control' => 'private',
'date' => 'Tue, 13 Dec 2011 19:43:56 GMT',
'client-peer' => '204.12.127.197:80',
'x-aspnet-version' => '2.0.50727',
'client-date' => 'Tue, 13 Dec 2011 19:36:45 GMT',
'x-aspnetmvc-version' => '2.0',
'content-type' => 'text/html; charset=utf-8',
'title' => 'Object moved',
'client-transfer-encoding' => [
'chunked'
],
'server' => 'Microsoft-IIS/7.5'
}, 'HTTP::Headers' ),
'_msg' => 'Found',
'_request' => bless( {
'_content' => '{"EventGivingGroupId":1476950,"TotalRaised":"190776.020000","PageIsExpired":true,"PageNumber":2,"PageSize":50}',
'_uri' => bless( do{\(my $o = 'http://www.firstgiving.com/ProfileWebApi/Donations')}, 'URI::http' ),
'_headers' => bless( {
'cookie2' => '$Version="1"',
'user-agent' => 'Opera/9.80 (Windows NT 6.1; U; Edition United Kingdom Local; en) Presto/2.10.229 Version/11.60',
'cookie' => 'ASP.NET_SessionId=yynhqi2udtz4y055fakdvjiu',
'x-requested-with' => 'XMLHttpRequest',
'accept-encoding' => 'gzip, deflate',
'content-transfer-encoding' => 'binary',
'content-type:' => 'application/json',
'referer' => 'http://www.firstgiving.com/fundraiser/r-atheism/ratheism'
}, 'HTTP::Headers' ),
'_method' => 'POST',
'_uri_canonical' => $VAR1->{'_request'}{'_uri'}
}, 'HTTP::Request' )
}, 'HTTP::Response' );
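If I enable the line push @{ $ua->requests_redirectable }, 'POST'; (i.e. allow POST requests to be redirected), the request is redirected to the 404 error page.
If this is a deliberate attempt by FirstGiving to block non-human clients, I will of course give up, but their robots.txt does not seem to forbid what I am doing.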
Answer 0 (score: 2)
Add the Accept: application/json, text/javascript, */*; q=0.01 header. It is not a header I would normally consider critical, but in this case it seems to be.
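Applied to the Perl script from the question, this could look roughly like the sketch below. The helper name build_donations_request is made up for illustration, and the payload values are copied from the browser capture in the question. Note that the question's script names the header 'Content-type:' with a trailing colon, and the dump shows it stored that way; the sketch uses the plain name, though I have not checked whether that matters to the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use JSON::PP;

# Hypothetical helper: builds the POST the browser sends, including Accept.
sub build_donations_request {
    my ($page_number) = @_;

    # Payload fields copied from the browser capture in the question.
    my $body = JSON::PP->new->canonical->encode({
        EventGivingGroupId => 1476950,
        TotalRaised        => "190776.020000",
        PageIsExpired      => JSON::PP::false,
        PageNumber         => $page_number + 0,   # force a JSON number
        PageSize           => 50,
    });

    my $req = HTTP::Request->new(
        POST => 'http://www.firstgiving.com/ProfileWebApi/Donations');
    # The header that was missing from the original script.
    $req->header('Accept' => 'application/json, text/javascript, */*; q=0.01');
    # Plain header name, without the trailing colon.
    $req->header('Content-Type' => 'application/json');
    $req->header('Referer' =>
        'http://www.firstgiving.com/fundraiser/r-atheism/ratheism');
    $req->header('X-Requested-With' => 'XMLHttpRequest');
    $req->content($body);
    return $req;
}

# Print the request instead of sending it, so the headers can be inspected.
print build_donations_request(2)->as_string;
```

To actually send it, pass the request to the $ua object from the question with $ua->request($req), after the initial GET has put the ASP.NET_SessionId cookie into the cookie jar.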
I ran a quick test with curl. This works:
curl -vv -H 'Content-Type: application/json' \
-H 'Referer: http://www.firstgiving.com/fundraiser/r-atheism/ratheism' \
-H 'Cookie: ASP.NET_SessionId=svqlde45h0cvrv55hqvhwv55;' \
-H 'X-Requested-With: XMLHttpRequest' \
-H 'Accept: application/json, text/javascript, */*; q=0.01' \
-d '{"EventGivingGroupId":1476950,"TotalRaised":"191532.480000","PageIsExpired":false,"PageNumber":2,"PageSize":50}' \
'http://www.firstgiving.com/ProfileWebApi/Donations'
This gives me the redirect:
curl -vv -H 'Content-Type: application/json' \
-H 'Referer: http://www.firstgiving.com/fundraiser/r-atheism/ratheism' \
-H 'Cookie: ASP.NET_SessionId=svqlde45h0cvrv55hqvhwv55;' \
-H 'X-Requested-With: XMLHttpRequest' \
-d '{"EventGivingGroupId":1476950,"TotalRaised":"191532.480000","PageIsExpired":false,"PageNumber":2,"PageSize":50}' \
'http://www.firstgiving.com/ProfileWebApi/Donations'
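Once the request succeeds, the response shown in the question is a JSON object whose Data field holds an HTML table. A minimal sketch of unpacking it follows, assuming the markup stays as in the excerpt above (a td with class "comment" followed by a td with class "amount"). The function name parse_donations and the sample HTML are illustrative only, and regexes over HTML are fragile, so a real HTML parser would be a safer choice for anything long-lived.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: pulls name, message, time and amount from each row.
sub parse_donations {
    my ($json_text) = @_;
    my $html = decode_json($json_text)->{Data};
    my @donations;

    # Each donation row is a comment cell followed by an amount cell.
    while ($html =~ m{<td class="comment">(.*?)</td>\s*<td class="amount">(.*?)</td>}sg) {
        my ($comment, $amount_cell) = ($1, $2);
        my ($name)    = $comment =~ m{<strong>(.*?)</strong>}s;
        my ($message) = $comment =~ m{</strong>(.*?)<br\s*/>}s;
        my ($when)    = $comment =~ m{<time datetime="([^"]+)"};
        my ($amount)  = $amount_cell =~ m{\$([\d,]+\.\d\d)};
        $message =~ s/^\s+|\s+$//g if defined $message;   # trim whitespace
        push @donations, {
            name => $name, message => $message,
            time => $when, amount  => $amount,
        };
    }
    return \@donations;
}

# Sample modelled on one row of the response excerpt in the question.
my $sample_html = <<'HTML';
<table class="donationTable collapsed">
  <tr>
    <td class="comment">
      <strong>Frodo Baggins</strong>
      Due to the fact that doctors heal people, not God!<br />
      <time datetime="2011-12-10T21:52:11.0000000">12/10/2011</time>
    </td>
    <td class="amount">
      $4.64<sup style="font-size:10px;" title="Offline donation"></sup>
    </td>
  </tr>
</table>
HTML

for my $d (@{ parse_donations(encode_json({ Data => $sample_html })) }) {
    printf "%s gave \$%s at %s: %s\n", @{$d}{qw(name amount time message)};
}
```

With the real service, the argument to parse_donations would be the body of the successful response, for example $response->decoded_content when using LWP::UserAgent.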