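I am writing a Ruby program that downloads files from an RSS feed to my local hard drive. I previously wrote this application in Perl, and figured that re-creating it in Ruby would be a good way to learn Ruby.
In the Perl program (which works), I am able to download the original file directly from the server that hosts it (keeping the original file name), and it works well. In the Ruby program (which does not work), I have to "stream" the data from the file I want into a new file that I create on my hard drive. Unfortunately, this does not work: the "streamed" data always comes back empty. My assumption is that Perl handles some kind of redirect when retrieving the file directly, and Ruby does not.
I am posting both programs (they are relatively small) in the hope that this helps solve my problem. If you have any questions, let me know. As a side note, when I pointed this program at a more static URL (a JPEG), it downloaded the file just fine. That is why I think some kind of redirect is causing the problem.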
Ruby code (not working)
require 'net/http';
require 'open-uri';
require 'rexml/document';
require 'sqlite3';
# Create new SQLite3 database connection
db_connection = SQLite3::Database.new('fiend.db');
# Make sure I can reference records in the query result by column name instead of index number
db_connection.results_as_hash = true;
# Grab all TV shows from the shows table
query = '
  SELECT
    id,
    name,
    current_season,
    last_episode
  FROM
    shows
  ORDER BY
    name
';
# Run through each record in the result set
db_connection.execute(query) { |show|
  # Pad the current season number with a zero for later use in a search query
  season = '%02d' % show['current_season'].to_s;
  # Calculate the next episode number and pad with a zero
  next_episode = '%02d' % (Integer(show['last_episode']) + 1).to_s;
  # Store the name of the show
  name = show['name'];
  # Generate the URL of the RSS feed that will hold the list of torrents
  feed_url = URI.encode("http://btjunkie.org/rss.xml?query=#{name} S#{season}E#{next_episode}&o=52");
  # Generate a simple string that denotes the show, season and episode number being retrieved
  episode_id = "#{name} S#{season}E#{next_episode}";
  puts "Loading feed for #{name}..";
  # Store the response from the download of the feed
  feed_download_response = Net::HTTP.get_response(URI.parse(feed_url));
  # Store the contents of the response (in this case, XML data)
  xml_data = feed_download_response.body;
  puts "Feed Loaded. Parsing items.."
  # Create a new REXML Document and pass in the XML from the Net::HTTP response
  doc = REXML::Document.new(xml_data);
  # Loop through each item in the feed
  doc.root.each_element('//item') { |item|
    # Find and store the URL of the torrent we wish to download
    torrent_url = item.elements['link'].text + '/download.torrent';
    puts "Downloading #{episode_id} from #{torrent_url}";
    ## This is where crap stops working
    # Open Connection to the host
    Net::HTTP.start(URI.parse(torrent_url).host, 80) { |http|
      # Create a torrent file to dump the data into
      File.open("#{episode_id}.torrent", 'wb') { |torrent_file|
        # Try to grab the torrent data
        data = http.get(torrent_url[19..torrent_url.size], "User-Agent" => "Mozilla/4.0").body;
        # Write the data to the torrent file (the data is always coming back blank)
        torrent_file.write(data);
        # Close the torrent file
        torrent_file.close();
      }
    }
    break;
  }
}
Perl code (working)
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 });
my $userAgent = new LWP::UserAgent; # Create new user agent
$userAgent->agent("Mozilla/4.0"); # Spoof our user agent as Mozilla
$userAgent->timeout(20); # Set timeout limit for request
my $currentTag = ""; # Stores what tag is currently being parsed
my $torrentUrl = ""; # Stores the data found in any node
my $isDownloaded = 0; # 1 or zero that states whether or not we've downloaded a particular episode
my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name");
my $id = 0;
my $name = "";
my $season = 0;
my $last_episode = 0;
foreach my $show (@$shows) {
    $isDownloaded = 0;
    ($id, $name, $season, $last_episode) = (@$show);
    $season = sprintf("%02d", $season); # Pad the season with a leading zero (e.g. 6 becomes 06)
    $last_episode = sprintf("%02d", ($last_episode + 1)); # Increment the last episode by one and pad it with a leading zero (e.g. 5 becomes 06)
    print("Checking $name S" . $season . "E" . "$last_episode \n");
    my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52"); # Retrieve the torrent feed
    my $rssFeed = $userAgent->request($request); # Store the feed in a variable for later access
    if($rssFeed->is_success) { # We retrieved the feed
        my $parser = new XML::Parser(); # Make a new instance of XML::Parser
        $parser->setHandlers # Set the functions that will be called when the parser encounters different kinds of data within the XML file.
        (
            Start => \&startHandler, # Handles start tags
            End => \&endHandler, # Handles end tags
            Char => \&DataHandler # Handles data inside of start and end tags
        );
        $parser->parsestring($rssFeed->content); # Parse the feed
    }
}
#
# Called every time XML::Parser encounters a start tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub startHandler {
    my($parseInstance, $element, %attributes) = @_;
    $currentTag = $element;
}
#
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e., all the data in between tags)
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub DataHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($currentTag eq "link" && $element ne "\n") {
        $torrentUrl = $element;
    }
}
#
# Called every time XML::Parser encounters an end tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub endHandler {
    my($parseInstance, $element, %attributes) = @_;
    if($element eq "item" && $isDownloaded == 0) { # We just finished parsing an item element so let's attempt to download a torrent
        print("DOWNLOADING: $torrentUrl" . "/download.torrent \n");
        system("echo.|lwp-download " . $torrentUrl . "/download.torrent"); # We echo the "return" key into the command to force it to skip any file-overwrite prompts
        if(unlink("download.torrent.html")) { # We tried to download a 'locked' torrent
            $isDownloaded = 0; # Forces program to download next torrent on list from current show
        }
        else {
            $isDownloaded = 1;
            $dbh->do("UPDATE shows SET last_episode = '$last_episode' WHERE id = '$id'"); # Update DB with new show information
        }
    }
}
Answer 0 (score: 1)
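Yes, the URL you are trying to retrieve appears to return a 302 (redirect). Net::HTTP requires (and allows) you to handle redirects yourself. You would normally use a recursive technique like the one AboutRuby mentioned (although http://www.ruby-forum.com/topic/142745 suggests that you should look not only at the 'Location' field but also for a META REFRESH in the response).
If you are not interested in the low-level interaction, open-uri handles redirects for you:
require 'open-uri'
File.open("#{episode_id}.torrent", 'wb') {|torrent_file| torrent_file.write open(torrent_url).read}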
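On Ruby 3.0 and later, a bare open no longer fetches URLs through open-uri, so the same idea has to go through URI.open. A minimal sketch, assuming a hypothetical helper named download_to_file (not part of open-uri) and reusing the "Mozilla/4.0" User-Agent from the question:

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): download +url+ into +path+.
# open-uri follows HTTP redirects on its own, so no redirect handling is needed here.
def download_to_file(url, path, user_agent: 'Mozilla/4.0')
  URI.open(url, 'User-Agent' => user_agent) do |remote|
    # Copy the response body to disk in binary mode.
    File.open(path, 'wb') { |file| IO.copy_stream(remote, file) }
  end
end

# Possible usage with the variables from the question:
# download_to_file(torrent_url, "#{episode_id}.torrent")
```

The block form closes both the remote stream and the local file automatically, so no explicit close call is required.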
Answer 1 (score: 0)
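get_response returns an instance of a class from the HTTPResponse hierarchy. It is usually an HTTPSuccess, but if there is a redirect it will be an HTTPRedirection. A simple recursive method that follows redirects solves this. How to handle it properly is described in the docs under the 'Following Redirection' heading.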
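A minimal sketch of such a recursive method, along the lines of the pattern in the Net::HTTP docs. The name fetch_following_redirects and the default limit of 5 are assumptions made for illustration, not part of Net::HTTP, and the sketch only follows the Location header (it does not look for META REFRESH):

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): fetch +url+ and follow redirects,
# making at most +limit+ requests in total.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit <= 0

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    location = response['location']
    raise 'redirect without a Location header' if location.nil?

    # The Location header may be relative, so resolve it against the current URL.
    fetch_following_redirects(URI.join(url, location).to_s, limit - 1)
  else
    # Raises an exception for any other non-2xx response.
    response.value
  end
end

# Possible usage with the variables from the question:
# File.open("#{episode_id}.torrent", 'wb') do |torrent_file|
#   torrent_file.write(fetch_following_redirects(torrent_url).body)
# end
```

The limit guards against redirect loops; without it, two URLs that redirect to each other would recurse until the stack overflows.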