When I run my code, I get a very strange error message:
/Users/Pan/Data/external/filter_url_1008.rb:35: syntax error, unexpected keyword_end
/Users/Pan/Data/external/filter_url_1008.rb:45: syntax error, unexpected end-of-input, expecting keyword_end
filter_file.close
                 ^
I have gone over my Ruby code many times but cannot find the error.
#This script is for filterring any html files that doesn't abide the rule.
require "fileutils"
#path where html files will be read from
source_dir = "/20131008"
#path where flittered html files will be copy to
dest_dir ="/20131008_filtered"
#file index to be filtered
filter_file = File.open("filtered_index.txt","r")
if !File.exist?(dest_dir)
  FileUtils.mkdir_p("/dest_dir")
  print(dest_dir + " was created!\n")
end
#filter rule
blacklist = ["facebook.com", "youtube.com", "twitter.com",
  "linkedin.com", "bebo.com", "twitlonger.com", "bing.com", "ebay.com",
  "ebayrt.com", "maps.google", "waze.com", "foursquare.com", "adf.ly",
  "twitpic.com","itunes.apple.com","craigslist.org","instagram.com",
  "google.com", "google.co.uk", "google.ie","bullhornreach",
  "pinterest.com", "feedsportal","tumblr.com"]
filter = filter_file.read
#Read from 20131008_filtered.txt and exclude urls that's in blacklist
filter.each_line do |line|
  $match_count = 0
  blacklist.each do |blacklist_atom|
    if !(line.downcase.include? "blacklist_atom")
      match_count += 1
    end
  end
  if (blacklist.length == match_count)
    filename_cp = line[line.index("20131008/") + 9..line.index(".html") - 1]
    filename = filename_cp.to_s + ".html"
    FileUtils.cp(source_dir + "/" + filename, dest_dir)
  end
end
filter_file.close
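Answer 0 (score: 3)
You can't use the ++ operator in Ruby. Use match_count += 1 instead.
These are not "very strange error messages"; this is simply a message reporting a syntax error. The program never even started to be interpreted: this is a check made before the run.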
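To illustrate the point about the pre-run check, here is a minimal sketch. The `source` snippet and the `log` array are made-up names for illustration, not code from the question, and `eval` is used only to make the parse step observable from inside a script:

```ruby
# Hypothetical snippet: its first line would record that the program started,
# and the loop body uses the `++` operator that Ruby does not have.
source = <<~RUBY
  log << "started"
  count = 0
  3.times do
    count++
  end
RUBY

log = []     # filled in only if the snippet actually begins to run
error = nil

begin
  # eval parses the whole string before executing any of it.
  eval(source, binding)
rescue SyntaxError => e
  error = e
end

puts error.class     # SyntaxError
puts log.empty?      # true: the first line of the snippet never ran
puts error.message   # wording depends on the Ruby version
```

Because the parser reads `count++` followed by `end` as an unfinished addition, the complaint is reported at the `end` keyword rather than at the `++` itself, which is why the message can look unrelated to the real mistake.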
Answer 1 (score: 0)
Remove the then on the line with the if statement? It may work, but it is certainly not commonly used.
Ruby also has no ++ operator.
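Answer 2 (score: 0)
You are doing several things wrong. This is not an attempt to rewrite the code so that it has no errors; it is meant to show how to write it in a more maintainable way that is closer to the Ruby way: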
require 'fileutils'
SOURCE_DIR = '/20131008'
DEST_DIR = '/20131008_filtered'
BLACKLIST = %w[
  adf.ly
  bebo.com
  bing.com
  bullhornreach
  craigslist.org
  ebay.com
  ebayrt.com
  facebook.com
  feedsportal
  foursquare.com
  google.co.uk
  google.com
  google.ie
  instagram.com
  itunes.apple.com
  linkedin.com
  maps.google
  pinterest.com
  tumblr.com
  twitlonger.com
  twitpic.com
  twitter.com
  waze.com
  youtube.com
]
unless File.exist?(DEST_DIR)
  FileUtils.mkdir_p(DEST_DIR)
  print(DEST_DIR + " was created!\n")
end
File.foreach("filtered_index.txt") do |line|
  # $match_count = 0
  match_count = 0
  BLACKLIST.each do |blacklist_atom|
    match_count += 1 unless (line.downcase[blacklist_atom])
  end
  if (BLACKLIST.length == match_count)
    FileUtils.cp(
      File.join(
        SOURCE_DIR,
        File.basename(
          line,
          File.extname(line)
        ) + '.html'
      ),
      DEST_DIR
    )
  end
end
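The `File.foreach` call in the rewrite above hands the block one line at a time instead of loading the whole file first. A minimal sketch of that behaviour, assuming a throwaway index file created with `Tempfile` (the file contents and the `kept` array are made up for illustration):

```ruby
require "tempfile"

# Hypothetical index file with three entries, one per line.
index = Tempfile.new("filtered_index")
index.write("20131008/a.html\n20131008/b.html\n20131008/c.html\n")
index.close

kept = []

# File.foreach opens the file, yields each line to the block,
# and closes the file once the last line has been yielded.
File.foreach(index.path) do |line|
  kept << line.chomp
end

puts kept.length   # 3
index.unlink       # remove the temporary file
```

Only the current line needs to be held in memory at any moment, so the same loop should behave the same way on an index far larger than this one.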
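What was wrong:
Use File.foreach, and use a block so that the file is opened and then closed automatically. Reading a file completely into memory is a very bad habit because it does not scale at all; imagine what happens if the file is larger than the memory available to your program.
You are not interpolating the variable into the string, and you are also adding an extra leading path separator: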
FileUtils.mkdir_p("/dest_dir")
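To make the difference concrete, here is a small sketch. It works inside a temporary directory from `Dir.mktmpdir` so that nothing is created at the filesystem root; the `base`, `literal`, and `created` names and the directory layout are assumptions made for illustration:

```ruby
require "fileutils"
require "tmpdir"

base = Dir.mktmpdir                        # scratch directory for the demo
dest_dir = File.join(base, "20131008_filtered")

# "/dest_dir" is a literal string: it names a directory called
# "dest_dir" at the filesystem root, whatever the variable holds.
literal = "/dest_dir"
puts literal == dest_dir                   # false

# Pass the variable itself (or interpolate it with "#{dest_dir}").
FileUtils.mkdir_p(dest_dir) unless File.exist?(dest_dir)
created = File.directory?(dest_dir)
puts created                               # true

FileUtils.remove_entry(base)               # clean up the scratch directory
```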
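Don't use "$global" variables in a block. It shows a lack of understanding of variable scope. $match_count is only initialized and never read.
You can write the "not a substring" search more concisely. Instead of: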
if !(line.downcase.include? "blacklist_atom")
Use something like:
unless line.downcase[blacklist_atom]
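For illustration, a short sketch of the two checks on a made-up line (the `line` value and the two-entry `blacklist` are assumptions, not data from the question). It also shows that quoting `"blacklist_atom"` tests for that literal text rather than for the value of the variable. The `none?` call at the end is an alternative that is not in the answer's code:

```ruby
blacklist = ["facebook.com", "twitter.com"]   # shortened, hypothetical list
line = "20131008/www.Facebook.com_page.html\n"

# Quoted: looks for the literal text "blacklist_atom", not a blacklist entry.
puts line.downcase.include?("blacklist_atom")   # false

blacklist.each do |blacklist_atom|
  # String#[] with a string argument returns the substring or nil,
  # so the result can be used directly as a condition.
  found = line.downcase[blacklist_atom]
  puts "#{blacklist_atom}: #{found.inspect}"
end

# A line passes the filter when no entry is found in it.
allowed = blacklist.none? { |atom| line.downcase[atom] }
puts allowed   # false, because facebook.com appears in the line
```

Here `found` and `allowed` are plain local variables; nothing in the loop needs a `$global`.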
Use the built-in functionality rather than slicing the string by hand:
filename_cp = line[line.index("20131008/") + 9..line.index(".html") - 1]
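Instead, use File.join(...) because it knows which path separator your operating system needs. Use File.basename(...) because it uses the same path separator to extract just the file name.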
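A minimal sketch of those calls on a made-up index line (the `line` value is hypothetical, and the `chomp` is an addition here so that the values print cleanly; the answer's code does not include it):

```ruby
line = "20131008/www.example.org_index.html\n"   # hypothetical index entry
source_dir = "/20131008"

entry = line.chomp                                # drop the trailing newline
name  = File.basename(entry, File.extname(entry)) # strip directory and ".html"
path  = File.join(source_dir, name + ".html")     # rebuild the source path

puts name   # www.example.org_index
puts path   # /20131008/www.example.org_index.html
```

Compared with the index arithmetic in the original, this does not depend on the directory name being exactly nine characters long.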