Question

When I run my code, I get some very strange error messages:

/Users/Pan/Data/external/filter_url_1008.rb:35: syntax error, unexpected keyword_end
/Users/Pan/Data/external/filter_url_1008.rb:45: syntax error, unexpected end-of-input, expecting keyword_end
filter_file.close
                 ^

I have gone over my Ruby code many times but cannot find the error.

#This script filters out any html files that don't abide by the rule.
require "fileutils"

#path where html files will be read from
source_dir = "/20131008" 

#path where filtered html files will be copied to
dest_dir ="/20131008_filtered"

#file index to be filtered
filter_file = File.open("filtered_index.txt","r")

if !File.exist?(dest_dir) 
    FileUtils.mkdir_p("/dest_dir")
    print(dest_dir + " was created!\n") 
end

#filter rule
blacklist = ["facebook.com", "youtube.com", "twitter.com",
"linkedin.com", "bebo.com", "twitlonger.com", "bing.com", "ebay.com",
"ebayrt.com", "maps.google", "waze.com", "foursquare.com", "adf.ly", 
"twitpic.com","itunes.apple.com","craigslist.org","instagram.com", 
"google.com", "google.co.uk", "google.ie","bullhornreach", 
"pinterest.com", "feedsportal","tumblr.com"]

filter = filter_file.read

#Read from 20131008_filtered.txt and exclude urls that's in blacklist
filter.each_line do |line|
    $match_count = 0

    blacklist.each do |blacklist_atom|
        if !(line.downcase.include? "blacklist_atom")
            match_count += 1
        end
    end

    if (blacklist.length == match_count)
        filename_cp = line[line.index("20131008/") + 9..line.index(".html") - 1]
        filename = filename_cp.to_s + ".html"
        FileUtils.cp(source_dir + "/" + filename, dest_dir)
    end
end

filter_file.close

Answer 1

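You can't use the ++ operator in Ruby. Use match_count += 1 instead.

Edit

These aren't "very strange error messages". They simply report a syntax error: the program never even started being interpreted, because this is a check that happens before the run.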
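
A minimal sketch of both points, assuming a hypothetical broken snippet (`broken_source`) that mimics the mistake: `+= 1` is the working increment, and handing the `++` version to `eval` shows the parser rejecting the snippet with a `SyntaxError` before any of it executes.

```ruby
# Ruby has no ++ operator; += 1 is the idiomatic increment.
match_count = 0
3.times { match_count += 1 }
puts match_count

# Hypothetical snippet reproducing the mistake. Ruby reads `++` as a binary +
# followed by a unary +, so the parser keeps looking for an operand and
# runs into `end` instead.
broken_source = <<~RUBY
  count = 0
  if true
    count++
  end
RUBY

syntax_error_raised =
  begin
    eval(broken_source)
    false
  rescue SyntaxError => e
    # SyntaxError is not a StandardError, so it must be rescued by name.
    puts "Rejected before running: #{e.class}"
    true
  end
```

Running `ruby -c` on a script performs the same syntax check without executing the file.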

Answer 2

Remove the then on the line with the if statement? It may work, but it's certainly not common usage.

Ruby also has no ++ operator.
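
For reference, a small sketch of where `then` is legal, using a hypothetical `line` value: it is optional (and usually omitted) in the multi-line form, and it separates the condition from the body in the single-line form.

```ruby
line = "http://example.com/20131008/page.html" # hypothetical input

# Multi-line form: `then` is optional and normally left out.
if line.include?(".html")
  multi_line = :html
end

# Single-line form: `then` (or a semicolon) separates condition and body.
single_line = if line.include?(".html") then :html else :other end

puts multi_line, single_line
```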

Answer 3

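You're doing several things wrong. This isn't an attempt to rewrite the code so that it has no errors; it's meant to show how to write it in a more maintainable way that is closer to the Ruby way: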

require 'fileutils'

SOURCE_DIR = '/20131008' 
DEST_DIR ='/20131008_filtered'

BLACKLIST = %w[
  adf.ly
  bebo.com
  bing.com
  bullhornreach
  craigslist.org
  ebay.com
  ebayrt.com
  facebook.com
  feedsportal
  foursquare.com
  google.co.uk
  google.com
  google.ie
  instagram.com
  itunes.apple.com
  linkedin.com
  maps.google
  pinterest.com
  tumblr.com
  twitlonger.com
  twitpic.com
  twitter.com
  waze.com
  youtube.com
]

unless File.exist?(DEST_DIR) 
  FileUtils.mkdir_p(DEST_DIR)
  print(DEST_DIR + " was created!\n") 
end

File.foreach("filtered_index.txt") do |line|
  # $match_count = 0
  match_count = 0

  BLACKLIST.each do |blacklist_atom|
    match_count += 1 unless (line.downcase[blacklist_atom])
  end

  if (BLACKLIST.length == match_count)
    FileUtils.cp(
      File.join(
        SOURCE_DIR,
        File.basename(
          line,
          File.extname(line)
        ) + '.html'
      ),
      DEST_DIR
    )
  end
end

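What was wrong:

Use constants, and move them to the top of the file so they are easy to see and edit.
Sort lists such as the site names; that makes the list easier to edit or extend, and easier to check for an existing entry.
Don't open a file, do a bunch of stuff, read it all into memory, do more stuff, split it, iterate over the lines, do another bunch of stuff, and then close it. Instead, use the smarter file methods such as File.foreach, which takes a block, opens the file, and closes it automatically when done. Reading a file completely into memory is a really bad habit because it doesn't scale at all; imagine what happens if the file is larger than the memory available to the program.
You aren't interpolating the variable into the string, and you're also adding an extra leading path separator: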
```
FileUtils.mkdir_p("/dest_dir")
```
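Don't use $global variables in a block. It shows a lack of understanding of variable scope.
The global variable $match_count is only initialized and never read; the increment targets a different, local match_count.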
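
The points in this list can be sketched together as follows. This is only an illustration under assumptions: the temporary directory, the two-line index file, and the `.html` check are hypothetical stand-ins for the real data and the blacklist test.

```ruby
require 'fileutils'
require 'tmpdir'

dir_created = false
line_counts = []

Dir.mktmpdir do |tmp|
  dest_dir = File.join(tmp, '20131008_filtered') # hypothetical location

  # "/dest_dir" is a literal string. Pass the variable itself
  # (or interpolate it as "#{dest_dir}") to use its value.
  FileUtils.mkdir_p(dest_dir)
  dir_created = File.directory?(dest_dir)

  # Hypothetical index file with two entries.
  index_path = File.join(tmp, 'filtered_index.txt')
  File.write(index_path, "20131008/a.html\n20131008/b.txt\n")

  # File.foreach opens the file, yields one line at a time,
  # and closes it when the block finishes.
  File.foreach(index_path) do |line|
    match_count = 0 # a plain local, reset for every line; no $global needed
    match_count += 1 if line.include?('.html')
    line_counts << match_count
  end
end

p line_counts
```

Because `match_count` is assigned inside the block, each line starts from zero and nothing leaks into the rest of the script.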

You can write the "not a substring" search more concisely. Instead of:

if !(line.downcase.include? "blacklist_atom")

Use something like:

unless line.downcase[blacklist_atom]
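
A short sketch of the difference, using a hypothetical `line` and a two-entry blacklist. Note that the quoted form searches for the literal text "blacklist_atom" and not for the variable's value. The `none?` call at the end is an alternative not used in the answer above, shown only as one possible way to replace the manual counter.

```ruby
line = "http://www.Facebook.com/20131008/page.html\n" # hypothetical index entry
blacklist_atom = "facebook.com"

# Quoted: looks for the literal text "blacklist_atom", which is never there.
quoted_literal = line.downcase.include?("blacklist_atom")

# Unquoted: looks for the value held by the variable.
with_include = line.downcase.include?(blacklist_atom)

# String#[] returns the matched substring (truthy) or nil when absent.
with_index = line.downcase[blacklist_atom]

# Alternative to counting misses: true only if no entry matches.
blacklist = %w[facebook.com youtube.com]
allowed = blacklist.none? { |atom| line.downcase.include?(atom) }

p quoted_literal, with_include, with_index, allowed
```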

Take advantage of built-in functionality. Instead of:
```
filename_cp = line[line.index("20131008/") + 9..line.index(".html") - 1]
```
Use File.join(...) instead, because it knows which path separator your OS needs. Use File.basename(...), because it uses the same path separator to extract just the file name.
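
As a rough illustration, assuming a hypothetical index entry of the form `20131008/<name>.html`, the two calls replace the manual index arithmetic like this:

```ruby
source_dir = '/20131008'
line = "20131008/page_42.html\n" # hypothetical index entry, newline included

# Lines read from a file keep their trailing newline, so chomp first.
# The second argument tells File.basename which extension to strip.
name = File.basename(line.chomp, '.html')

# File.join inserts the separator between the parts.
source_path = File.join(source_dir, name + '.html')

puts name
puts source_path
```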

Why am I getting "syntax error, unexpected keyword_end" and "unexpected end-of-input"?
