I'm building a file system crawler, and I get the following error when I run my script:
wordcrawler.rb:8:in `block in <main>': invalid byte sequence in UTF-8 (ArgumentError)
        from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:41:in `block in find'
        from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `catch'
        from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `find'
        from wordcrawler.rb:5:in `<main>'
Here is my code:
require 'find'
count = 0
Find.find('/Users/Anconia/') do |file| # '/' for root directory on OS X
  if file =~ /\b(\.txt|\.doc|\.docx)\b/ # check if filename ends in desired format
    contents = File.read(file)
    if contents =~ /regex/
      puts file
      count += 1
    end
  end
end
puts "#{count} files were found"
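In my development environment I use Ruby 1.9.3; however, when I switch to Ruby 1.8.7, the script runs fine. I'd like to stay on 1.9.3 if possible. I have tried every solution in this post (ruby 1.9: invalid byte sequence in UTF-8), but the problem persists. Any suggestions?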
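The error can be reproduced without the crawler. A minimal sketch, assuming Ruby 1.9 or later: a string that is tagged as UTF-8 but contains a byte that can never occur in valid UTF-8 (the value "caf\xFF" is just an arbitrary example) makes `=~` raise the same ArgumentError that line 8 of the script hits. `valid_encoding?` can detect such a string before any regex is applied to it.

```ruby
# A string labelled UTF-8 that contains the byte 0xFF, which is never valid in UTF-8.
bad = "caf\xFF".force_encoding('UTF-8')

puts bad.valid_encoding?   # prints "false"

begin
  bad =~ /regex/           # matching must decode the string as UTF-8, so it fails
rescue ArgumentError => e
  puts e.message           # prints "invalid byte sequence in UTF-8"
end
```

Ruby 1.8.7 has no per-string encodings and treats strings as raw bytes, which is consistent with the script running without this error there.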
Answer:
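I hadn't understood the post mentioned above correctly. At the very least, the following can serve as an example implementation of this post:
require 'find'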
count = 0
Find.find('/Users/Anconia/') do |file| # '/' for root directory on OS X
  # only regular files whose name actually ends in .txt, .doc or .docx
  if File.file?(file) && file =~ /\.(txt|docx?)\z/
    contents = File.read(file).encode!('UTF-8', 'UTF-8', :invalid => :replace) # resolves encoding errors - must use 1.9.3 else use iconv
    if contents =~ /regex/
      puts file
      count += 1
    end
  end
end
puts "#{count} files were found"
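On Ruby 2.1 and later, `String#scrub` replaces invalid byte sequences directly (with U+FFFD by default for UTF-8). The following is a sketch under that assumption, not the answer's code: the helper name `crawl` and its `root` and `pattern` parameters are hypothetical. The extension check is anchored with `\z` so that names like "notes.txt.bak" are skipped, `File.file?` skips directories whose names happen to match, and files that cannot be opened are skipped. Keep in mind that .doc and .docx are binary formats (.docx is a ZIP archive), so they are a likely source of invalid bytes; scrubbing makes them safe to match against, but it does not extract their text.

```ruby
require 'find'

# Hypothetical helper: returns the paths under +root+ whose names end in
# .txt, .doc or .docx and whose contents match +pattern+.
# Assumes Ruby 2.1+ (String#scrub).
def crawl(root, pattern)
  matches = []
  Find.find(root) do |path|
    # \z anchors at the end of the path, so "notes.txt.bak" is not picked up.
    next unless path =~ /\.(txt|docx?)\z/
    # A directory can have a matching name too; File.read would raise on it.
    next unless File.file?(path)

    begin
      # Read as UTF-8, then replace every invalid byte sequence with U+FFFD
      # so that the regex match below cannot raise ArgumentError.
      contents = File.read(path, encoding: 'UTF-8').scrub
    rescue SystemCallError
      next # e.g. permission denied: skip this file
    end

    matches << path if contents =~ pattern
  end
  matches
end
```

Usage, mirroring the original script: `found = crawl('/Users/Anconia/', /regex/)`, then `puts found` and `puts "#{found.size} files were found"`.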