How do I search for and match content in a file?

Asked: 2012-01-02 23:47:18

Tags: ruby file-io

I have a text file:

<table style="background-color: #f3f3f3; font-family: Arial; font-size: 8pt; border-top: #e7e7e7 5px solid" border="0" cellspacing="0" cellpadding="0">
  <tbody>
<tr>
<td style="padding-bottom: 20px; padding-left: 20px; padding-right: 20px; padding-top: 20px">
<p style="color: #b0b0b0"><font color="#808080" size="1"><strong>Important information</strong>: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software </p>

This is not a dump of a website; it is what an application writes to the file.

My method for checking the text file looks like this:

def check_email_exists(firstname, email_sub, search_string)
  email_fldr = "C:\\Agent\\TestMailFolder"
  email_id = "myname@gmail.com"
  Dir.chdir("#{email_fldr}\\#{firstname}") do
    Dir.glob("#{email_id}*#{email_sub}*") do |filename|
      File.open(filename) do |file|
        file.readlines(filename).index("#{search_string}")
      end
    end
  end
end

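This does not work.

The value I pass in search_string is a string. For example, I am trying to check whether string = "transmitting software" is in the file. I also check whether the file contains some random string that does not exist. If the value is found in the file, the check should pass; if not, it should fail.

1 Answer:

Answer 0: (score: 0)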

Your file contains HTML. For more than 90% of applications that involve HTML, you should use a parser. I recommend Nokogiri:

require 'nokogiri'

html = <<EOT
<table style="background-color: #f3f3f3; font-family: Arial; font-size: 8pt; border-top: #e7e7e7 5px solid" border="0" cellspacing="0" cellpadding="0">
  <tbody>
<tr>
<td style="padding-bottom: 20px; padding-left: 20px; padding-right: 20px; padding-top: 20px">
<p style="color: #b0b0b0"><font color="#808080" size="1"><strong>Important information</strong>: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software </p>
EOT

doc = Nokogiri::HTML::DocumentFragment.parse(html)

content = doc.content

puts content

Which outputs:

Important information: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software 

If you want to check whether the result contains the string "transmitting software", additionally try:

puts "contains transmitting software" if content['transmitting software']
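As for why the method in the question never reports a match: `file.readlines(filename)` passes the file name as the line separator, `index` only finds an array element that equals the whole search string, and `Dir.glob` called with a block returns nil. Below is a minimal sketch of a substring check that avoids those three problems, using only the standard library. The helper name `any_file_contains?` and its `dir` and `pattern` parameters are hypothetical and not part of the original code. It searches the raw file text, so for HTML you would probably still want to pass the text extracted by Nokogiri instead of the raw file contents.

```ruby
require 'tmpdir'

# Hypothetical helper: returns true if any file in `dir` whose name matches
# the glob `pattern` contains `search_string` as a substring.
def any_file_contains?(dir, pattern, search_string)
  # Without a block, Dir.glob returns an Array of matching paths,
  # so any? can produce a real true/false result.
  Dir.glob(File.join(dir, pattern)).any? do |filename|
    # File.read returns the whole file as one String; include? is a
    # substring test, not a whole-line comparison.
    File.read(filename).include?(search_string)
  end
end

# Demonstration in a temporary directory (the file name is made up).
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "myname@gmail.com_welcome_1.html"),
             "<p>minimise the risk of transmitting software viruses</p>")

  puts any_file_contains?(dir, "myname@gmail.com*welcome*", "transmitting software")
  puts any_file_contains?(dir, "myname@gmail.com*welcome*", "some random string")
end
```

The first call prints true and the second prints false, which matches the pass/fail behaviour described in the question. `File.join` builds the path with forward slashes, which Ruby also accepts on Windows; backslashes inside a glob pattern are treated as escape characters, so they are best avoided there.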