Question

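I retrieve content from an XHTML file. The content contains img tags with src="/tmp/folder_name/file_name". I want to replace the src value "/tmp/folder_name/file_name" with "file_name". The code below shows how I get the content from the XHTML file. I tried Nokogiri::HTML(section_content), but the resulting content is no longer XHTML. How can I convert it back to XHTML, or how can I replace the src value in the content without using Nokogiri::HTML?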

  section_content = section.export_xhtml_content file_path
  doc = Nokogiri::HTML(section_content)
  unless doc.css('div.image_content').blank?
    doc.css('div.image_content img').each do |img|
      newsrc = File.basename img[:src]
      img.set_attribute('src', newsrc)
    end
  end
  section_content = doc.to_s

Content:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>File 1: Chapter1</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <link href="stylesheet.css" type="text/css" rel="stylesheet"/>
    <link href="page_styles.css" type="text/css" rel="stylesheet"/>
  </head>
  <body class="publitory">
    <h1 id="File_1_1">Chapter1</h1>
    <h2 id="File_1_2">Content1</h2>
    <h3 id="File_1_3">Content1.1</h3>
    <p/>
    <div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content">
       <img width="100%" src="/tmp/fog/development_publitory_bucket/uploads/user/b57030de-89ac-11e3-9cf2-bdfa8a998e1e/book/053bab68-b4b2-11e3-8ed6-996ec04a57ef/oeb_image/angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg"/> 
       <div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br/></div>
   </div>
   <br/>
   <p/>
   <h3 id="File_1_4">Content1.2</h3>
   <h2 id="File_1_5">Content2</h2>
   <h2 id="File_1_6">Content3</h2>
   <h2 id="File_1_7">Content4</h2>
  </body>
</html>

After replacing the src value with Nokogiri, the resulting content is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>File 1: Chapter1</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <link href="stylesheet.css" type="text/css" rel="stylesheet">
    <link href="page_styles.css" type="text/css" rel="stylesheet">
  </head>
  <body class="publitory">
    <h1 id="File_1_1">Chapter1</h1>
    <h2 id="File_1_2">Content1</h2>
    <h3 id="File_1_3">Content1.1</h3>
    <p></p>
    <div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content">
      <img width="100%" src="angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg">
      <div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br>
      </div>
    </div>
    <br>
    <p></p>
    <h3 id="File_1_4">Content1.2</h3>
    <h2 id="File_1_5">Content2</h2>
    <h2 id="File_1_6">Content3</h2>
    <h2 id="File_1_7">Content4</h2>
  </body>
</html>

The resulting content should still be well-formed XHTML. Please help me solve this. Thanks in advance.

Answer 1

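The basic steps you need to perform:

Build the document with an XML parser, e.g. Nokogiri::XML, so that the output stays XML rather than HTML
Query for the img elements using .xpath or .css
Change the src attribute of each match using the Nokogiri::XML::Node methods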
