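I am retrieving content from an XHTML file. The content contains img tags with src="/tmp/folder_name/file_name". I want to replace the src value "/tmp/folder_name/file_name" with just "file_name". The code below shows how I get the content from the XHTML file. I tried Nokogiri::HTML(section_content), but the resulting content is no longer XHTML. How can I convert it back to XHTML, or how can I replace the src value in the content without using Nokogiri::HTML?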
section_content = section.export_xhtml_content file_path
doc = Nokogiri::HTML(section_content)
unless doc.css('div.image_content').blank?
  doc.css('div.image_content img').each do |img|
    newsrc = File.basename img[:src]
    img.set_attribute('src', newsrc)
  end
end
section_content = doc.to_s
Content:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>File 1: Chapter1</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<link href="stylesheet.css" type="text/css" rel="stylesheet"/>
<link href="page_styles.css" type="text/css" rel="stylesheet"/>
</head>
<body class="publitory">
<h1 id="File_1_1">Chapter1</h1>
<h2 id="File_1_2">Content1</h2>
<h3 id="File_1_3">Content1.1</h3>
<p/>
<div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content">
<img width="100%" src="/tmp/fog/development_publitory_bucket/uploads/user/b57030de-89ac-11e3-9cf2-bdfa8a998e1e/book/053bab68-b4b2-11e3-8ed6-996ec04a57ef/oeb_image/angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg"/>
<div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br/></div>
</div>
<br/>
<p/>
<h3 id="File_1_4">Content1.2</h3>
<h2 id="File_1_5">Content2</h2>
<h2 id="File_1_6">Content3</h2>
<h2 id="File_1_7">Content4</h2>
</body>
</html>
After replacing the src value with Nokogiri, the resulting content is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>File 1: Chapter1</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="stylesheet.css" type="text/css" rel="stylesheet">
<link href="page_styles.css" type="text/css" rel="stylesheet">
</head>
<body class="publitory">
<h1 id="File_1_1">Chapter1</h1>
<h2 id="File_1_2">Content1</h2>
<h3 id="File_1_3">Content1.1</h3>
<p></p>
<div style="width:25%; margin: 0 auto;" data-align="Middle" class="image_content">
<img width="100%" src="angel7eef59eb838ac763a43b936763dd184ec3324318.jpeg">
<div class="caption" style="clear:both;">Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content Content1.1 content<br>
</div>
</div>
<br>
<p></p>
<h3 id="File_1_4">Content1.2</h3>
<h2 id="File_1_5">Content2</h2>
<h2 id="File_1_6">Content3</h2>
<h2 id="File_1_7">Content4</h2>
</body>
</html>
The resulting content should be proper XHTML. Please help me solve this. Thanks in advance.
Answer 0 (score: 1)
The basic steps you need to perform:
1. Parse the content with Nokogiri::XML
2. Query the document with .xpath or .css