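Right now, when I parse some HTML (for example the Hacker News front page), it works fine. I can call doc = Nokogiri::HTML(open('news.ycombinator.com'))
or something similar, and I get back a Nokogiri::HTML::Document < Nokogiri::XML::Document
The problem is that in the terminal I see HTML rather than the actual Nokogiri elements. I want to see the elements, because they show me valuable information such as a Nokogiri element's children or a list of links.
I am using the Watir gem to get the HTML, like this: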
[1] pry(main)> browser = Watir::Browser.new(:firefox)
#<Watir::Browser:0x2c5654b29ef00c22 url="about:blank" title="">
[2] pry(main)> browser.goto('news.ycombinator.com')
"http://news.ycombinator.com"
[3] pry(main)> browser.html
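browser.html is an instance variable that holds the unparsed HTML (I think?).
If I call doc = Nokogiri::HTML.parse(browser.html)
this is what I want to get back:
Where am I going wrong?
Adding the original code, as requested: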
Nokogiri::HTML::Document < Nokogiri::XML::Document
[31] pry(main)> doc = Nokogiri::HTML.parse(browser.html)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html op="news">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="referrer" content="origin">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="news.css?stXbi7LCyutClfTUMe1b">
<link rel="shortcut icon" href="favicon.ico">
<link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
<title>Hacker News</title>
</head>
<body>
<center><table id="hnmain" width="85%" cellspacing="0" cellpadding="0" border="0" bgcolor="#f6f6ef">
<tbody>
<tr><td bgcolor="#ff6600"><table style="padding:2px" width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr>
<td style="width:18px;padding-right:4px"><a href="https://news.ycombinator.com"><img src="y18.gif" style="border:1px white solid;" width="18" height="18"></a></td>
<td style="line-height:12pt; height:10px;"><span class="pagetop"><b class="hnname"><a href="news">Hacker News</a></b>
<a href="newest">new</a> | <a href="front">past</a> | <a href="newcomments">comments</a> | <a href="ask">ask</a> | <a href="show">show</a> | <a href="jobs">jobs</a> | <a href="submit">submit</a> </span></td>
<td style="text-align:right;padding-right:4px;"><span class="pagetop">
<a href="login?goto=news">login</a>
</span></td>
</tr></tbody></table></td></tr>
<tr id="pagespace" title="" style="height:10px"></tr>
<tr><td>
<table class="itemlist" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr class="athing" id="19388248">
<td class="title" valign="top" align="right"><span class="rank">1.</span></td> <td class="votelinks" valign="top"><center><a id="up_19388248" href="vote?id=19388248&how=up&goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://www.bennettnotes.com/post/getting-too-absorbed-into-your-side-projects/" class="storylink">Getting Too Absorbed in Your Side Projects</a><span class="sitebit comhead"> (<a href="from?site=bennettnotes.com"><span class="sitestr">bennettnotes.com</span></a>)</span>
</td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext">
<span class="score" id="score_19388248">42 points</span> by <a href="user?id=_davebennett" class="hnuser">_davebennett</a> <span class="age"><a href="item?id=19388248">1 hour ago</a></span> <span id="unv_19388248"></span> | <a href="hide?id=19388248&goto=news">hide</a> | <a href="item?id=19388248">27 comments</a> </td>
</tr>
<tr class="spacer" style="height:5px"></tr>
<tr class="athing" id="19384878">
<td class="title" valign="top" align="right"><span class="rank">2.</span></td> <td class="votelinks" valign="top"><center><a id="up_19384878" href="vote?id=19384878&how=up&goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://www.nytimes.com/2019/03/13/technology/facebook-data-subpoenas.html" class="storylink">Facebook’s Data Deals Are Under Criminal Investigation</a><span class="sitebit comhead"> (<a href="from?site=nytimes.com"><span class="sitestr">nytimes.com</span></a>)</span>
</td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext">
<span class="score" id="score_19384878">661 points</span> by <a href="user?id=tysone" class="hnuser">tysone</a> <span class="age"><a href="item?id=19384878">13 hours ago</a></span> <span id="unv_19384878"></span> | <a href="hide?id=19384878&goto=news">hide</a> | <a href="item?id=19384878">156 comments</a> </td>
</tr>
<tr class="spacer" style="height:5px"></tr>
<tr class="athing" id="19388091">
<td class="title" valign="top" align="right"><span class="rank">3.</span></td> <td class="votelinks" valign="top"><center><a id="up_19388091" href="vote?id=19388091&how=up&goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://krita.org/en/item/krita-4-2-0-the-first-painting-application-to-bring-hdr-support-to-windows" class="storylink">Krita 4.2.0: First painting application with HDR support on Windows</a><span class="sitebit comhead"> (<a href="from?site=krita.org"><span class="sitestr">krita.org</span></a>)</span>
</td>
...
Answer 0 (score: 0)
It sounds like you want:
doc = Nokogiri::HTML browser.html