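I want to extract information from the HTML code below, either by selecting elements by their class or by using xpathSApply.
I would like to capture the different pieces of information as a table, for example
instead of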
<div class="userPost">
<div class="postHeading clearfix">
<div class="conditionInfo">
Condition: Condition in which Stomach Acid is Pushed Into the Esophagus</div>
<div class="date">8/12/2014 12:27:53 PM</div>
</div>
<p class="reviewerInfo">Reviewer: Believer, 35-44 Female on Treatment for 2 to less than 5 years (Patient) </p>
<div id="ctnStars">
<div class="catRatings firstEl clearfix">
<p class="category">Effectiveness</p>
<p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
Current Rating: 5</span></p>
</div>
<div class="catRatings clearfix">
<p class="category">Ease of Use</p>
<p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
Current Rating: 5</span></p>
</div>
<div class="catRatings lastEl clearfix">
<p class="category">Satisfaction</p>
<p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
Current Rating: 5</span></p>
</div>
</div>
<p id="comTrunc1" class="comment"><strong>Comment: </strong><br>Most excellent! I tried several different rx's to help with my acid problem and none were as effective as Nexium. After being on it for 3 months I stopped because that was how long my doc thought it would take to heal me. I stopped taking it and boom, the pain was back. Got back on Nexium and am staying on it. Such relief was unexpected.</p>
<p id="comFull1" class="comment" style="display:none"><strong>Comment:</strong><br>Most excellent! I tried several different rx's to help with my acid problem and none were as effective as Nexium. After being on it for 3 months I stopped because that was how long my doc thought it would take to heal me. I stopped taking it and boom, the pain was back. Got back on Nexium and am staying on it. Such relief was unexpected.<br><a onclick="toggle('comTrunc1'); toggle('comFull1');return false;" href="#">Hide Full Comment</a></p>
<div class="actionLinks clearfix">
<p class="helpful">4
people
found this review helpful.<br>
Was this review helpful? <span id="513102_Vote"><a href="#" onclick="return FoundHelpFul('8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a', 513102, true)">Yes</a> | <a href="#" onclick="return FoundHelpFul('8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a', 513102, false)">No</a></span></p><a class="reportAbuse" href="#" onclick="showPopWin('ReportAbuse.aspx?reviewid=513102&userid=8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a',400,160,null, false); return false;">Report This Post</a></div>
Answer 0 (score: 0)
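It isn't clear to me exactly what you are after, but here is a start. If this is not the direction you want, please edit your question after trying something along these lines (and include your code). Assuming "url" is the URL of the site you got the HTML from, try something like the following: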
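library(XML) # package names are case-sensitive: XML, not xml
doc <- htmlTreeParse(url, useInternalNodes = TRUE) # parses the contents of the url into doc; internal nodes are required for XPath queries
data <- xpathSApply(doc, "//div[@id = 'ctnStars']//p[@class = 'category']", xmlValue, trim = TRUE) # extracts the text of each category node ("Effectiveness", "Ease of Use", "Satisfaction")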
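To get closer to the table the question asks for, the same approach can be extended to pull the star ratings and pair them with their categories. The following is only a sketch under some assumptions: the `html` string is a trimmed, hand-copied fragment of the markup from the question (on the real page you would parse the URL instead), and the names `html`, `categories`, `ratings`, and `ratings_table` are made up for illustration.

```r
library(XML)

# Assumption: a shortened inline copy of the question's markup, used here so the
# example runs without network access. Replace with htmlParse(url) for a live page.
html <- '
<div class="userPost">
  <div class="date">8/12/2014 12:27:53 PM</div>
  <div id="ctnStars">
    <div class="catRatings firstEl clearfix">
      <p class="category">Effectiveness</p>
      <p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
      Current Rating: 5</span></p>
    </div>
    <div class="catRatings clearfix">
      <p class="category">Ease of Use</p>
      <p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
      Current Rating: 5</span></p>
    </div>
    <div class="catRatings lastEl clearfix">
      <p class="category">Satisfaction</p>
      <p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
      Current Rating: 5</span></p>
    </div>
  </div>
</div>'

# htmlParse returns an internal document that XPath functions can query.
doc <- htmlParse(html, asText = TRUE)

# Category labels, one per rating block.
categories <- xpathSApply(doc, "//div[@id = 'ctnStars']//p[@class = 'category']",
                          xmlValue, trim = TRUE)

# Rating text such as "Current Rating: 5"; strip the label and keep the number.
ratings <- xpathSApply(doc, "//div[@id = 'ctnStars']//span[@class = 'current-rating']",
                       xmlValue, trim = TRUE)
ratings <- as.integer(sub("Current Rating:\\s*", "", ratings))

# Post date, repeated on every row so each rating stays tied to its review.
post_date <- xpathSApply(doc, "//div[@class = 'date']", xmlValue, trim = TRUE)

ratings_table <- data.frame(date = post_date,
                            category = categories,
                            rating = ratings,
                            stringsAsFactors = FALSE)
print(ratings_table)
```

This produces one row per category (long format). If the page holds several reviews, the same queries would need to be run per `userPost` node rather than over the whole document, otherwise ratings from different reviews end up in one vector.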