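Is it possible to use the rvest package to read the text stored in an input type="radio" tag that is followed by a span tag with class="glyphicon glyphicon-ok"? For example, I want to read "Carbohydrates and fats" into a character vector.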
R code
# Does not work: NA is stored in p_ans
install.packages('rvest')
library('rvest')
url <- 'http://upscfever.com/upsc-fever/en/test/en-test-sci1.html'
webpage <- read_html(url)
p_ans <- webpage %>%
html_nodes("input + glyphicon-ok") %>%
html_text()
HTML code
<div class="form-group" id="myform">
<label for="usr">Q1: Energy giving foods are </label>
</div>
<div class="radio">
<label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label>
</div>
<div class="radio">
<label><input type="radio" id="opt1" value="-0.33" name="optradio0">Carbohydrates and Proteins<span id="sp1" class="glyphicon glyphicon-remove"></span></label>
</div>
Answer 0 (score: 0)
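library(rvest)
pg <- read_html("http://upscfever.com/upsc-fever/en/test/en-test-sci1.html")
# Select every <label> that contains both an <input> and a <span> whose class includes "glyphicon glyphicon-ok"
html_nodes(pg, xpath=".//label[input and span[contains(@class, 'glyphicon glyphicon-ok')]]") %>%
html_text()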
## [1] "Carbohydrates and fats"
## [2] "saturated fatty acids"
## [3] "unsaturated fatty acids are good for health"
## [4] "unsaturated fats"
## [5] "polypeptides"
## [6] "Maerasmus"
## [7] "Ribulose bisphosphate Carboxylase-Oxygenase "
## [8] "Mercury"
## [9] "Cadmium"
## [10] "Absorb free radicals"
## [11] "A"
## [12] "Calcium - Goitre"
## [13] "none"
## [14] "Excretion of undigested food"
## [15] " complex components of food are broken into simpler substances."
## [16] "starch to sugar"
## [17] "protection of stomach lining"
## [18] "Liver"
## [19] "digestion of fats"
## [20] "only HDC is good"
## [21] "35-42"
## [22] "absorption of food"
## [23] "digest cellulose"
## [24] "meat is easily digested"
## [25] "gall bladder"
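
A likely reason the selector in the question finds nothing: in CSS, "input + glyphicon-ok" looks for an element whose tag name is glyphicon-ok (a class needs a leading dot), and even "input + span.glyphicon-ok" would only return the empty span, because the answer text belongs to the enclosing label. The sketch below checks this against an inline copy of the markup from the question, so it runs without network access. The variable names html, n_css, and p_ans are only illustrative, and the token-based class test is a variation on the XPath above, not the answerer's exact expression.

```r
library(rvest)

# Inline copy of the two options from the question (assumed representative of the page)
html <- '
<div class="radio">
<label><input type="radio" value="1" name="optradio0">Carbohydrates and fats<span class="glyphicon glyphicon-ok"></span></label>
</div>
<div class="radio">
<label><input type="radio" id="opt1" value="-0.33" name="optradio0">Carbohydrates and Proteins<span id="sp1" class="glyphicon glyphicon-remove"></span></label>
</div>'

pg <- read_html(html)

# The selector from the question matches a <glyphicon-ok> element, which does not exist
n_css <- length(html_nodes(pg, "input + glyphicon-ok"))

# Find spans carrying the class token "glyphicon-ok", then step up to the parent <label>
xp <- "//span[contains(concat(' ', normalize-space(@class), ' '), ' glyphicon-ok ')]/parent::label"
p_ans <- html_text(html_nodes(pg, xpath = xp), trim = TRUE)

print(n_css)  # 0
print(p_ans)  # "Carbohydrates and fats"
```

Matching on the single token "glyphicon-ok" should keep working if the order of the class names changes, and trim = TRUE drops the leading and trailing spaces that appear in some of the results listed above.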