Question

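To clarify, this data shows information on active and historical wildfires in British Columbia. So far, I have been able to extract all of the data from the HTML table using the following code: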

 Interface_html <- html_nodes(webpage,'td:nth-child(1)')
 Interface_data <- html_text(Interface_html)
 head(Interface_data)

 (...)


Geocoding_df <- data.frame(
  Fire_no = Fire_no_data, Geographic = Geographic_data,
  Discovery = Discovery_Date_data, Status = Status_data,
  Hectares = Hectares_data, Interface = Interface_data,
  Updatetime = Updatetime_data, Updatetime_stg = Updatetime_data_stg
)

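However, in the first column, some rows contain an image of a small house. This image indicates that the fire is an "interface" fire, meaning it is threatening structures.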

Basically, I need a way to extract whether the image is present in a row (ideally the image alt text, "Interface", but even a yes/no indicator would serve my purpose).

Is it possible to get the image attribute from this table by modifying the code I already have?

The main goal is to load the entire table into SQL so that I can build some data visualizations with Power BI.

Screenshot included:

Website: http://bcfireinfo.for.gov.bc.ca/hprScripts/WildfireNews/Fires.asp?Mode=normal&AllFires=1&FC=0

Answer 1

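The variable Interface_html is a list holding the first cell (td) of every table row in the webpage. One approach, then, is to check each node to see whether it contains an img tag. html_node (without the s) always returns one result per input node, whether or not it finds a match. In this case, html_node(Interface_html, "img") returns NA for a cell with no image and the img node's HTML otherwise.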

library(rvest)

url<-"http://bcfireinfo.for.gov.bc.ca/hprScripts/WildfireNews/Fires.asp?Mode=normal&AllFires=1&FC=0"
webpage<-read_html(url)

#list of all nodes
Interface_html <- html_nodes(webpage,'td:nth-child(1)')

#search each node in list to see if it contains an image tag and return node number.
withimage<- which(!is.na(html_node(Interface_html, "img")))

withimage
#[1] 109 145


#to add the TRUE/FALSE column to your data frame, pass this in place of Interface = Interface_data:
Interface = !is.na(html_node(Interface_html, "img"))
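
If you would rather have the alt text than a TRUE/FALSE flag, the same html_node() result can be passed to html_attr(). The following is a minimal sketch, not tested against the live page: sample_page is a hypothetical stand-in for the real table, and the file name house.gif and the fire numbers are made up for illustration. It assumes the house icon carries alt="Interface", as described in the question.

```r
library(rvest)

# Hypothetical stand-in for the real page: three rows, only the first
# has the house icon in its first cell.
sample_page <- read_html('
<table>
  <tr><td><img src="house.gif" alt="Interface"></td><td>K50001</td></tr>
  <tr><td></td><td>K50002</td></tr>
  <tr><td>&nbsp;</td><td>K50003</td></tr>
</table>')

# First cell of every row, as in the code above
first_cells <- html_nodes(sample_page, "td:nth-child(1)")

# html_node() gives one result per cell (missing where there is no <img>),
# so html_attr() returns NA for those cells and the alt text otherwise.
Interface_alt <- html_attr(html_node(first_cells, "img"), "alt")

Interface_alt
```

Because the result has one element per row, it should line up with the other columns and could be used as Interface = Interface_alt in the data.frame() call. Note that an img tag without an alt attribute would also give NA here, so the !is.na(html_node(...)) test above remains the safer yes/no indicator.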
