Extracting results after a POST query

Time: 2016-03-03 18:09:55

Tags: r post screen-scraping

I am trying to automatically extract electricity offers from a site. Once I set the postcode (e.g. 3000), I can download the PDF files manually.

I am using the httr package:

library(httr)
library(XML)  # htmlParse() comes from the XML package
qr<- POST("http://www.qenergy.com.au/What-Are-Your-Options",
     query=list(postcode=3000))
res <- htmlParse(content(qr))

The problem is that the file URLs are not in the query response. Please help.

1 answer:

Answer 0 (score: 2)

Try this:

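library(httr)
# Send the postcode as a form-encoded request body instead of a URL query parameter
qr <- POST("http://www.qenergy.com.au/What-Are-Your-Options",
           encode = "form",
           body = list(postcode = 3000))
# Parse the response into a document
res <- content(qr)
# XPath: the href attribute of every link whose URL contains "pdf"
pdfs <- as(res['//a[contains(@href, "pdf")]/@href'], "character")
head(pdfs)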
# [1] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-5-Day-Time-of-Use-A210.pdf"  
# [2] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-7-Day-Time-of-Use-A250.pdf"  
# [3] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate-CL.pdf"          
# [4] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate.pdf"             
# [5] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-5-Day-Time-of-Use.pdf"
# [6] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-7-Day-Time-of-Use.pdf"
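
The paths above are relative, and newer httr releases parse HTML with xml2, where the `res[...]` indexing may not apply. As a rough sketch under those assumptions, the same XPath can be run with xml2 and the results turned into absolute URLs. The inline HTML and the `Example-Plan` file names are made up for illustration, and resolving the paths against the page URL is an assumption about how the site serves its files.

```r
library(xml2)

# Stand-in for the HTML that the POST request returns (hypothetical markup).
html <- '<html><body>
  <a href="flux-content/qenergy/pdf/Example Zone/Example-Plan-A.pdf">Plan A</a>
  <a href="/contact">Contact</a>
  <a href="flux-content/qenergy/pdf/Example Zone/Example-Plan-B.pdf">Plan B</a>
</body></html>'
doc <- read_html(html)

# Same XPath idea as the answer: links whose href contains "pdf".
links <- xml_find_all(doc, '//a[contains(@href, "pdf")]')
pdfs <- xml_attr(links, "href")

# Assumption: the relative paths resolve against the page URL.
base_url <- "http://www.qenergy.com.au/What-Are-Your-Options"

# Percent-encode the spaces first, then build absolute URLs.
encoded <- vapply(pdfs, utils::URLencode, character(1), USE.NAMES = FALSE)
pdf_urls <- url_absolute(encoded, base_url)
print(pdf_urls)

# Downloading would then look something like this (not run here):
# for (u in pdf_urls) download.file(u, destfile = basename(u), mode = "wb")
```

With a live response, `doc <- read_html(content(qr, as = "text"))` should be able to replace the inline HTML. Encoding before resolving keeps the spaces in the directory names from producing invalid URLs.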