GET {httr} returns a Bad Request response

Asked: 2014-12-01 23:51:26

Tags: r httr

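I am trying to scrape HTML elements from the URL stored in searchlink. The only approach that has worked for me is htmlTreeParse {XML}. However, it does not return the elements I am looking for, for example: img[@title='Add to compare']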

searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1" 

library(XML)
doc <- htmlTreeParse(searchlink, useInternalNodes = TRUE)

# Collect the class attribute of every matching <img> node
classes <- xpathSApply(doc, "//img[@title='Add to compare']", function(x) xmlGetAttr(x, "class"))

Evaluating classes after running the code above gives:

list()

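I also tried readLines and GET {httr}, but both return an error when reading the URL. I suspect the special characters in the URL are the cause, but I don't know how to fix it. The response is as follows: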

Response [http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1]
  Date: 2014-12-01 16:46
  Status: 400
  Content-type: text/html; charset=us-ascii
  Size: 324 B
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid URL</h2>
<hr><p>HTTP Error 400. The request URL is invalid.</p>
</BODY></HTML> 

1 answer:

Answer 0 (score: 1)

Try removing the # from the URL. I simply replaced it with ?:

library("httr")
url <- "http://www.realtor.ca/Map.aspx?CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertyTypeId=300&TransactionTypeId=2&SortOrder=A&SortBy=1&LongitudeMin=-114.52066040039104&LongitudeMax=-113.60536193847697&LatitudeMin=50.94776904194829&LatitudeMax=51.14246522072541&PriceMin=0&PriceMax=0&BedRange=0-0&BathRange=0-0&ParkingSpaceRange=0-0&viewState=m&Longitude=-114.063011169434&Latitude=51.0452194213867&ZoomLevel=11&CurrentPage=1"
res <- GET(url)
tt <- content(res)
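
The same substitution can also be done in code rather than by hand. This is a minimal base-R sketch, assuming the only change needed is to turn the first # into ?. The shortened URL is an illustrative stand-in for the full searchlink value, and fix_fragment is a hypothetical helper name, not part of httr:

```r
# Hypothetical helper: replace the first "#" with "?" so the parameters
# become a query string instead of a URL fragment.
fix_fragment <- function(u) sub("#", "?", u, fixed = TRUE)

# Shortened, illustrative stand-in for the full searchlink value
searchlink <- "http://www.realtor.ca/Map.aspx#CultureId=1&ApplicationId=1&CurrentPage=1"

url <- fix_fragment(searchlink)
print(url)
```

The resulting url can be passed to GET() exactly as in the code above.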

Then parse the HTML content in tt.
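
As a rough sketch of that parsing step, the XPath query from the question can be run with the XML package. The HTML string below is a made-up stand-in for the page content, and the class name compare-icon is invented for illustration. With a real response, the text would presumably come from content(res, as = "text"). If the site builds its listings with JavaScript in the browser, the query may still come back empty.

```r
library(XML)

# Made-up HTML standing in for the downloaded page; with a live response,
# something like content(res, as = "text") would supply this string.
html <- '<html><body>
  <img title="Add to compare" class="compare-icon"/>
  <img title="Something else" class="other"/>
</body></html>'

# Parse the string as HTML (asText = TRUE: input is markup, not a file name)
doc <- htmlParse(html, asText = TRUE)

# Class attribute of every <img> whose title is "Add to compare"
classes <- xpathSApply(doc, "//img[@title='Add to compare']", xmlGetAttr, "class")
print(classes)
```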