Web crawling with R

Time: 2017-10-24 21:24:53

Tags: r html-table web-crawler

All,

I need help from an expert.

I am trying to get apartment names, the number of rooms, and prices for apartments in Korea.

This is the page

My code is:

url <- "http://land.findall.co.kr/land_new/subLand.asp?hidSectionCd=6&tmphidSearchGbn=area&selTradeKind=0&hidCategoryCode=0&hidtxtSearch=&txtWolamountMin=0&txtWolamountMax=0&txtPyengMin=0&txtPyengMax=0&txtPremiumMin=0&txtPremiumMax=0&hidListGbn=0&hidOrd=&hidOptCode=0&hidMainSubGbn=S&hidDataLoadGbn=L&hidOptStr=&hidMetro=%BC%AD%BF%EF&hidCity=%B0%AD%B3%B2%B1%B8&hidDong=&hidMapX=0&hidMapY=0&page=1&totalcount=2132&hidContractId=&LineAdNo=&strMP=&BestInfoCnt=0&reCommonCnt=0&BestInfo_Goods=0&isFirst=0&hidSearchGbn=area&selMetro=%BC%AD%BF%EF&selCity=%B0%AD%B3%B2%B1%B8&selDong=&txtSearch=&selRadius=00&selPriceMin=10&selPriceMax=000&selRoomCnt=000&selBathRoom=000&Premium=0-0&wolamount=0-0&Pyeng=&intMilli=&intPyeng="
homes <- read_html(url)
titles <- homes %>% html_nodes('.elip') %>% html_text()

I can get the apartment names.

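However, I cannot get any further than that: I am stuck on the number of rooms. I have attached the part that I am having trouble getting past.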

1 answer:

Answer 0: (score: 0)

This could use some cleanup, but I think the basics are there. As mentioned in the comments, SelectorGadget will be your friend:

library(tidyverse)
library(rvest)

url <- "http://land.findall.co.kr/land_new/subLand.asp?hidSectionCd=6&tmphidSearchGbn=area&selTradeKind=0&hidCategoryCode=0&hidtxtSearch=&txtWolamountMin=0&txtWolamountMax=0&txtPyengMin=0&txtPyengMax=0&txtPremiumMin=0&txtPremiumMax=0&hidListGbn=0&hidOrd=&hidOptCode=0&hidMainSubGbn=S&hidDataLoadGbn=L&hidOptStr=&hidMetro=%BC%AD%BF%EF&hidCity=%B0%AD%B3%B2%B1%B8&hidDong=&hidMapX=0&hidMapY=0&page=1&totalcount=2132&hidContractId=&LineAdNo=&strMP=&BestInfoCnt=0&reCommonCnt=0&BestInfo_Goods=0&isFirst=0&hidSearchGbn=area&selMetro=%BC%AD%BF%EF&selCity=%B0%AD%B3%B2%B1%B8&selDong=&txtSearch=&selRadius=00&selPriceMin=10&selPriceMax=000&selRoomCnt=000&selBathRoom=000&Premium=0-0&wolamount=0-0&Pyeng=&intMilli=&intPyeng="
homes <- read_html(url)

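# Apartment names: elements with class "elip" nested inside class "address"
apt_name <- homes %>%
  html_nodes(".address .elip") %>%
  html_text()

# Number of rooms: every <td> that is the fifth child of its parent,
# inside the element with id "spCommonList"
num_rooms <- homes %>%
  html_nodes("#spCommonList td:nth-child(5)") %>%
  html_text()

# Prices: elements with class "price" nested inside class "line"
price <- homes %>%
  html_nodes(".line .price") %>%
  html_text()

# Combine the three character vectors into one tibble (their lengths need to match)
results <- bind_cols(apt_name = apt_name, num_rooms = num_rooms, price = price)
results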

# A tibble: 30 x 3
#         apt_name num_rooms          price
#            <chr>     <chr>          <chr>
#  1   개포우성1차         4    전세120,000
#  2 개포주공1단지         3    매매166,000
#  3          삼성         3    매매100,000
#  4       한양1차         3    매매170,000
#  5       한양6차         3    매매188,000
#  6       현대8차         3    매매184,000
#  7       한양1차         3    매매150,000
#  8          동현         3     전세52,000
#  9       한양6차         3     전세55,000
# 10 우정에쉐르III         3 월세10,000/185
# ... with 20 more rows
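
As one example of the cleanup mentioned above, the price column mixes a trade-type label (전세 is a jeonse lump-sum deposit lease, 매매 is a sale, 월세 is monthly rent) with the amount. Below is a minimal base-R sketch of how the two might be separated, assuming every price string looks like the rows printed above. The names trade_type, first_amount, second_amount and to_number are hypothetical, and reading the value after the slash as the monthly rent is an assumption rather than something the page confirms.

```r
# Sample values copied from the printed output above.
price <- c("전세120,000", "매매166,000", "월세10,000/185")

# Everything before the first digit is treated as the trade-type label.
trade_type <- sub("[0-9].*$", "", price)

# Everything from the first digit onward is the amount part.
amount <- sub("^[^0-9]*", "", price)

# Split on the slash; rows without a slash get NA for the second part.
parts <- strsplit(amount, "/", fixed = TRUE)

# Hypothetical helper: drop thousands separators and convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

first_amount  <- to_number(vapply(parts, `[`, character(1), 1))
# Assumption: the value after the slash is the monthly rent.
second_amount <- to_number(vapply(parts, `[`, character(1), 2))

cleaned <- data.frame(trade_type, first_amount, second_amount,
                      stringsAsFactors = FALSE)
cleaned
```

To apply this to the scraped data, replace the sample vector with results$price. Strings that do not follow the assumed pattern would need separate handling.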