Using IMPORTXML with a PHP-populated website and scraping to Google Sheets

Time: 2018-08-22 13:59:04

Tags: xml web-scraping google-sheets-query

I am trying to import data from this website into a Google Sheet using IMPORTXML: http://14.139.247.11/citywx/city_weather.php?id=42488

I want to scrape the value shown for Minimum Temp (oC).

Loading this table into Sheets with IMPORTHTML works just fine:

=IMPORTHTML("http://14.139.247.11/citywx/city_weather.php?id=42488","table",2)

But scraping with IMPORTXML does not work. Using Chrome developer tools, I copied the XPath, which is:

/html/body/center/font/table[1]/tbody/tr[1]/td[2]/table/tbody/tr[4]/td[1]/font

This returns:

#N/A, i.e. "Imported content is empty"

I copied the entire HTML to my own server. There, the scraping works when I remove the "font" step that follows "center" from the path, changing

/html/body/center/font/table[1]/tbody/tr[1]/td[2]/table/tbody/tr[4]/td[1]/font

to

/html/body/center/table[1]/tbody/tr[1]/td[2]/table/tbody/tr[4]/td[1]/font

However, the modified path still fails on the original site.

This does not look like a dynamic website, since IMPORTHTML works and I could not find any JavaScript running. What am I missing here?
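
One thing worth checking, stated as an assumption rather than a confirmed diagnosis of this page: a path copied from Chrome developer tools describes the DOM after the browser has repaired the markup, and browsers add a tbody element around table rows even when the page source has none. A parser that reads the source as served would then find nothing at a path containing tbody. The Python sketch below illustrates the effect on a made-up, well-formed snippet; xml.etree.ElementTree stands in for a parser that does not repair markup, and the table values are placeholders, not data from the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: the table rows sit directly under <table>,
# with no <tbody> wrapper. The numbers are placeholders.
raw = """<html><body><table>
<tr><td>Maximum Temp (oC)</td><td>31.0</td></tr>
<tr><td>Minimum Temp (oC)</td><td>24.5</td></tr>
</table></body></html>"""

root = ET.fromstring(raw)  # root is the <html> element

# Path in the style copied from a browser, which assumes a <tbody> step.
with_tbody = root.findall("./body/table/tbody/tr[2]/td[2]")

# Same path without the <tbody> step, matching the source as written.
without_tbody = root.findall("./body/table/tr[2]/td[2]")

print(len(with_tbody))        # number of matches for the tbody path
print(without_tbody[0].text)  # text of the cell found without tbody
```

If the same applies to the real page, dropping the tbody steps or using a shorter relative path would be the thing to try; that has not been verified here.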

1 answer:

Answer 0 (score: 0)

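You can use INDEX with IMPORTHTML instead of IMPORTXML to pick the minimum temperature out of the imported table by its row and column position. The source is HTML, not XML.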

=INDEX(IMPORTHTML("http://14.139.247.11/citywx/city_weather.php?id=42488","table",2),4,2)
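
For readers unfamiliar with INDEX, the selection can be sketched in Python. This is only an illustration: the helper name index and the table contents are hypothetical (the labels and values are placeholders, not readings from the site). The point is that INDEX takes 1-based row and column positions into the 2D array that IMPORTHTML returns.

```python
def index(table, row, column):
    """Mimic the 1-based INDEX(range, row, column) lookup on a list of rows."""
    return table[row - 1][column - 1]


# Hypothetical stand-in for the imported table; only the position of the
# "Minimum Temp (oC)" row (row 4, value in column 2) follows the formula above.
table = [
    ["Row 1 label", "value 1"],
    ["Row 2 label", "value 2"],
    ["Row 3 label", "value 3"],
    ["Minimum Temp (oC)", "24.5"],
]

min_temp = index(table, 4, 2)  # row 4, column 2, as in the formula
print(min_temp)
```

Because the lookup is purely positional, the formula would need adjusting if the site ever adds or reorders rows in that table.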
