Combining regex with IMPORTXML in Google Sheets to limit the results I get

Date: 2017-09-22 02:23:08

Tags: google-sheets

I am building a sheet to scrape Zillow. I have pieced several parts together to get what I need, but I am stuck on one part.

Paste this into A1:

https://www.zillow.com/homes/fsbo/Tulsa-OK-74136/90319_rid/36.079662,-95.899444,36.0415,-95.991712_rect/13_zm/0_mmm/

Paste this into B1:

=importXML(A1,"//a/@href")

This lists a long run of URLs from the page. What I am looking for, though, is the part shown below.

...
...
#
/homedetails/3435-E-64th-St-Tulsa-OK-74136/22113164_zpid/
/myzillow/UpdateFavorites.htm?zpid=22113164&operation=add&ajax=false
/homedetails/7747-S-Fulton-Pl-Tulsa-OK-74136/2092972797_zpid/
/myzillow/UpdateFavorites.htm?zpid=2092972797&operation=add&ajax=false
/homedetails/4324-E-67th-St-UNIT-676-Tulsa-OK-74136/22229329_zpid/
/myzillow/UpdateFavorites.htm?zpid=22229329&operation=add&ajax=false
/homedetails/7801-S-Louisville-Ave-Tulsa-OK-74136/22227172_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227172&operation=add&ajax=false
/homedetails/1612-E-66th-St-Tulsa-OK-74136/22129877_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129877&operation=add&ajax=false
/homedetails/5503-E-73rd-St-Tulsa-OK-74136/22145899_zpid/
/myzillow/UpdateFavorites.htm?zpid=22145899&operation=add&ajax=false
/homedetails/7401-S-Yale-Ave-Tulsa-OK-74136/2101861353_zpid/
/myzillow/UpdateFavorites.htm?zpid=2101861353&operation=add&ajax=false
/homedetails/6508-S-Troost-Ave-Tulsa-OK-74136/22129854_zpid/
/myzillow/UpdateFavorites.htm?zpid=22129854&operation=add&ajax=false
/homedetails/7829-S-Evanston-Ave-Tulsa-OK-74136/22227977_zpid/
/myzillow/UpdateFavorites.htm?zpid=22227977&operation=add&ajax=false
/homedetails/7531-S-Irvington-Ave-Tulsa-OK-74136/2096103489_zpid/
/myzillow/UpdateFavorites.htm?zpid=2096103489&operation=add&ajax=false
/homedetails/1104-E-61st-St-Tulsa-OK-74136/2093334302_zpid/
/myzillow/UpdateFavorites.htm?zpid=2093334302&operation=add&ajax=false
/homedetails/1339-E-67th-St-Tulsa-OK-74136/22114998_zpid/
/myzillow/UpdateFavorites.htm?zpid=22114998&operation=add&ajax=false
/homedetails/7919-S-Braden-Ave-Tulsa-OK-74136/22235368_zpid/
/myzillow/UpdateFavorites.htm?zpid=22235368&operation=add&ajax=false
/homedetails/7014-S-Birmingham-Ct-Tulsa-OK-74136/22168356_zpid/
/myzillow/UpdateFavorites.htm?zpid=22168356&operation=add&ajax=false
/homedetails/7733-S-Hudson-Ave-Tulsa-OK-74136/22236219_zpid/
/myzillow/UpdateFavorites.htm?zpid=22236219&operation=add&ajax=false
#saved-search-lightbox

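I want all of the /homedetails/..._zpid/ links. The result can be an array, so that they all land in the same column; that would be fine. I believe a regex would do this, but I cannot find what I need. Can anyone help?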

I also have these in my sheet. I cannot get them to work the way I want.

=ArrayFormula(IfError((QUERY(QUERY(IFERROR(IF({1,1,0},IF({1,0,0},INT((ROW($B:B)-1)/20),MOD(ROW($B:B)-1,20)),importXML($B:$B,"//meta[@property='og:zillow_fb:address']/@content | //meta[@property='product:price:amount']/@content| //div[@class='hdp-fact-ataglance-value'] | //span[@class='contact-badge Listing Agent'] "))),"select min(Col3) where Col3 <> '' group by Col1 pivot Col2",0),"offset 1",0)),""))
=If($B6:B="","",Transpose(importxml($B6:$B,"//span[@class='snl phone']")))

1 Answer:

Answer 0: (score: 1)

How about modifying the XPath query passed to importXML()? I understand that you want to retrieve the /homedetails/..._zpid/ links. Is that right?

Modified XPath query:

=importXML(A1,"//ul[@class='photo-cards']//a[@class='zsg-photo-card-overlay-link routable hdp-link routable mask hdp-link']/@href")

Note:

If you want both the /homedetails/... and the /myzillow/... links, use =importXML(A1,"//ul[@class='photo-cards']//a/@href")
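
Since the question asks about a regex, here is a possible alternative that filters the unfiltered list instead of narrowing the XPath. It is only a sketch, assuming the output of =importXML(A1,"//a/@href") is still in column B and that column C is free. FILTER and REGEXMATCH are built-in Google Sheets functions; the pattern itself is a guess based on the sample paths in the question, not something taken from Zillow. Paste this into C1:

```
=FILTER(B1:B, REGEXMATCH(B1:B, "^/homedetails/.+_zpid/$"))
```

The pattern keeps only values that start with /homedetails/ and end with _zpid/, so the /myzillow/... links and the # anchors should drop out, and the matches spill down a single column as an array. This approach does not depend on the class names of the page, but it will return an error if the page yields no matching links.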

I apologize if I have misunderstood your question.