iMacros: #EANF# extracted in the wrong row

Time: 2015-05-28 09:04:39

Tags: firefox web-scraping imacros extraction

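I am extracting company information from a website. The data sits in span tags with the classes company, address and phone. Each page of the site lists 5 companies.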

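Sometimes the address or phone is not available, and I should get #EANF# as the extracted value. Instead, I get the value that belongs to the next company, and the #EANF# is pushed down to the last rows.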

For example, I should get this:

Company name 1, Company address 1, Company phone 1
Company name 2, #EANF#, Company phone 2
Company name 3, Company address 3, #EANF#
Company name 4, Company address 4, Company phone 4
Company name 5, Company address 5, Company phone 5

But I get this:

Company name 1, Company address 1, Company phone 1
Company name 2, Company address 3, Company phone 2
Company name 3, Company address 4, Company phone 4
Company name 4, Company address 5, Company phone 5
Company name 5, #EANF#, #EANF#

Here is my iMacros code:

VERSION BUILD=8810214 RECORDER=FX
TAB T=1

SET !DATASOURCE pages.csv
SET !DATASOURCE_COLUMNS 1

' pages.csv contains

' http://website.com/page1
' http://website.com/page2
' http://website.com/page3
' etc..

SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 0

URL GOTO={{!COL1}}

WAIT SECONDS=2

TAG XPATH="(/html//span[@class='company']/a)[1]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[1]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[1]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL

TAG XPATH="(/html//span[@class='company']/a)[2]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[2]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[2]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL

TAG XPATH="(/html//span[@class='company']/a)[3]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[3]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[3]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL

TAG XPATH="(/html//span[@class='company']/a)[4]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[4]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[4]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL

TAG XPATH="(/html//span[@class='company']/a)[5]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[5]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[5]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
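
A likely cause of the shift, sketched outside iMacros: an expression such as (/html//span[@class='address'])[2] picks the second address span on the whole page, not the address of the second company, so every missing span moves the later values up one row. The Python sketch below reproduces both outputs with the standard library. The markup is hypothetical (the question does not show the real HTML); it assumes each company's spans share a parent element, here a div with class "row", and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: company 2 has no address span, company 3 has no phone span.
PAGE = """
<html><body>
<div class="row"><span class="company"><a>Company name 1</a></span>
  <span class="address">Company address 1</span><span class="phone">Company phone 1</span></div>
<div class="row"><span class="company"><a>Company name 2</a></span>
  <span class="phone">Company phone 2</span></div>
<div class="row"><span class="company"><a>Company name 3</a></span>
  <span class="address">Company address 3</span></div>
<div class="row"><span class="company"><a>Company name 4</a></span>
  <span class="address">Company address 4</span><span class="phone">Company phone 4</span></div>
<div class="row"><span class="company"><a>Company name 5</a></span>
  <span class="address">Company address 5</span><span class="phone">Company phone 5</span></div>
</body></html>
"""

EANF = "#EANF#"  # placeholder iMacros writes when an element is not found


def nth_text(nodes, i):
    """Text of the i-th node in a page-wide list, or #EANF# if the list is too short."""
    return nodes[i].text if i < len(nodes) else EANF


def extract_global(root):
    """Mimics (//span[@class='x'])[n]: one page-wide list per class, indexed by n."""
    names = root.findall(".//span[@class='company']/a")
    addresses = root.findall(".//span[@class='address']")
    phones = root.findall(".//span[@class='phone']")
    return [(nth_text(names, i), nth_text(addresses, i), nth_text(phones, i))
            for i in range(len(names))]


def extract_per_row(root):
    """Looks up address and phone relative to each company's own container."""
    rows = []
    for row in root.findall(".//div[@class='row']"):
        def field(path):
            node = row.find(path)
            return node.text if node is not None else EANF
        rows.append((field("span[@class='company']/a"),
                     field("span[@class='address']"),
                     field("span[@class='phone']")))
    return rows


if __name__ == "__main__":
    root = ET.fromstring(PAGE.strip())
    for label, rows in (("page-wide index", extract_global(root)),
                        ("per-row lookup", extract_per_row(root))):
        print(label)
        for row in rows:
            print("  " + ", ".join(row))
```

In the macro itself, the equivalent idea would be an XPath anchored on the nth company span, for example (/html//span[@class='company'])[2]/../span[@class='address']. That expression is only a guess at the structure: it works if the three spans are siblings under one parent, and it would need adjusting to the real markup.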

0 answers:

No answers