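I am extracting company information from a website. The data sits in span tags with the classes company, address and phone. Each page of the website lists 5 companies.
Sometimes the address or phone is not available, and I should get #EANF# as the extracted value. Instead, I get the value that belongs to the next company, and #EANF# is pushed down to the end.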
For example, I should get this:
Company name 1, Company address 1, Company phone 1
Company name 2, #EANF#, Company phone 2
Company name 3, Company address 3, #EANF#
Company name 4, Company address 4, Company phone 4
Company name 5, Company address 5, Company phone 5
But I get this instead:
Company name 1, Company address 1, Company phone 1
Company name 2, Company address 3, Company phone 2
Company name 3, Company address 4, Company phone 4
Company name 4, Company address 5, Company phone 5
Company name 5, #EANF#, #EANF#
Here is my iMacros code:
VERSION BUILD=8810214 RECORDER=FX
TAB T=1
SET !DATASOURCE pages.csv
SET !DATASOURCE_COLUMNS 1
' pages.csv contains
' http://website.com/page1
' http://website.com/page2
' http://website.com/page3
' etc..
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !ERRORIGNORE YES
SET !TIMEOUT_STEP 0
URL GOTO={{!COL1}}
WAIT SECONDS=2
TAG XPATH="(/html//span[@class='company']/a)[1]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[1]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[1]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
TAG XPATH="(/html//span[@class='company']/a)[2]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[2]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[2]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
TAG XPATH="(/html//span[@class='company']/a)[3]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[3]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[3]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
TAG XPATH="(/html//span[@class='company']/a)[4]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[4]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[4]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
TAG XPATH="(/html//span[@class='company']/a)[5]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='address'])[5]" EXTRACT=HTM
TAG XPATH="(/html//span[@class='phone'])[5]" EXTRACT=HTM
ADD !VAR1 <BR>
SAVEAS TYPE=EXTRACT FOLDER=* FILE=saved_data_{{!NOW:ddmmyyyy}}.csv
SET !EXTRACT NULL
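
The shifted rows look consistent with how an expression such as (/html//span[@class='address'])[2] is evaluated: it picks the second address span on the whole page, not the address of the second company. The Python sketch below reproduces that effect outside iMacros and contrasts it with a per-company lookup. The page markup is not shown in this post, so the `div class="item"` wrapper, the sample data, and the helper names (`nth_on_page`, `global_rows`, `per_item_rows`) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MISSING = "#EANF#"  # placeholder iMacros writes when nothing is found

# Hypothetical markup: one <div class="item"> per company (an assumption).
# Company 2 has no address and company 3 has no phone, as in the example.
PAGE = """<html><body>
<div class="item"><span class="company"><a>Company name 1</a></span>
  <span class="address">Company address 1</span><span class="phone">Company phone 1</span></div>
<div class="item"><span class="company"><a>Company name 2</a></span>
  <span class="phone">Company phone 2</span></div>
<div class="item"><span class="company"><a>Company name 3</a></span>
  <span class="address">Company address 3</span></div>
<div class="item"><span class="company"><a>Company name 4</a></span>
  <span class="address">Company address 4</span><span class="phone">Company phone 4</span></div>
<div class="item"><span class="company"><a>Company name 5</a></span>
  <span class="address">Company address 5</span><span class="phone">Company phone 5</span></div>
</body></html>"""

# One path per output column: company link, address, phone.
PATHS = (
    ".//span[@class='company']/a",
    ".//span[@class='address']",
    ".//span[@class='phone']",
)


def nth_on_page(root, path, n):
    """Return the text of the n-th match counted across the whole page."""
    matches = root.findall(path)
    return matches[n - 1].text if n <= len(matches) else MISSING


def global_rows(root, count=5):
    """Build rows by page-wide index, mirroring the macro's (...)[n] lookups."""
    return [tuple(nth_on_page(root, p, n) for p in PATHS)
            for n in range(1, count + 1)]


def per_item_rows(root):
    """Build rows by searching inside each company's own wrapper element."""
    rows = []
    for item in root.findall(".//div[@class='item']"):
        row = []
        for path in PATHS:
            node = item.find(path)
            row.append(node.text if node is not None else MISSING)
        rows.append(tuple(row))
    return rows


root = ET.fromstring(PAGE)

print("Indexed across the page:")
for row in global_rows(root):
    print(", ".join(row))

print("Looked up per company:")
for row in per_item_rows(root):
    print(", ".join(row))
```

With this sample data, the first listing shows the shifted rows and the second keeps #EANF# next to the company it belongs to. If the real page wraps each company in its own element, the same idea could probably be carried over to the macro by starting each address and phone XPath from the n-th company, for example (/html//span[@class='company'])[2]/ancestor::div[1]//span[@class='address'], where the div wrapper is again only an assumption about the markup.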