Extracting from a series of data with iMacros

Time: 2015-07-03 04:27:38

Tags: extract imacros extraction text-extraction data-extraction

Hi, this is what my web page looks like:

  <div class="Bango 1 Beamer Beamer-1"> Beamer </div>
  <div class ="menu1"> menu1 </div>
  <div class ="menu2"> menu2 </div>
  <div class ="menu3"> menu3 </div>
  <div class ="menu4"> menu4 </div>

 <div class="Bango 1 Beamer Beamer-2"> Beamer2 </div>
 <div class ="menu1"> menu21 </div>
 <div class ="menu2"> menu22 </div>
 <div class ="menu3"> menu23 </div>
 <div class ="menu4"> menu24 </div>

 <div class="Bango 1 Beamer Beamer-3"> Beamer3 </div>
 <div class ="menu1"> menu31 </div>
 <div class ="menu2"> menu32 </div>
 <div class ="menu3"> menu33 </div>
 <div class ="menu4"> menu34 </div>

How can I extract only the elements under Beamer-1? Note that the number of elements in this group may change at any time. Thanks.

1 answer:

Answer 0: (score: 1)

I suggest solving this with several javascript: pseudo-URLs:

' get bounds
URL GOTO=javascript:{var<SP>doc=window.document;var<SP>els=doc.getElementsByTagName("div");for(i=0;i<els.length;i++){var<SP>b=(els[i].outerHTML.match("Beamer-1"))<SP>?<SP>(i+1)<SP>:<SP>b;var<SP>e=(els[i].outerHTML.match("Beamer-2"))<SP>?<SP>i<SP>:<SP>e;}}
' set extract
URL GOTO=javascript:{var<SP>ext="";for(i=b;i<e;i++){ext+=els[i].innerHTML.trim()+((i==e-1)<SP>?<SP>""<SP>:<SP>"[EXTRACT]");}undefined;}
' create dummy element
URL GOTO=javascript:{var<SP>elt=doc.createElement("input");elt.type="hidden";elt.id="myHiddenExtract";elt.value=ext;doc.getElementsByTagName("html")[0].appendChild(elt);undefined;}
' get extract
TAG POS=1 TYPE=INPUT ATTR=ID:myHiddenExtract EXTRACT=TXT
' remove dummy element
URL GOTO=javascript:{doc.getElementsByTagName("html")[0].removeChild(doc.getElementById("myHiddenExtract"));undefined;}
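
Since the one-liners are hard to read, here is a rough, expanded sketch of the same bounds-and-join idea in plain JavaScript. It is not part of the macro itself: `collectBetween`, `startMarker`, `endMarker` and `sampleDivs` are names I made up for illustration. The sample array just mimics the divs from the question so the logic can be tried outside a browser. Unlike the macro, this version falls back to the end of the list if the end marker is never found, which is my own assumption about what you would want for the last group.

```javascript
// Sketch only: collectBetween, startMarker and endMarker are hypothetical names.
// `divs` can be any array-like of objects that expose `outerHTML` and `innerHTML`,
// for example the result of document.getElementsByTagName("div").
function collectBetween(divs, startMarker, endMarker) {
  let begin = -1;          // index of the first element after the start marker
  let end = divs.length;   // assumption: run to the end if endMarker never appears
  for (let i = 0; i < divs.length; i++) {
    const html = divs[i].outerHTML;
    if (html.indexOf(startMarker) !== -1) {
      begin = i + 1;
    } else if (begin !== -1 && html.indexOf(endMarker) !== -1) {
      end = i;
      break;
    }
  }
  if (begin === -1) {
    return "";             // start marker not found, nothing to extract
  }
  const parts = [];
  for (let j = begin; j < end; j++) {
    parts.push(divs[j].innerHTML.trim());
  }
  // Same "[EXTRACT]" separator that the macro builds by hand.
  return parts.join("[EXTRACT]");
}

// Stand-in data shaped like the page in the question (not a real DOM).
const sampleDivs = [
  { outerHTML: '<div class="Bango 1 Beamer Beamer-1"> Beamer </div>', innerHTML: " Beamer " },
  { outerHTML: '<div class="menu1"> menu1 </div>', innerHTML: " menu1 " },
  { outerHTML: '<div class="menu2"> menu2 </div>', innerHTML: " menu2 " },
  { outerHTML: '<div class="menu3"> menu3 </div>', innerHTML: " menu3 " },
  { outerHTML: '<div class="menu4"> menu4 </div>', innerHTML: " menu4 " },
  { outerHTML: '<div class="Bango 1 Beamer Beamer-2"> Beamer2 </div>', innerHTML: " Beamer2 " }
];

const extracted = collectBetween(sampleDivs, "Beamer-1", "Beamer-2");
console.log(extracted);
```

Because the bounds are computed from the markers rather than hard-coded positions, it does not matter how many menu divs sit under Beamer-1. One caveat: the plain substring match means "Beamer-1" would also match a class such as "Beamer-10", so a stricter test may be needed if the page has that many groups.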