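I am building a Java application that uses XPath to extract the values inside table tags.
Please suggest an efficient way to get all 200 values from the page. My code works correctly for the 100 rows of the first dataTable, but I cannot reach the second dataTable.
I am able to extract the first table's rows with the Java class shown below.
Expected output: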
http://a.com/ data for a 526735 Z
http://b.com/ data for b 522273 Z
.
.
.
.
http://c.com/ data for c 578335 Z
http://d.com/ data for d 513445 Z
<table>
<tbody>
<tr>
<td style="padding-right>
<table class="dataTable">
<tbody>
<tr>
<td><a HREF="http://a.com/" target="_parent">data for a</a></td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td><a HREF="http://b.com/" target="_parent">data for b</a></td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
.
.
.100 <tr> here
.
</tbody>
</table>
</td>
<td style="padding-right>
<table class="dataTable">
<tbody>
<tr>
<td><a HREF="http://c.com/" target="_parent">data for c</a></td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td><a HREF="http://d.com/" target="_parent">data for d</a></td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
.
.
.100 rows here
.
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
This is the class I use to fetch the data.
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
public class CompaniesGetter {
    public static void main(String[] args) throws Exception {
        String name, link, scripcode, group, s, key;
        int a = 1;
        int count = 1;
        URL oracle = new URL("http://money.rediff.com/companies");
        URLConnection yc = oracle.openConnection();
        InputStream is = yc.getInputStream();
        is = oracle.openStream();
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        Document tidyDOM = tidy.parseDOM(is, null);
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();
        Map<String, String> mLink = new HashMap<String, String>();
        Map<String, String> mCode = new HashMap<String, String>();
        Map<String, String> mGroup = new HashMap<String, String>();
        ArrayList<String> aName = new ArrayList<String>();
        //for(int j=0;j<2;j++)
        for (int i = 1; i <= 200; i++) {
            if (i == 100) {
                a = 2;
                s = attrib[1]; // attrib is not declared anywhere in the posted code
            }
            link = "//table[@class='dataTable']/tbody/tr[" + i + "]/td/a/@href";
            name = "//table[@class='dataTable']/tbody/tr[" + i + "]/td/a";
            scripcode = "//table[@class='dataTable']/tbody/tr[" + i + "]/td[2]";
            group = "//table[@class='dataTable']/tbody/tr[" + i + "]/td[3]";
            String linkValue = (String) xPath.evaluate(link, tidyDOM, XPathConstants.STRING);
            String nameValue = (String) xPath.evaluate(name, tidyDOM, XPathConstants.STRING);
            String scripValue = (String) xPath.evaluate(scripcode, tidyDOM, XPathConstants.STRING);
            String groupValue = (String) xPath.evaluate(group, tidyDOM, XPathConstants.STRING);
            aName.add(nameValue);
            mLink.put(nameValue, linkValue);
            mCode.put(nameValue, scripValue);
            mGroup.put(nameValue, groupValue);
        }
        Iterator<String> itr = aName.iterator();
        while (itr.hasNext()) {
            key = itr.next();
            System.out.println("::" + (count++) + " " + key + " " + mLink.get(key) + " " + mCode.get(key) + " " + mGroup.get(key) + " ::");
        }
    }
}
Answer 0 (score: 1)
Hmm. Just a hint: are you actually using the variable "a" in your XPath?
link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";
should be
link = "//table[@class='dataTable'][" + a + "]/tbody/tr["+i+"]/td/a/@href";
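One caveat on the table index, plus a possible way to avoid index arithmetic altogether. In XPath 1.0 a positional predicate written directly after a step of `//` counts siblings under each parent, so `//table[@class='dataTable'][2]` means "a dataTable that is the second such child of its parent". In the markup from the question each inner table sits in its own `<td>`, so that form may match nothing; wrapping the path in parentheses, `(//table[@class='dataTable'])[2]`, indexes the tables in document order instead. The row index would also have to restart at 1 for the second table. A simpler route is to select every row once as a node set and evaluate short relative paths against each row. The sketch below is only an illustration under assumptions: `DataTableRows`, `SAMPLE`, `parse`, `count` and `extractRows` are hypothetical names, the sample string is a cut-down, well-formed stand-in for the real page, and it is parsed with the JDK's own XML parser rather than JTidy so that it runs without a network connection. The same XPath calls should apply to the `Document` returned by `tidy.parseDOM`, assuming JTidy keeps the `tbody` elements and lower-cases attribute names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DataTableRows {

    // Hypothetical stand-in for the page: two dataTable tables, each in its own <td>.
    static final String SAMPLE =
        "<table><tbody><tr>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://a.com/'>data for a</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://b.com/'>data for b</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://c.com/'>data for c</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://d.com/'>data for d</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "</tr></tbody></table>";

    // Parse a well-formed XML/XHTML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Number of nodes matched by an XPath expression.
    static int count(Document doc, String expr) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    // One "link name code group" line per row, across every dataTable.
    static List<String> extractRows(Document doc) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xp.evaluate(
                "//table[@class='dataTable']/tbody/tr", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            // Relative paths are evaluated against the current row only.
            String link = xp.evaluate("td[1]/a/@href", row);
            String name = xp.evaluate("td[1]/a", row);
            String code = xp.evaluate("td[2]", row);
            String group = xp.evaluate("td[3]", row);
            out.add(link + " " + name + " " + code + " " + group);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Position counted per parent: no <td> holds a second dataTable.
        System.out.println(count(doc, "//table[@class='dataTable'][2]/tbody/tr"));   // prints 0
        // Position counted in document order: rows of the second table.
        System.out.println(count(doc, "(//table[@class='dataTable'])[2]/tbody/tr")); // prints 2
        for (String line : extractRows(doc)) {
            System.out.println(line);
        }
    }
}
```

With the node-set approach the loop needs neither the hard-coded 200 nor the `a` counter, and the page is queried once for the rows instead of four times per index.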