I'm trying to get a value from http://www.dolarhoy.com/ using the following code:
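import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// The method signature and the resultado declaration are assumed;
// the original snippet omitted both.
public static String leerPagina(String url) {
    StringBuilder resultado = new StringBuilder();
    try {
        URL urlPagina = new URL(url);
        URLConnection urlConexion = urlPagina.openConnection();
        urlConexion.connect();
        // Create the reader for the response; try-with-resources closes it
        try (BufferedReader lector = new BufferedReader(new InputStreamReader(
                urlConexion.getInputStream(), "UTF-8"))) {
            String linea;
            while ((linea = lector.readLine()) != null) {
                resultado.append(linea).append("\n");
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Contenido : \n\n" + resultado);
    return resultado.toString();
}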
Among the rest of the markup, I get this:
<td width='113' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#00ff00' size='2'>ACTUALIZADO</font>
</div>
</td>
<td width='179' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#00ff00' size='2'><b>7/08/2018
14:53 AR</b></font>
</div>
</td>
<td width='82' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#00ff00' size='2'>COMPRA</font>
</div>
</td>
<td width='110' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#000000' size='2'><b><font face='Courier New, Courier, mono' color='#FFCC00' size='4'>$
26.93</font></b></font>
</div>
</td>
<td width='85' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#00ff00' size='2'>VENTA</font>
</div>
</td>
<td width='110' height='25'>
<div align='center'>
<font face='Verdana, Arial, Helvetica, sans-serif' color='#000000' size='2'><b><font face='Courier New, Courier, mono' color='#FFCC00' size='4'>$
27.93</font></b></font>
</div>
</td>
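But I see that the HTML table has no id.
The value I need is the one highlighted in the image.
I need the value shown as "27.93" in the HTML above. (This value changes, so I need whatever is between the tags.)
I'd really appreciate any help or solution. Thanks!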
Answer 0 (score: 0)
Firefox can give you an XPath or a CSS selector for the element. This is the XPath for the value:
/html/body/div[5]/center/table/tbody/tr/td[6]/div/font/b/font
Use an XPath library of your choice to extract the value.
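As a rough illustration of that step, here is a minimal sketch using the JDK's built-in javax.xml.xpath package. It assumes the markup has already been cleaned into well-formed XML (the JDK parser rejects typical real-world HTML), and the SAMPLE fragment and the class and method names are hypothetical, modelled on the cells shown in the question. Instead of the positional path above, it anchors on the VENTA label, which may survive layout changes a little better. Note also that the tbody step in a browser-generated path is often added by the browser and may not exist in the raw source.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathVentaSketch {

    // Hypothetical, hand-cleaned fragment modelled on the question's markup.
    static final String SAMPLE =
        "<table><tr>"
        + "<td><div><font>COMPRA</font></div></td>"
        + "<td><div><font><b><font>$ 26.93</font></b></font></div></td>"
        + "<td><div><font>VENTA</font></div></td>"
        + "<td><div><font><b><font>$ 27.93</font></b></font></div></td>"
        + "</tr></table>";

    // Returns the text of the cell that follows the VENTA cell, without the "$".
    static String extractVenta(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // First <td> sibling after the <td> whose text is exactly VENTA
            String raw = xpath.evaluate(
                "//td[normalize-space(.)='VENTA']/following-sibling::td[1]", doc);
            return raw.replace("$", "").trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractVenta(SAMPLE)); // prints 27.93
    }
}
```

For the live page you would first need an HTML-tolerant parser to produce a DOM, then evaluate the same expression against it.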
Here is a CSS selector that can be used with jsoup:
body > div:nth-child(7) > center:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(6) > div:nth-child(1) > font:nth-child(1) > b:nth-child(1) > font:nth-child(1)
Answer 1 (score: 0)
With jsoup's pseudo-selectors you can do the following:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://www.dolarhoy.com/").get();
// select the div that contains this specific text and is a direct child of body
Element title = doc.select("body > div:contains(PROMEDIO DE COTIZACIONES DE PIZARRAS AL PÚBLICO RELEVADAS POR)").first();
// select the next sibling element, which holds the summary
Element summary = title.nextElementSibling();
// select the last cell, which holds the value needed
String amount = summary.select("td").last().text();
System.out.println(amount);
// same as above, as a one-liner
System.out.println(doc.select("body > div:contains(PROMEDIO DE COTIZACIONES DE PIZARRAS AL PÚBLICO RELEVADAS POR) + div td:last-child").text());
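More information can be found here: https://jsoup.org/cookbook/extracting-data/selector-syntax
Answer 2 (score: 0)
With univocity-html-parser you can get everything from this page.
To get just the element you need, without caring much about its full path: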
HtmlElement e = HtmlParser.parseTree(new UrlReaderProvider("http://www.dolarhoy.com/"));
String value = e.query()
.match("td").withText("$*") //match a <td> with any text starting with a $
.precededImmediatelyBy("td").withText("VENTA") //if found, it must have a <td> on its left, with text "VENTA"
.getText().getValue(); // if found, get the text of the <td> and return the value as a String
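This gives me the value $ 28.17.
To get every price listed on the page instead, configure an entity with matching rules: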
HtmlEntityList entityList = new HtmlEntityList();
HtmlEntitySettings currency = entityList.configureEntity("currency");
// removes rows with unwanted data
currency.addRecordFilter((record, context) -> isValidRecord(record));
//the group enables the matching rules to run only on tables that have text
//"compra" and "venta". We add fields to the group.
Group currencyTable = currency.newGroup().startAt("table").containing("tr").withText("*Compra ", "*Venta ").endAtClosing("table");
//the currency name and time are in the same table cell. The matching rule is the same for both "currency" and "timestamp" fields
addIdentifierField(currencyTable, "currency", 0);
addIdentifierField(currencyTable, "timestamp", 1);
//captures the currency exchange business name
currencyTable.addPersistentField("exchange").match("td").underHeaderAtRow("td", 3).withExactText("EN $").getText();
//captures the currency purchase and sale price
currencyTable.addField("buy").match("td").withText("?*").underHeaderAtRow("td", 3).withExactText("Compra").getText();
currencyTable.addField("sell").match("td").withText("?*").underHeaderAtRow("td", 3).withExactText("Venta").getText();
//additional matching rules to get the dollar prices listed in the first table (it has id = "table2")
currencyTable.addPersistentField("exchange").match("table").id("table2").match("tr").matchFirst("td").withText("?*").getText();
currencyTable.addField("buy").match("table").id("table2").match("td").withText("?*").underHeader("td").withExactText("Compra").getText();
currencyTable.addField("sell").match("table").id("table2").match("td").withText("?*").underHeader("td").withExactText("Venta").getText();
HtmlParser parser = new HtmlParser(entityList);
Results<HtmlParserResult> results = parser.parse(new UrlReaderProvider("http://www.dolarhoy.com/"));
HtmlParserResult result = results.get("currency");
for (HtmlRecord record : result.iterateRecords()) {
System.out.println(record.fillFieldMap(new LinkedHashMap<String, String>()));
}
The method addIdentifierField is defined as:
private void addIdentifierField(Group table, String field, final int pos) {
//matches any <td> where the colspan attribute is 4, 5 or 6, then gets the text of the <b> element inside the <td>
table.addPersistentField(field).match("td").attribute("colspan", 4, 5, 6).match("b").getText().transform(s -> splitCurrencyAndTime(s)[pos]);
}
The method splitCurrencyAndTime:
// splits the currency and timestamp at the top of each table. Finds the first
// non-letter character after counting multiple whitespaces and splits the string in two
private String[] splitCurrencyAndTime(String value) {
int spaceCount = 0;
for (int i = 0; i < value.length(); i++) {
char ch = value.charAt(i);
if (ch == ' ') {
spaceCount++;
} else if (spaceCount > 0 && !Character.isLetter(ch) && ch != '$') {
String currency = value.substring(0, i).trim();
String timestamp = value.substring(i).trim();
return new String[]{currency, timestamp};
}
}
//if no match then just return nulls
return new String[2];
}
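To see what this split does on its own, here is a small standalone sketch. The method body is copied from above and made static so it runs without the parser. The input strings are assumptions, reconstructed by joining the currency and timestamp values shown in the output records; the real cell text on the page may be spaced differently.

```java
import java.util.Arrays;

class SplitCurrencyAndTimeDemo {

    // Same logic as splitCurrencyAndTime above, made static for a standalone run.
    static String[] splitCurrencyAndTime(String value) {
        int spaceCount = 0;
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            if (ch == ' ') {
                spaceCount++;
            } else if (spaceCount > 0 && !Character.isLetter(ch) && ch != '$') {
                // first non-letter, non-'$' character after at least one space
                return new String[]{value.substring(0, i).trim(), value.substring(i).trim()};
            }
        }
        // no split point found
        return new String[2];
    }

    public static void main(String[] args) {
        // Assumed header strings (not taken verbatim from the page)
        System.out.println(Arrays.toString(
            splitCurrencyAndTime("EURO 15:35:39 HS. AR 10/08/18")));
        // prints [EURO, 15:35:39 HS. AR 10/08/18]
        System.out.println(Arrays.toString(
            splitCurrencyAndTime("DÓLAR ESTADOUNIDENSE EN $ 15:35:39 HS. AR 10/08/18")));
        // prints [DÓLAR ESTADOUNIDENSE EN $, 15:35:39 HS. AR 10/08/18]
        System.out.println(Arrays.toString(
            splitCurrencyAndTime("100 GRAMOS DE ORO 15:35:39 HS. AR 10/08/18")));
        // prints [100 GRAMOS DE ORO, 15:35:39 HS. AR 10/08/18]
    }
}
```

The leading digits in "100 GRAMOS DE ORO" do not trigger a split because no space has been counted yet at that point, and the '$' in the dollar header is skipped explicitly.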
Finally, the method isValidRecord filters out results such as {currency=EURO, timestamp=15:35:39 HS. AR 10/08/18, exchange=MEJORES PRECIOS, buy=34.000, sell=34.702}:
private boolean isValidRecord(Record record){
String exchange = record.getString("exchange");
return exchange != null && !exchange.contains("MEJORES") && !exchange.contains("DolarHoy.com");
}
The output will be:
{currency=DÓLAR ESTADOUNIDENSE EN $, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe, buy=28.500, sell=29.500}
{currency=DÓLAR ESTADOUNIDENSE EN $, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Nación, buy=28.700, sell=29.700}
{currency=DÓLAR ESTADOUNIDENSE EN $, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio, buy=28.000, sell=29.300}
{currency=EURO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Nación, buy=34.000, sell=35.000}
{currency=EURO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=33.502, sell=34.702}
{currency=EURO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=31.400, sell=35.200}
{currency=REAL, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Nación, buy=7.0000, sell=8.0000}
{currency=REAL, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=6.8000, sell=7.4000}
{currency=REAL, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=6.6000, sell=7.5000}
{currency=PESO URUGUAYO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=0.89060, sell=1.01720}
{currency=PESO URUGUAYO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=0.75000, sell=1.00000}
{currency=PESO CHILENO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=0.04250, sell=0.05180}
{currency=PESO CHILENO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=0.03600, sell=0.04600}
{currency=GUARANÍ, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=0.00440, sell=0.00590}
{currency=GUARANÍ, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=0.00450, sell=0.00535}
{currency=FRANCO SUIZO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=24.7826, sell=31.0526}
{currency=FRANCO SUIZO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=21.2000, sell=29.4000}
{currency=LIBRA ESTERLINA, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=38.7228, sell=41.7729}
{currency=LIBRA ESTERLINA, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=35.4000, sell=44.3000}
{currency=YEN, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=0.2456, sell=0.2745}
{currency=DÓLAR CANADIENSE, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=18.950, sell=23.100}
{currency=PESO MEXICANO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=1.500, sell=1.930}
{currency=DÓLAR AUSTRALIANO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Montevideo Cambio S.A., buy=15.150, sell=21.900}
{currency=LIBRA ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=7267.50, sell=9292.50}
{currency=KRUGER RAND, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=30780.00, sell=39235.00}
{currency=CHILENO DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=18097.50, sell=22715.00}
{currency=100 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Ciudad, buy=null, sell=110636.00}
{currency=100 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=99180.00, sell=128325.00}
{currency=50 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Ciudad, buy=null, sell=55473.00}
{currency=50 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=49590.00, sell=65195.00}
{currency=20 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=19807.50, sell=25812.50}
{currency=10 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Banco Ciudad, buy=null, sell=11343.00}
{currency=10 GRAMOS DE ORO, timestamp=15:35:39 HS. AR 10/08/18, exchange=Cambio Alpe S.A., buy=9975.00, sell=13275.00}
Hope this helps.
Disclosure: I'm the author of this library. It's commercial and closed source, but it can save you a lot of development time.