Question

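I am parsing a page for email data. How can I get the hidden email address, which is generated with JavaScript? This is the page I am parsing. If you look at the HTML source (using Firebug or something similar), you will see that the address is a link tag generated inside a div named sobi2Details_field_email, and that it is set to display: none. This is my code so far, but the problem is with the email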

doc = Jsoup.connect(strLine).get();
Element e5 = doc.getElementById("sobi2Details_field_email");

if (e5 != null) {
    emaildata = e5.child(1).absUrl("href").toString();
}
System.out.println(emaildata);

Answer 1

You need to perform several steps, because Jsoup cannot execute JavaScript. I reverse engineered the page, and this is what came out:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class HiddenEmail
{
    public static void main(final String[] args) throws IOException
    {
        final String url = "http://poslovno.com/kategorije.html?sobi2Task=sobi2Details&catid=71&sobi2Id=20001";
        final Document doc = Jsoup.connect(url).get();
        final Element e5 = doc.getElementById("sobi2Details_field_email");
        if (e5 == null) {
            throw new IllegalStateException("Element sobi2Details_field_email not found");
        }

        System.out.println("--- this is how we start");
        System.out.println(e5 + "\n\n\n\n");

        // decode the HTML entities (&#64; and the like) in the element's markup
        System.out.println("--- Remove XML encoding\n");
        String email = Parser.unescapeEntities(e5.toString(), false);
        System.out.println(email + "\n\n\n\n");

        // remove the JavaScript string concatenation (every ' + ')
        System.out.println("--- Remove concatenation (all: ' + ')");
        email = email.replaceAll("' \\+ '", "");
        System.out.println(email + "\n\n\n\n");

        // keep only the part between the first and the last "var addy"
        System.out.println("--- Remove useless lines");
        Matcher matcher = Pattern.compile("var addy.*var addy", Pattern.MULTILINE | Pattern.DOTALL).matcher(email);
        if (!matcher.find()) {
            throw new IllegalStateException("No 'var addy' block found");
        }
        email = matcher.group();
        System.out.println(email + "\n\n\n\n");

        // take the two strings enclosed in '' and concatenate them
        System.out.println("--- Extract the email address");
        matcher = Pattern.compile("'(.*)'.*'(.*)'", Pattern.MULTILINE | Pattern.DOTALL).matcher(email);
        if (!matcher.find()) {
            throw new IllegalStateException("No quoted address parts found");
        }
        email = matcher.group(1) + matcher.group(2);
        System.out.println(email);
    }
}
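
To try the same idea without hitting the network, here is a minimal sketch that runs the three steps (decode entities, drop the ' + ' concatenation, pull out the quoted parts) on a hard-coded script. The sample script, the variable name addy12345 and the class and method names are assumptions made up for illustration; the real page may be laid out differently. decodeNumericEntities is only a stand-in for Jsoup's Parser.unescapeEntities and handles decimal entities such as &#64; and nothing else. The extraction also differs from the regex above: it collects every quoted literal on lines that assign to an addy variable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CloakedEmailSketch {

    // Decodes decimal numeric entities such as &#64; (stand-in for Jsoup's unescapeEntities).
    static String decodeNumericEntities(String s) {
        Matcher m = Pattern.compile("&#(\\d+);").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ch = new String(Character.toChars(Integer.parseInt(m.group(1))));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Rebuilds the address from a script that assembles it piece by piece.
    static String extractEmail(String script) {
        String decoded = decodeNumericEntities(script);
        // Collapse the JavaScript concatenation: 'a' + 'b' becomes 'ab'.
        String joined = decoded.replaceAll("' \\+ '", "");
        // Look only at lines that assign to a variable whose name starts with "addy".
        Matcher line = Pattern
                .compile("^\\s*(?:var\\s+)?addy\\w*\\s*=(.*)$", Pattern.MULTILINE)
                .matcher(joined);
        Pattern literal = Pattern.compile("'([^']*)'");
        StringBuilder email = new StringBuilder();
        while (line.find()) {
            // Append every single-quoted literal on the right-hand side, in order.
            Matcher lit = literal.matcher(line.group(1));
            while (lit.find()) {
                email.append(lit.group(1));
            }
        }
        return email.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cloaking script, not copied from the real page.
        String script = String.join("\n",
                "var prefix = '&#109;a' + 'i&#108;' + '&#116;o';",
                "var path = 'hr' + 'ef' + '=';",
                "var addy12345 = 'info' + '&#64;';",
                "addy12345 = addy12345 + 'example' + '&#46;' + 'com';",
                "document.write(addy12345);");
        System.out.println(extractEmail(script)); // prints info@example.com
    }
}
```

With Jsoup on the classpath you would pass in the text of the script found inside the sobi2Details_field_email div instead of the hard-coded sample.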

Answer 2

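If the client generates some content dynamically with JavaScript after the server response has completed, there is no other way than one of these:

Reverse engineering: figure out what the server's script does, and try to implement the same behavior yourself.
Download the JavaScript from the processed page, then use Java's JavaScript processor to execute that script and get the result (yes, it is possible; I was once forced to do something like this). Here you have a basic example showing how to evaluate JavaScript in Java.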
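
As a rough sketch of the second option, the standard javax.script API can evaluate a piece of JavaScript and hand back the value of its last expression. The script string, the class name and the helper name below are hypothetical. The engine lookup is an assumption about your runtime: the bundled Nashorn engine was removed in JDK 15, so on newer JDKs getEngineByName returns null unless a script engine has been added to the classpath, and the sketch returns null in that case instead of failing.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class JsEvalSketch {

    // Evaluates the script and returns its last expression as a string,
    // or null when no JavaScript engine is available in this runtime.
    static String evalToString(String script) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // e.g. JDK 15+ without an added script engine
        }
        Object result = engine.eval(script);
        return String.valueOf(result);
    }

    public static void main(String[] args) throws ScriptException {
        // Hypothetical script in the style of an email-cloaking snippet,
        // with the entities already decoded.
        String script =
                "var addy = 'info' + '@';"
              + "addy = addy + 'example' + '.' + 'com';"
              + "addy;";
        String email = evalToString(script);
        System.out.println(email == null ? "No JavaScript engine available" : email);
    }
}
```

Scripts taken from a real page usually call document.write or touch other browser objects, which do not exist in a bare script engine, so you would have to strip those calls or define stand-ins for them before evaluating.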
