Trying to extract usernames from Reddit using Jsoup

Time: 2017-04-11 18:28:35

Tags: java jsoup

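I'm trying to use Jsoup to extract usernames from Reddit's page source so that I can then DM those users, but I can't figure out how to extract only the links that point to /user. Sorry, the code is a real mess.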

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class Main {

    public static void main(String[] args) {
        try {
            // Fetch and parse the page; the URL must include the protocol
            Document doc = Jsoup.connect("https://www.reddit.com/new/").get();

            // Print the page title
            String title = doc.title();
            System.out.println("title : " + title);

            // Select every link on the page; this is the step that still
            // needs narrowing to links under https://www.reddit.com/user
            Elements links = doc.select("a[href]");
            for (Element link : links) {
                // Print the href attribute and the visible link text
                System.out.println("\nlink : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

1 answer:

Answer 0 (score: 1)

Something like this:

        Document doc = Jsoup.connect("https://www.reddit.com/new/").get();
        Elements certainLinks = doc.select("a[href*=https://www.reddit.com/user/]");
        certainLinks.forEach(l -> System.out.println(l.text()));

will print:

_serial_chiller
dracorian
ImagesOfNetwork
... 

a[href*=https://www.reddit.com/user/] means all a elements whose href attribute contains the string https://www.reddit.com/user/
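If the link text ever differs from the username, the name could instead be cut out of the href value itself (for example the string returned by link.attr("href")). The following is only a minimal sketch using the standard library: the class UserLinkFilter, the helper extractUsernames, and the sample hrefs are hypothetical, and the contains check is a simplified, case-sensitive stand-in for the substring test described above, not Jsoup's own implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UserLinkFilter {

    // Prefix taken from the selector in the answer
    static final String USER_PREFIX = "https://www.reddit.com/user/";

    // Hypothetical helper: keeps only hrefs that contain the prefix and
    // returns the path segment that follows it, without duplicates.
    static Set<String> extractUsernames(List<String> hrefs) {
        Set<String> names = new LinkedHashSet<>(); // first-seen order, no repeats
        for (String href : hrefs) {
            int start = href.indexOf(USER_PREFIX);
            if (start < 0) {
                continue; // prefix not present, so this link is skipped
            }
            String rest = href.substring(start + USER_PREFIX.length());
            // The username ends at the first '/', '?' or '#', or at the end of the string
            int end = rest.length();
            for (int i = 0; i < rest.length(); i++) {
                char c = rest.charAt(i);
                if (c == '/' || c == '?' || c == '#') {
                    end = i;
                    break;
                }
            }
            String name = rest.substring(0, end);
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample hrefs are made up for illustration
        List<String> hrefs = Arrays.asList(
                "https://www.reddit.com/user/_serial_chiller",
                "https://www.reddit.com/r/java/",
                "https://www.reddit.com/user/dracorian/",
                "https://www.reddit.com/user/_serial_chiller");
        extractUsernames(hrefs).forEach(System.out::println);
    }
}
```

With the sample list above, main prints _serial_chiller and dracorian once each; the /r/java/ link is skipped and the repeated user link is collapsed.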