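I'm trying to use jsoup to extract usernames from reddit's page source so that I can then DM those users, but I don't know how to extract only the links that point to /user. Sorry, the code is a real mess.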
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class Main
{
    public static void main(String[] args) {
        try {
            // need http protocol
            Document doc = Jsoup.connect("https://www.reddit.com/new/").get();
            // get page title
            String title = doc.title();
            System.out.println("title : " + title);
            // get all links (I only want the ones that point to /user)
            Elements certainLinks = doc.select("a[href]");
            for (Element link : certainLinks) {
                // get the value from href attribute
                System.out.println("\nlink : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Answer 0 (score: 1)
Something like this:
Document doc = Jsoup.connect("https://www.reddit.com/new/").get();
Elements certainLinks = doc.select("a[href*=https://www.reddit.com/user/]");
certainLinks.forEach(l -> System.out.println(l.text()));
This will print:
_serial_chiller
dracorian
ImagesOfNetwork
...
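The names printed above come from the link text. If the username has to be taken from the href value instead, one option is a small regular expression. The following is only a sketch: the class name UsernameFromHref, the method usernameFromHref, and the assumed URL shape (/user/<name>/ with names made of letters, digits, underscores and hyphens) are illustrative assumptions, not something reddit or jsoup guarantees.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsernameFromHref {
    // Assumption: profile links look like .../user/<name>/ and names use
    // only letters, digits, '_' and '-'.
    private static final Pattern USER_PATH = Pattern.compile("/user/([A-Za-z0-9_-]+)");

    // Returns the path segment that follows /user/, or empty if the href has none.
    static Optional<String> usernameFromHref(String href) {
        Matcher m = USER_PATH.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[dracorian]
        System.out.println(usernameFromHref("https://www.reddit.com/user/dracorian/"));
        // prints Optional.empty
        System.out.println(usernameFromHref("https://www.reddit.com/r/java/"));
    }
}
```

With jsoup this could be combined with the loop above by passing l.attr("href") to the helper instead of printing l.text().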
a[href*=https://www.reddit.com/user/] means all a elements whose href attribute contains the string https://www.reddit.com/user/
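To see the selector work without hitting the network, it can be tried against a small inline HTML string. This is a sketch under stated assumptions: the class name UserLinkSelectorDemo, the helper userLinkTexts, and the sample markup (with the made-up users alice and bob) are hypothetical, and the real reddit page may be structured differently or rendered by JavaScript.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class UserLinkSelectorDemo {
    // Hypothetical sample markup: two profile links and one subreddit link.
    static final String SAMPLE_HTML =
        "<a href=\"https://www.reddit.com/user/alice/\">alice</a>"
        + "<a href=\"https://www.reddit.com/r/java/\">r/java</a>"
        + "<a href=\"https://www.reddit.com/user/bob/\">bob</a>";

    // Returns the text of every <a> whose href contains the /user/ URL prefix.
    static List<String> userLinkTexts(String html) {
        Document doc = Jsoup.parse(html);
        // [href*=value] matches when the attribute value contains the substring.
        Elements userLinks = doc.select("a[href*=https://www.reddit.com/user/]");
        return userLinks.eachText();
    }

    public static void main(String[] args) {
        // prints [alice, bob]
        System.out.println(userLinkTexts(SAMPLE_HTML));
    }
}
```

The subreddit link is skipped because its href does not contain the /user/ substring. If the page used relative links such as /user/alice/, a shorter selector like a[href*=/user/] would be needed instead.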