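I want to extract every URL contained in an SMS. By "every" I mean all text that can be resolved as a link, i.e. the text that appears underlined in the SMS. The code below works, but only when the URL starts with http, https or ftp. I also need to catch URLs that have no scheme.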
import android.webkit.URLUtil;
import java.util.ArrayList;
import java.util.List;

public static List<String> extractUrls(String sms) {
    List<String> containedUrls = new ArrayList<String>();
    String text = sms;
    // Split the SMS to check whether each part is a URL
    String[] split = text.split(" ");
    // Keep every part that URLUtil accepts as a valid URL
    for (int i = 0; i < split.length; i++) {
        if (URLUtil.isValidUrl(split[i])) containedUrls.add(split[i]);
    }
    return containedUrls;
}
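`URLUtil.isValidUrl` checks for recognized scheme prefixes, which matches the behaviour described above: `www.google.com` is skipped because it has no scheme. One way to keep the same token-by-token loop is to give each token a default scheme before validating it. The following is a minimal plain-Java sketch under several assumptions: `http://` is the default scheme, `java.net.URI` stands in for the Android-only `URLUtil`, and the class name `SchemelessUrlCheck` and its host rule are hypothetical illustrations, not part of the original code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SchemelessUrlCheck {
    // Hypothetical host rule (an assumption): dot-separated labels ending in
    // an alphabetic top-level label of at least two letters.
    private static final Pattern HOST =
            Pattern.compile("([A-Za-z0-9-]+\\.)+[A-Za-z]{2,}");

    // Returns true if the token parses as a URI with a plausible host,
    // prepending "http://" first when the token has no scheme of its own.
    static boolean looksLikeUrl(String token) {
        String candidate = token.contains("://") ? token : "http://" + token;
        try {
            String host = new URI(candidate).getHost();
            return host != null && HOST.matcher(host).matches();
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI at all
        }
    }

    // Same loop as the original, but splitting on any run of whitespace.
    static List<String> extractUrls(String sms) {
        List<String> containedUrls = new ArrayList<>();
        for (String token : sms.trim().split("\\s+")) {
            if (looksLikeUrl(token)) containedUrls.add(token);
        }
        return containedUrls;
    }
}
```

With this sketch, `extractUrls("Visit www.google.com or https://example.org/path today")` keeps `www.google.com` and `https://example.org/path`. Tokens with attached punctuation (for example a trailing comma) may be rejected, so this approach still depends on clean whitespace-separated input.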
Answer (score: 1)
You could try a regular expression instead of URLUtil:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public static List<String> extractUrls(String sms) {
    List<String> containedUrls = new ArrayList<String>();
    // Split the SMS on any whitespace (spaces, newlines) so each word is tested on its own
    String[] split = sms.split("\\s+");
    // Optional href=/@ prefix, optional http/https/ftp scheme, dot-separated host, optional path
    Pattern p = Pattern.compile("(@)?(href=')?(HREF=')?(HREF=\")?(href=\")?((https?|ftp)://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+(/[#&\\n\\-=?\\+\\%/\\.\\w]+)?");
    // Keep every word that matches the pattern in full
    for (int i = 0; i < split.length; i++) {
        if (p.matcher(split[i]).matches()) containedUrls.add(split[i]);
    }
    return containedUrls;
}
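Because the answer above calls `matches()` on each space-separated word, a URL followed by punctuation (such as `www.google.com,`) is rejected. A variant is to run a similar pattern over the whole message with `Matcher.find()`, which collects each matching substring wherever it occurs. This is a sketch, not the answerer's code: the class name `SmsUrlScanner` is hypothetical, and the `href=`/`@` prefixes are dropped on the assumption that SMS text does not contain HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SmsUrlScanner {
    // Optional http/https/ftp scheme, at least two dot-separated host parts,
    // then an optional path/query made of common URL characters.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "((https?|ftp)://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+(/[#&\\-=?\\+\\%/\\.\\w]+)?");

    // Scans the full message instead of splitting it, so trailing
    // punctuation next to a URL does not prevent it from being found.
    static List<String> extractUrls(String sms) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(sms);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

For example, `extractUrls("See www.google.com, or https://example.org/a?b=1 now")` returns `[www.google.com, https://example.org/a?b=1]`. The pattern is loose, so decimal numbers such as `3.14` also match. You may need an extra filter (for example requiring an alphabetic last label) depending on what your messages contain.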