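I need a library that extracts the full file name from a file URL (direct download link). I want a robust library. I currently use FileNameUtils from Apache Commons, but this class does not handle many URLs.
I want a library that supports URLs like these: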
https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1
Name: file_795f32460d111df334849ee8336e56ca.mp4
http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip
Name: Jozve-Kamele-arbi.abp.zip
http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar
Name: dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar
https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j
Name: pdf.pdf
Can anyone help me? Thanks.
I apologize in advance if my sentences are grammatically incorrect; my English is not very good.
Answer 0 (score: 1)
I use this method; I hope it helps. It cuts the name off at the query string (?) and at the fragment (#).
import java.net.MalformedURLException;
import java.net.URL;

public static String parseFileNameFromUrl(String url) {
    if (url == null) {
        return "";
    }
    try {
        URL res = new URL(url);
        String resHost = res.getHost();
        if (resHost.length() > 0 && url.endsWith(resHost)) {
            // The URL is only a host (e.g. http://example.com), so there is no file name
            return "";
        }
    } catch (MalformedURLException e) {
        // Not a valid URL, so there is no file name to extract
        return "";
    }
    int length = url.length();
    // The path ends at the first '?' (query) ...
    int questionMarkPos = url.indexOf('?');
    if (questionMarkPos == -1) {
        questionMarkPos = length;
    }
    // ... or at the first '#' (fragment), whichever comes first
    int hashPos = url.indexOf('#');
    if (hashPos == -1) {
        hashPos = length;
    }
    int endIndex = Math.min(questionMarkPos, hashPos);
    // Look for the last '/' before endIndex only; a '/' inside the query
    // would otherwise put startIndex after endIndex and make substring throw
    int startIndex = url.lastIndexOf('/', endIndex - 1) + 1;
    return url.substring(startIndex, endIndex);
}
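The method above only looks at the path, so it cannot find names that are carried in a query parameter, as in the third and fourth URLs of the question. The following is a minimal sketch of one way to cover those cases, not a definitive implementation. The names UrlFileNames, fileNameFromUrl and lastSegment are hypothetical, and the rule "a candidate counts as a file name if it contains a dot" is an assumption made for illustration. It uses java.net.URI and assumes Java 10 or later for URLDecoder.decode(String, Charset).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class UrlFileNames {

    /**
     * Heuristic: prefer the last path segment if it contains a dot; otherwise
     * return the first query parameter value that yields such a segment.
     */
    static String fileNameFromUrl(String url) {
        if (url == null) {
            return "";
        }
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return "";
        }
        // getPath() excludes the query and the fragment
        String fromPath = lastSegment(uri.getPath());
        if (fromPath.contains(".")) {
            return fromPath;
        }
        String rawQuery = uri.getRawQuery();
        if (rawQuery != null) {
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue;
                }
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                // A value that is itself a URL is parsed recursively (it is always shorter)
                String candidate = value.contains("://") ? fileNameFromUrl(value) : lastSegment(value);
                if (candidate.contains(".")) {
                    return candidate;
                }
            }
        }
        return fromPath;
    }

    /** Returns the text after the last '/', or "" for null. */
    private static String lastSegment(String s) {
        if (s == null) {
            return "";
        }
        return s.substring(s.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar
        System.out.println(fileNameFromUrl(
                "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar"));
        // prints pdf.pdf
        System.out.println(fileNameFromUrl(
                "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf"));
    }
}
```

Because the dot test is only a heuristic, a parameter value such as a version number (for example v=1.5) that appears before the real file name would be returned instead.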
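Answer 1 (score: 1)
If you have a list of the file extensions you are interested in, you can also try to solve this with a regular expression such as (?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b.
Here is an example of this approach for extracting the file name from a URL: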
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extensions to recognize, joined with '|' so they can be used as a regex alternation
private static final String EXTENSIONS = "ez|aw|atom|atomcat|atomsvc|ccxml|cdmia|cdmic|cdmid|cdmio|cdmiq|cu|davmount|dbk|dssc|xdssc|ecma|emma|epub|exi|pfr|gml|gpx|gxf|stk|ipfix|jar|ser|class|js|json|jsonml|lostxml|hqx|cpt|mads|mrc|mrcx|mathml|mbox|mscml|metalink|meta4|mets|mods|mp4s|mp4|mxf|oda|opf|ogx|omdoc|oxps|xer|pdf|pgp|prf|p10|p7s|p8|ac|cer|crl|pkipath|pki|pls|cww|pskcxml|rdf|rif|rnc|rl|rld|rs|gbr|mft|roa|rsd|rss|rtf|sbml|scq|scs|spq|spp|sdp|setpay|setreg|shf|rq|srx|gram|grxml|sru|ssdl|ssml|tfi|tsd|plb|psb|pvb|tcap|pwn|aso|imp|acu|air|fcdt|xdp|xfdf|ahead|azf|azs|azw|acc|ami|apk|cii|fti|atx|mpkg|m3u8|swi|iota|aep|mpm|bmi|rep|cdxml|mmd|cdy|cla|rp9|c11amc|c11amz|csp|cdbcmsg|cmc|clkx|clkk|clkp|clkt|clkw|wbs|pml|ppd|car|pcurl|dart|rdz|fe_launch|dna|mlp|dpg|dfac|kpxx|ait|svc|geo|mag|nml|esf|msf|qam|slt|ssf|ez2|ez3|fdf|mseed|gph|ftc|fnc|ltf|fsc|oas|oa2|oa3|fg5|bh2|ddd|xdw|xbd|fzs|txd|ggb|ggt|gxt|g2w|g3w|gmx|kml|kmz|gac|ghf|gim|grv|gtm|tpl|vcg|hal|zmm|hbci|les|hpgl|hpid|hps|jlt|pcl|pclxl|sfd-hdstx|mpy|irm|sc|igl|ivp|ivu|igm|i2g|qbo|qfx|rcprofile|irp|xpr|fcs|jam|rms|jisp|joda|karbon|chrt|kfo|flw|kon|ksp|htke|kia|sse|lasxml|lbd|lbe|123|apr|pre|nsf|org|scm|lwp|portpkg|mcd|mc1|cdkey|mwf|mfm|flo|igx|mif|daf|dis|mbk|mqy|msl|plc|txf|mpn|mpc|xul|cil|cab|xlam|xlsb|xlsm|xltm|eot|chm|ims|lrm|thmx|cat|stl|ppam|pptm|sldm|ppsm|potm|docm|dotm|wpl|xps|mseq|mus|msty|taglet|nlu|nnd|nns|nnw|ngdat|n-gage|rpst|rpss|edm|edx|ext|odc|otc|odb|odf|odft|odg|otg|odi|oti|odp|otp|ods|ots|odt|odm|ott|oth|xo|dd2|oxt|pptx|sldx|ppsx|potx|xlsx|xltx|docx|dotx|mgp|dp|esa|paw|str|ei6|efif|wg|plf|pbd|box|mgz|qps|ptid|bed|mxl|musicxml|cryptonote|cod|rm|rmvb|link66|st|see|sema|semd|semf|ifm|itp|iif|ipk|mmf|teacher|dxp|sfs|sdc|sda|sdd|smf|sgl|smzip|sm|sxc|stc|sxd|std|sxi|sti|sxm|sxw|sxg|stw|svd|xsm|bdm|xdm|tao|tmo|tpt|mxs|tra|utz|umj|unityweb|uoml|vcx|vis|vsf|wbxml|wmlc|wmlsc|wtb|nbp|wpd|wqd|stf|xar|xfdl|hvd|hvs|hvp|osf|osfpvg|saf|spf|cmp|zaz|vxml|wgt|hlp|wsdl|wspolicy|7z|abw|ace|dmg|aam|aas|bcpio|torrent|bz|"
        + "vcd|cfs|chat|pgn|nsc|cpio|csh|dgc|wad|ncx|dtb|res|dvi|evy|eva|bdf|gsf|psf|pcf|snf|arc|spl|gca|ulx|gnumeric|gramps|gtar|hdf|install|iso|jnlp|latex|mie|application|lnk|wmd|wmz|xbap|mdb|obd|crd|clp|mny|pub|scd|trm|wri|nzb|p7r|rar|ris|sh|shar|swf|xap|sql|sit|sitx|srt|sv4cpio|sv4crc|t3|gam|tar|tcl|tex|tfm|obj|ustar|src|fig|xlf|xpi|xz|xaml|xdf|xenc|dtd|xop|xpl|xslt|xspf|yang|yin|zip|adp|s3m|sil|eol|dra|dts|dtshd|lvp|pya|ecelp4800|ecelp7470|ecelp9600|rip|weba|aac|caf|flac|mka|m3u|wax|wma|rmp|wav|xm|cdx|cif|cmdf|cml|csml|xyz|ttc|otf|ttf|woff|woff2|bmp|cgm|g3|gif|ief|ktx|png|btif|sgi|psd|sub|dwg|dxf|fbs|fpx|fst|mmr|rlc|mdi|wdp|npx|wbmp|xif|webp|3ds|ras|cmx|ico|sid|pcx|pnm|pbm|pgm|ppm|rgb|tga|xbm|xpm|xwd|dae|dwf|gdl|gtw|mts|vtu|appcache|css|csv|n3|dsc|rtx|tsv|ttl|vcard|curl|dcurl|mcurl|scurl|sub|fly|flx|gv|3dml|spot|jad|wml|wmls|java|nfo|opml|etx|sfv|uu|vcs|vcf|3gp|3g2|h261|h263|h264|jpgv|ogv|dvb|fvt|pyv|viv|webm|f4v|fli|flv|m4v|mng|vob|wm|wmv|wmx|wvx|avi|movie|smv|ice";
private static final Pattern FILE_DETECT = Pattern.compile("(?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b");
public static Optional<String> extractFileFrom(String url) {
    Matcher matcher = FILE_DETECT.matcher(url);
    // group(1) is the whole file name, including its extension
    return (matcher.find()) ? Optional.of(matcher.group(1)) : Optional.empty();
}
Here is a test that demonstrates how to use the method above:
import java.util.Arrays;
import java.util.List;

public static void main(String[] args) {
    List<String> strings = Arrays.asList(
            "https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1",
            "http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip",
            "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar",
            "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j",
            "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.PDF&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j");
    // Print the extracted name (or Optional.empty) for each URL
    strings.stream().map(s -> extractFileFrom(s)).forEach(System.out::println);
}
If you run the main method, you will see this on the console:
Optional[file_795f32460d111df334849ee8336e56ca.mp4]
Optional[Jozve-Kamele-arbi.abp.zip]
Optional[Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar]
Optional[pdf.pdf]
Optional[pdf.PDF]
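To make the pieces of the pattern easier to see, here is a small sketch of the same technique with a short extension list. The class name ExtensionRegexDemo, the methods patternFor and extract, and the four-entry list are hypothetical and chosen only for illustration. Pattern.quote is used so that an extension containing a regex metacharacter would still be matched literally, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ExtensionRegexDemo {

    // Hypothetical short list, for illustration only
    static final List<String> EXTENSIONS = List.of("mp4", "zip", "rar", "pdf");

    /** Builds the same kind of pattern as above from a list of extensions. */
    static Pattern patternFor(List<String> extensions) {
        String alternatives = extensions.stream()
                .map(Pattern::quote)                 // match each extension literally
                .collect(Collectors.joining("|"));
        // (?i)        case-insensitive matching
        // [^=/&?]+    one or more characters that are not URL delimiters
        // \.(...)     a dot followed by one of the known extensions
        // \b          the extension must end at a word boundary
        return Pattern.compile("(?i)([^=/&?]+\\.(" + alternatives + "))\\b");
    }

    /** Returns the first file-like token in the URL, if any. */
    static Optional<String> extract(Pattern pattern, String url) {
        Matcher matcher = pattern.matcher(url);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        Pattern pattern = patternFor(EXTENSIONS);
        // prints Optional[Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar]
        System.out.println(extract(pattern,
                "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar"));
        // prints Optional[files.zip] because the first match is in the host name
        System.out.println(extract(pattern, "http://files.zip.example.com/readme"));
    }
}
```

The second call shows a limitation of this approach: find() returns the leftmost match, so a host name that happens to contain a dot followed by a listed extension is reported as the file name.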