I am working on a Java method that differentiates between absolute and relative URLs the way a browser address bar would, rather than the way a strict URL parser would. That is, I want it to recognize a URL as absolute if it starts with a host, whether or not the scheme is present. That way, it correctly recognizes scheme-relative URLs (like //example.com) and URLs with the scheme completely omitted (like example.com, wikipedia.org, lots.and-lots.of.domains.com.ng). The method I'm currently using looks something like this:
public String checkPossiblyAbsolute(String url) {
    if (url.matches("^(\\/\\/)?([-_A-Za-z0-9]+\\.)+\\w{2,3}(\\/.*)?$")) {
        if (url.startsWith("//")) url = "http:" + url;
        else url = "http://" + url;
    }
    return url;
}
Basically, it checks for dot-separated sequences of the characters A-Z, a-z, 0-9, -, and _, where the last sequence (the TLD) contains exactly 2 or 3 letters. Also, the string may start with an optional //. My tests work the way I expected, but I really want to find an easier (or at least more readable) way to do this. Any thoughts?
Answer 0 (score: 0)
Unfortunately, Java does not allow you to avoid double-escaping things. (Some languages allow @"une\scapedRegex".)
There are some modifications you can make to the regex syntax, however.
- \\. can become [.]. Not shorter, but IMHO more readable.
- The same goes for \\/. Make it [/].
- You can drop A-Z if you use case-insensitive mode. May not be worth it when you have only one A-Z.
There's not much more you can do, except put things in variables. Again, it may not be worth it if you have only a few redundancies, but it could improve readability. You're using Java, so you're not winning code golf anyway.
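As a rough sketch of those suggestions combined, the method might end up looking something like the following. The UrlCheck class and the constant names are made up for illustration and are not from the question. It also uses a bare / instead of [/], since Java regexes do not require the slash to be escaped, and drops ^ and $ because matches() already requires the whole string to match.

```java
import java.util.regex.Pattern;

final class UrlCheck {
    // Named pieces of the original pattern (names are illustrative assumptions).
    private static final String OPTIONAL_SLASHES = "(//)?";
    private static final String LABEL = "[-_a-z0-9]+";   // A-Z is covered by CASE_INSENSITIVE
    private static final String TLD = "\\w{2,3}";
    private static final String OPTIONAL_PATH = "(/.*)?";

    // One or more "label." groups, then the TLD, then an optional path.
    private static final Pattern POSSIBLY_ABSOLUTE = Pattern.compile(
            OPTIONAL_SLASHES + "(" + LABEL + "[.])+" + TLD + OPTIONAL_PATH,
            Pattern.CASE_INSENSITIVE);

    static String checkPossiblyAbsolute(String url) {
        // matches() succeeds only if the entire input fits the pattern.
        if (POSSIBLY_ABSOLUTE.matcher(url).matches()) {
            return url.startsWith("//") ? "http:" + url : "http://" + url;
        }
        return url;
    }
}
```

Compiling the pattern once into a static field also avoids recompiling the regex on every call, which String.matches does. Whether the extra constants are worth it for a pattern this small is a judgment call.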