我正在尝试使用ESAPI验证来验证网址,但由于 hello How are you
,我的验证失败了。如果我删除语言参数,则验证成功。请检查我的网址格式,让我知道为什么这种模式失败。
&lang
Reg Expression
String url="http://google.com:000/menu.jsp?userid=test&age=22&language=hindi";
ESAPI.validator().getValidInput("URL", url,"URL",100000,false);
答案 0 :(得分:0)
这是设计的。这里发生的是您的输入被ESAPI的规范化方法捕获,该方法检测到URI包含BOTH URI编码和HTML实体编码。
ESAPI的下一个版本将包含一个方便的方法,但它需要几个步骤。
便利方法:
/**
* {@inheritDoc}
*/
public boolean isValidURI(String context, String input, boolean allowNull) {
boolean isValid = false;
URI compliantURI = this.getRfcCompliantURI(input);
try{
if(null != compliantURI){
String canonicalizedURI = getCanonicalizedURI(compliantURI);
//if getCanonicalizedURI doesn't throw an IntrusionException, then the URI contains no mixed or
//double-encoding attacks.
logger.info(Logger.SECURITY_SUCCESS, "We did not detect any mixed or multiple encoding in the uri:[" + input + "]");
Validator v = ESAPI.validator();
//This part will use the regex from validation.properties. This regex should be super-simple, and
//used mainly to restrict certain parts of a URL.
Pattern p = ESAPI.securityConfiguration().getValidationPattern( "URL" );
//We're doing this instead of using the normal validator API, because it will canonicalize the input again
//and if the URI has any queries that also happen to match HTML entities, like ¶
//it will cease conforming to the regex we now specify for a URL.
isValid = p.matcher(canonicalizedURI).matches();
}
}catch (IntrusionException e){
logger.error(Logger.SECURITY_FAILURE, e.getMessage());
isValid = false;
}
return isValid;
}
/**
* This does alot. This will extract each piece of a URI according to parse zone, and it will construct
* a canonicalized String representing a version of the URI that is safe to run regex against to it.
*
* @param dirtyUri
* @return
* @throws IntrusionException
*/
public String getCanonicalizedURI(URI dirtyUri) throws IntrusionException{
// From RFC-3986 section 3
// URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
//
// hier-part = "//" authority path-abempty
// / path-absolute
// / path-rootless
// / path-empty
// The following are two example URIs and their component parts:
//
// foo://example.com:8042/over/there?name=ferret#nose
// \_/ \______________/\_________/ \_________/ \__/
// | | | | |
// scheme authority path query fragment
// | _____________________|__
// / \ / \
// urn:example:animal:ferret:nose
Map<UriSegment, String> parseMap = new EnumMap<UriSegment, String>(UriSegment.class);
parseMap.put(UriSegment.SCHEME, dirtyUri.getScheme());
//authority = [ userinfo "@" ] host [ ":" port ]
parseMap.put(UriSegment.AUTHORITY, dirtyUri.getRawAuthority());
parseMap.put(UriSegment.SCHEMSPECIFICPART, dirtyUri.getRawSchemeSpecificPart());
parseMap.put(UriSegment.HOST, dirtyUri.getHost());
//if port is undefined, it will return -1
Integer port = new Integer(dirtyUri.getPort());
parseMap.put(UriSegment.PORT, port == -1 ? "": port.toString());
parseMap.put(UriSegment.PATH, dirtyUri.getRawPath());
parseMap.put(UriSegment.QUERY, dirtyUri.getRawQuery());
parseMap.put(UriSegment.FRAGMENT, dirtyUri.getRawFragment());
//Now we canonicalize each part and build our string.
StringBuilder sb = new StringBuilder();
//Replace all the items in the map with canonicalized versions.
Set<UriSegment> set = parseMap.keySet();
SecurityConfiguration sg = ESAPI.securityConfiguration();
// boolean restrictMixed = sg.getBooleanProp("AllowMixedEncoding");
// boolean restrictMultiple = sg.getBooleanProp("AllowMultipleEncoding");
boolean allowMixed = sg.getAllowMixedEncoding();
boolean allowMultiple = sg.getAllowMultipleEncoding();
for(UriSegment seg: set){
String value = encoder.canonicalize(parseMap.get(seg), allowMultiple, allowMixed);
value = value == null ? "" : value;
//In the case of a uri query, we need to break up and canonicalize the internal parts of the query.
if(seg == UriSegment.QUERY && null != parseMap.get(seg)){
StringBuilder qBuilder = new StringBuilder();
try {
Map<String, List<String>> canonicalizedMap = this.splitQuery(dirtyUri);
Set<Entry<String, List<String>>> query = canonicalizedMap.entrySet();
Iterator<Entry<String, List<String>>> i = query.iterator();
while(i.hasNext()){
Entry<String, List<String>> e = i.next();
String key = (String) e.getKey();
String qVal = "";
List<String> list = (List<String>) e.getValue();
if(!list.isEmpty()){
qVal = list.get(0);
}
qBuilder.append(key)
.append("=")
.append(qVal);
if(i.hasNext()){
qBuilder.append("&");
}
}
value = qBuilder.toString();
} catch (UnsupportedEncodingException e) {
logger.debug(Logger.EVENT_FAILURE, "decoding error when parsing [" + dirtyUri.toString() + "]");
}
}
//Check if the port is -1, if it is, omit it from the output.
if(seg == UriSegment.PORT){
if("-1" == parseMap.get(seg)){
value = "";
}
}
parseMap.put(seg, value );
}
return buildUrl(parseMap);
}
/**
* The meat of this method was taken from StackOverflow: http://stackoverflow.com/a/13592567/557153
* It has been modified to return a canonicalized key and value pairing.
*
* @param java URI
* @return a map of canonicalized query parameters.
* @throws UnsupportedEncodingException
*/
public Map<String, List<String>> splitQuery(URI uri) throws UnsupportedEncodingException {
final Map<String, List<String>> query_pairs = new LinkedHashMap<String, List<String>>();
final String[] pairs = uri.getQuery().split("&");
for (String pair : pairs) {
final int idx = pair.indexOf("=");
final String key = idx > 0 ? encoder.canonicalize(pair.substring(0, idx)) : pair;
if (!query_pairs.containsKey(key)) {
query_pairs.put(key, new LinkedList<String>());
}
final String value = idx > 0 && pair.length() > idx + 1 ? URLDecoder.decode(pair.substring(idx + 1), "UTF-8") : null;
query_pairs.get(key).add(encoder.canonicalize(value));
}
return query_pairs;
}
public enum UriSegment {
AUTHORITY, SCHEME, SCHEMSPECIFICPART, USERINFO, HOST, PORT, PATH, QUERY, FRAGMENT
}
参考代码存在here.
请注意,部分内容将重构为DefaultEncoder类,因为canonicalize方法正确属于那里。
编写的代码在ESAPI项目代码中经过严格测试,但我可能忘记了一两个方法。你可以按原样克隆和编译esapi项目,但有些组织不允许你使用最前沿的非发布二进制文件。