I'm reading the Apache Crunch examples, which are mostly in Java, and I'm new to both. (I know .NET.) Here is the example code:
DoFn<String, Pair<String, Long>> extractIPResponseSize = new DoFn<String, Pair<String, Long>>() {
    transient Pattern pattern;

    public void initialize() {
        pattern = Pattern.compile(logRegex);
    }

    public void process(String line, Emitter<Pair<String, Long>> emitter) {
        Matcher matcher = pattern.matcher(line);
        if (matcher.matches()) {
            try {
                Long responseSize = Long.parseLong(matcher.group(7));
                String remoteAddr = matcher.group(1);
                emitter.emit(Pair.of(remoteAddr, responseSize));
            } catch (NumberFormatException e) {
                // corrupt line, we should increment a counter
            }
        }
    }
};
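To see what `process()` does in isolation, here is a minimal standalone sketch of the same logic in plain Java with no Crunch classes. It assumes a hypothetical, simplified Common Log Format regex (the example's real `logRegex` is not shown, which is why the example reads group 7 while this sketch reads group 4), and it collects results in a `List` instead of calling `Emitter.emit`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractDemo {
    // Hypothetical, simplified Common Log Format regex (NOT the example's logRegex):
    // group 1 = remote address, 2 = request line, 3 = status code, 4 = response size
    static final Pattern LOG = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    // Mirrors process(): keep (address, size) for lines that match and whose
    // size parses as a number; silently skip everything else.
    static List<Map.Entry<String, Long>> extract(List<String> lines) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : lines) {
            Matcher matcher = LOG.matcher(line);
            if (matcher.matches()) {
                try {
                    Long responseSize = Long.parseLong(matcher.group(4));
                    String remoteAddr = matcher.group(1);
                    out.add(new AbstractMap.SimpleImmutableEntry<>(remoteAddr, responseSize));
                } catch (NumberFormatException e) {
                    // size was "-" or otherwise not a number: treat as a corrupt line
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
                "10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] \"GET /x HTTP/1.0\" 304 -",
                "not a log line");
        System.out.println(extract(lines)); // [127.0.0.1=2326]
    }
}
```

The second line matches the regex but its size field is `-`, so `Long.parseLong` throws and the line is skipped, just like the `catch` block in the example.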
The first line confuses me a lot and I can't follow it. Could you explain it piece by piece?
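A piece-by-piece reading of that first line can be sketched with hypothetical stand-ins `SimpleDoFn` and `SimpleEmitter` (not Crunch's real classes) so that it compiles with only the JDK. The expression `new SimpleDoFn<String, Integer>() { ... }` creates an anonymous subclass: a class with no name, defined and instantiated in a single expression, which must implement the abstract `process` method. C# has no direct equivalent (its anonymous types cannot extend a class), so the closest .NET analogue is a named subclass or a delegate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Crunch's Emitter and DoFn, cut down to what the example uses.
interface SimpleEmitter<T> {
    void emit(T value);
}

abstract class SimpleDoFn<S, T> {          // S = input type, T = output type
    public void initialize() { }           // optional hook; does nothing by default
    public abstract void process(S input, SimpleEmitter<T> emitter);
}

public class AnonymousDoFnDemo {
    static List<Integer> run(List<String> inputs) {
        // Reading the declaration left to right:
        //   SimpleDoFn<String, Integer>           the variable's type: takes String, emits Integer
        //   lengthFn                              the variable's name
        //   new SimpleDoFn<String, Integer>() {   instantiate an anonymous subclass...
        //     ...                                 ...whose body is inside the braces
        //   };                                    the ';' ends the variable declaration
        SimpleDoFn<String, Integer> lengthFn = new SimpleDoFn<String, Integer>() {
            @Override
            public void process(String input, SimpleEmitter<Integer> emitter) {
                emitter.emit(input.length());
            }
        };

        final List<Integer> results = new ArrayList<>();
        lengthFn.initialize();
        for (String s : inputs) {
            // SimpleEmitter has a single abstract method, so a lambda can implement it (Java 8+)
            lengthFn.process(s, value -> results.add(value));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "bcd", "")));  // [1, 3, 0]
    }
}
```

Mapped back onto the example, `DoFn<String, Pair<String, Long>>` means "a DoFn that takes a `String` (one log line) and emits a `Pair<String, Long>` (address and size)", and the braces after `new DoFn<...>()` hold the anonymous subclass's field and methods.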
Note: DoFn is a class in Apache Crunch; here is the documentation:
http://crunch.apache.org/apidocs/0.3.0/org/apache/crunch/DoFn.html
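The `transient` keyword and the `initialize()` method are connected. As far as I can tell from the API docs, `DoFn` implements `java.io.Serializable`, because Crunch serializes function objects and runs them in other JVMs. A `transient` field is skipped by Java serialization, so it arrives as `null` and has to be rebuilt, which is what `initialize()` does with `Pattern.compile`. The sketch below uses a hypothetical `LineFilter` class and an in-memory serialize/deserialize round trip to show the effect; it is not how Crunch itself ships the function.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical class with the same field layout as the example: the regex string
// is serialized normally, the compiled Pattern is transient and rebuilt in initialize().
class LineFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String regex;
    transient Pattern pattern;               // skipped by Java serialization

    LineFilter(String regex) { this.regex = regex; }

    public void initialize() { pattern = Pattern.compile(regex); }

    boolean accepts(String line) { return pattern.matcher(line).matches(); }
}

public class TransientDemo {
    // Write the object to bytes and read it back, roughly what happens when a
    // serializable function object is sent to another JVM.
    static LineFilter roundTrip(LineFilter filter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(filter);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LineFilter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LineFilter original = new LineFilter("\\d+");
        original.initialize();
        LineFilter copy = roundTrip(original);
        System.out.println(copy.pattern == null);     // true: transient field not restored
        copy.initialize();                            // rebuild it, as the example's initialize() does
        System.out.println(copy.accepts("2326"));     // true
    }
}
```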
I also did some googling, and it looks like Pair is also an Apache Commons Lang thing:
http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/tuple/Pair.html
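`Pair.of(remoteAddr, responseSize)` is a generic static factory. Crunch appears to ship its own `org.apache.crunch.Pair`, which is likely what the example imports rather than the Commons Lang class; both provide an `of` method. The sketch below is a hypothetical `SimplePair`, written only to show how such a class works: the type parameters `K` and `V` are inferred from the arguments, so no casts are needed when reading the values back. The accessor names `first()` and `second()` are chosen for illustration.

```java
import java.util.Objects;

// Hypothetical minimal pair, for illustration only (not Crunch's or Commons Lang's class).
final class SimplePair<K, V> {
    private final K first;
    private final V second;

    private SimplePair(K first, V second) {
        this.first = first;
        this.second = second;
    }

    // Generic static factory: the compiler infers K and V from the arguments,
    // so SimplePair.of("10.0.0.1", 512L) has type SimplePair<String, Long>.
    static <K, V> SimplePair<K, V> of(K first, V second) {
        return new SimplePair<>(first, second);
    }

    K first() { return first; }
    V second() { return second; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimplePair)) return false;
        SimplePair<?, ?> p = (SimplePair<?, ?>) o;
        return Objects.equals(first, p.first) && Objects.equals(second, p.second);
    }

    @Override
    public int hashCode() { return Objects.hash(first, second); }

    @Override
    public String toString() { return "(" + first + ", " + second + ")"; }
}

public class PairDemo {
    public static void main(String[] args) {
        SimplePair<String, Long> p = SimplePair.of("10.0.0.1", 512L);
        String addr = p.first();   // no cast needed: the type parameters carry the types
        long size = p.second();    // Long auto-unboxed to long
        System.out.println(p + " " + addr + " " + size);  // (10.0.0.1, 512) 10.0.0.1 512
    }
}
```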
Maybe I need to learn about Java generics?
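Java generics do account for most of the first line. Two Java-specific details that often surprise .NET developers are sketched below using only the JDK: type arguments must be reference types (hence `Long`, not `long`, in `Pair<String, Long>`), and type arguments are erased at runtime, so `List<String>` and `List<Integer>` share one runtime class. The method names `total` and `sameRuntimeClass` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericsDemo {
    // Type arguments must be reference types: List<long> does not compile, which is
    // why the example writes Pair<String, Long> with the wrapper class Long.
    static long total(List<String> sizeFields) {
        List<Long> sizes = new ArrayList<>();
        for (String field : sizeFields) {
            sizes.add(Long.parseLong(field));   // long result auto-boxed to Long
        }
        long sum = 0;
        for (Long size : sizes) {
            sum += size;                        // Long auto-unboxed to long
        }
        return sum;
    }

    // Unlike .NET, Java erases type arguments at runtime, so both lists share one class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(total(Arrays.asList("2326", "512")));  // 2838
        System.out.println(sameRuntimeClass());                   // true
    }
}
```

The same boxing happens in the example's `Long responseSize = Long.parseLong(...)`: `parseLong` returns a primitive `long`, which is boxed to `Long` so it can be placed into a `Pair<String, Long>`.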