This is my Pig UDF, which uses the distributed cache:
public class Regex extends EvalFunc<String> {
    static HashMap<String, String> map = new HashMap<String, String>();

    public List<String> getCacheFiles() {
        Path lookup_file = new Path(
                "hdfs://localhost.localdomain:8020/user/cloudera/top");
        List<String> list = new ArrayList<String>(1);
        list.add(lookup_file + "#id_lookup");
        return list;
    }

    public void VectorizeData() throws IOException {
        FileReader fr = new FileReader("./id_lookup");
        BufferedReader brd = new BufferedReader(fr);
        String line;
        while ((line = brd.readLine()) != null) {
            String str[] = line.split("#");
            map.put(str[0], str[1]);
        }
        fr.close();
    }

    private String Regex(Tuple input) throws ExecException {
        // TODO Auto-generated method stub
        String tweet = (String) input.get(0);
        for (Entry<String, String> entry : map.entrySet()) {
            Pattern r = Pattern.compile(map.get(entry.getKey()));
            Matcher m = r.matcher(tweet);
            System.out.println(m.find());
            System.out.println(m.pattern());
            if (m.find() == true) {
                return entry.getValue();
            }
        }
        return null;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        VectorizeData();
        return Regex(input);
    }
}
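A side note on the matching loop, separate from the exception: the code calls `m.find()` once inside `println` and again in the `if`. Each call to `Matcher.find()` continues from where the previous match ended, so the `if` is really testing for a second occurrence. Also, `exec` calls `VectorizeData()` for every input row, which re-reads the lookup file each time. Below is a minimal sketch of matching with a single `find()` call against patterns compiled once. The names `FindOnceSketch`, `compileAll`, and `firstMatch` are hypothetical and not part of the original UDF, and the sample regexes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FindOnceSketch {
    // Compile every regex up front so it is not recompiled for each input row.
    static List<Pattern> compileAll(List<String> regexes) {
        List<Pattern> compiled = new ArrayList<Pattern>();
        for (String regex : regexes) {
            compiled.add(Pattern.compile(regex));
        }
        return compiled;
    }

    // Returns the text of the first pattern found in the tweet, or null if none match.
    static String firstMatch(List<Pattern> patterns, String tweet) {
        for (Pattern p : patterns) {
            // Call find() exactly once and keep the result for logging or branching.
            boolean found = p.matcher(tweet).find();
            if (found) {
                return p.pattern();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up patterns standing in for the values read from the lookup file.
        List<Pattern> patterns = compileAll(Arrays.asList("foo\\d+", "bar"));
        System.out.println(firstMatch(patterns, "a tweet about bar"));
        System.out.println(firstMatch(patterns, "nothing here"));
    }
}
```

In the UDF itself, the same idea would mean loading the lookup file only when the map is still empty, rather than on every call to `exec`.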
Below is the error I get when running this UDF. It seems to be related to the hash map:
java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.String
at UDF.Regex.Regex(Regex.java:47)
at UDF.Regex.exec(Regex.java:70)
at UDF.Regex.exec(Regex.java:1)
The hash map reports a size of 3, so it is being populated. Please help me resolve this ClassCastException.
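One possible reading of the trace: the exception is thrown inside `Regex.Regex`, and the only cast to `String` in that method is `(String) input.get(0)`. Pig represents its map type as a `java.util.Map`, so the first field of the input tuple may be a map rather than a chararray, in which case the static lookup map is not involved. This cannot be confirmed from the post alone, since the reported line numbers depend on the import lines that are not shown. The sketch below reproduces the failure without Pig and shows a type check before the cast. A plain `List<Object>` stands in for the Pig `Tuple`, and `FieldTypeCheckSketch` and `firstFieldAsString` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldTypeCheckSketch {
    // Assumption: the List<Object> plays the role of a Pig Tuple, and get(0) of tuple.get(0).
    static String firstFieldAsString(List<Object> fields) {
        Object first = fields.get(0);
        if (first instanceof String) {
            return (String) first;
        }
        // Report the actual runtime type instead of failing with a bare ClassCastException.
        String actual = (first == null) ? "null" : first.getClass().getName();
        throw new IllegalArgumentException("Expected a String in field 0 but got " + actual);
    }

    public static void main(String[] args) {
        List<Object> stringField = Arrays.<Object>asList("some tweet text");

        Map<String, Object> mapValue = new HashMap<String, Object>();
        mapValue.put("text", "some tweet text");
        List<Object> mapField = Arrays.<Object>asList(mapValue);

        System.out.println(firstFieldAsString(stringField));

        try {
            // Same kind of blind cast as in the UDF: fails when the field holds a map.
            String tweet = (String) mapField.get(0);
            System.out.println(tweet);
        } catch (ClassCastException e) {
            System.out.println("Blind cast failed: " + e.getMessage());
        }

        try {
            firstFieldAsString(mapField);
        } catch (IllegalArgumentException e) {
            System.out.println("Checked access failed: " + e.getMessage());
        }
    }
}
```

If this is the cause, running `DESCRIBE` on the relation passed to the UDF should show a map in the first position, and the fix would be on the Pig script side: pass the chararray field itself, or dereference the map with `#'key'` before calling the UDF.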