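I have a database that contains URLs and their MD5 hashes. I need to check which links from a set of URLs already exist in the database. I have the following code, which runs in multiple threads. Each thread handles one specific MD5 key, which is the first three characters of the URL's MD5: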
```java
String[] urls = new String[linksMap.size()];
String[] md5s = new String[linksMap.size()];
boolean expectingMittcom = false;
boolean hadMittcom = false;
int i = 0;
for (String url : linksMap.keySet()) {
    urls[i] = url;
    if (url.equals("http://mittcom.com/")) {
        expectingMittcom = true;
    }
    md5s[i++] = linksMap.get(url).variationMd5;
}
int offset = 0;
while (offset < urls.length) {
    Array arMd5 = pqm.getConnection().createArrayOf("text",
            Arrays.copyOfRange(md5s, offset, Math.min(offset + MAX_NUM, urls.length)));
    Array arUrl = pqm.getConnection().createArrayOf("text",
            Arrays.copyOfRange(urls, offset, Math.min(offset + MAX_NUM, urls.length)));
    PreparedStatement ps = pqm.getConnection().prepareStatement(
            "select url from links.links_" + key
            + " where md5=any(?) and url=any(?)");
    ps.setArray(1, arMd5);
    ps.setArray(2, arUrl);
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
        String url = rs.getString(1);
        boolean printDebug = false;
        if (url.equals("http://mittcom.com/")) {
            hadMittcom = true;
            printDebug = true;
        }
        LinkVariation r = linksMap.remove(url);
        if (printDebug) {
            logger.info("Link variation: " + r);
        }
        if (r != null) {
            Map<String, String[]> linksMapOriginal =
                    linksByMD5MapOriginal.get(r.original[INDEX_MD5].substring(0, 3));
            if (printDebug) {
                logger.info("will try to fliter out ["
                        + r.original[INDEX_URL] + "]");
            }
            String[] remove = linksMapOriginal.remove(r.original[INDEX_URL]);
            if (remove != null) {
                if (printDebug) {
                    logger.info("Filtered mittcom");
                    filtered.incrementAndGet();
                    checkStillHere();
                }
            } else {
                if (printDebug) {
                    logger.info("Did not filter mittcom");
                }
            }
        }
    }
    rs.close();
    ps.close();
    offset += MAX_NUM;
}
if (expectingMittcom) {
    if (hadMittcom) {
        logger.info("was expecting mittcom and found");
    } else {
        logger.info("was expecting mittcom but didn't find");
    }
}
```
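The per-thread key is the first three characters of a URL's MD5 (the loop above looks up the target map with `substring(0, 3)`). The post doesn't say how `variationMd5` or `INDEX_MD5` are computed, so here is a minimal sketch that assumes the stored MD5 is the lowercase hex digest of the URL's UTF-8 bytes. `Md5Bucket`, `md5Hex` and `bucketKey` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; assumes the stored MD5 is the lowercase hex digest of the URL's UTF-8 bytes.
class Md5Bucket {
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two lowercase hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Bucket key: the first three characters of the hex MD5, i.e. 16^3 = 4096 possible buckets.
    static String bucketKey(String url) {
        return md5Hex(url).substring(0, 3);
    }

    public static void main(String[] args) {
        String url = "http://mittcom.com/";
        System.out.println(md5Hex(url) + " -> bucket " + bucketKey(url));
    }
}
```

Under this assumption a URL and its stored MD5 always map to the same bucket. That is presumably what allows each thread to query a single `links.links_` + key table.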
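The problem is that the URL "http://mittcom.com" (and a few others; I'm only debugging this one specifically) still remains in the linksByMD5MapOriginal map. The log file shows that it was removed and filtered, but it is still there after the threads finish! I don't understand how this can happen. I suspected some problem with differing hashCodes, but the keys are plain Strings, so that shouldn't be an issue. I'm really confused.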
After all threads have finished, I check it like this:
```java
for (Map.Entry<String, Map<String, String[]>> entrySet : linksByMD5MapOriginal.entrySet()) {
    String key = entrySet.getKey();
    Map<String, String[]> value = entrySet.getValue();
    if (value.containsKey("http://mittcom.com/")) {
        logger.info("STILL HERE in " + key);
    }
}
```
The map is initialized as follows:
```java
protected Map<String, Map<String, String[]>> linksByMD5MapOriginal = new TreeMap<>();
...
linksByMD5MapOriginal.put(md5Key, linksByKeyMap = Collections.synchronizedMap(new TreeMap<String, String[]>()));
```
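The TreeMap is only there to make debugging easier; ordering isn't needed. The inner maps are synchronized, so modifying them concurrently should not be a problem. Nothing adds anything to the maps while the threads are running. Another oddity is that I can't use a remote debugger (the program runs on a remote server): if I try, the program eventually hangs, so I've had to debug with log printouts. But that is not what I'm asking about. The problem is that the filtered URLs are still hanging around in the map!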
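The visibility reasoning above can be checked in isolation. The following is a minimal sketch built on the assumptions the post states: the outer `TreeMap` is fully populated before the worker threads start and only read (via `get`) while they run, and every removal goes through a `Collections.synchronizedMap` inner map. Because `Thread.start()` and `Thread.join()` both establish happens-before edges, the main thread should see every removal once it has joined the workers. `NestedMapRemovalDemo`, `run` and `stillHere` are hypothetical names, and the bucket keys and URLs are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of the nested-map layout described in the question.
class NestedMapRemovalDemo {
    // Mirrors the "STILL HERE" check: true if any inner map still holds the URL.
    static boolean stillHere(Map<String, Map<String, String[]>> outer, String url) {
        for (Map.Entry<String, Map<String, String[]>> e : outer.entrySet()) {
            if (e.getValue().containsKey(url)) {
                return true;
            }
        }
        return false;
    }

    // Builds the maps, lets each worker remove its URLs from every bucket, then joins all workers.
    static Map<String, Map<String, String[]>> run(Map<String, List<String>> urlsByBucket,
                                                  List<List<String>> removalsPerWorker) {
        // Outer map: populated completely before any thread starts.
        Map<String, Map<String, String[]>> outer = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : urlsByBucket.entrySet()) {
            Map<String, String[]> inner = Collections.synchronizedMap(new TreeMap<>());
            for (String url : e.getValue()) {
                inner.put(url, new String[] {url});
            }
            outer.put(e.getKey(), inner);
        }
        // Snapshot of bucket keys taken before the workers start, so workers only call outer.get().
        List<String> bucketKeys = new ArrayList<>(outer.keySet());

        List<Thread> workers = new ArrayList<>();
        for (List<String> removals : removalsPerWorker) {
            Thread t = new Thread(() -> {
                for (String url : removals) {
                    for (String bucket : bucketKeys) {
                        // TreeMap.get is a pure read; the inner map may be shared by several workers.
                        outer.get(bucket).remove(url); // each call locks that inner map
                    }
                }
            });
            workers.add(t);
            t.start();
        }

        try {
            // join() happens-before the code after it, so all removals are visible here.
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ex);
        }
        return outer;
    }

    public static void main(String[] args) {
        Map<String, List<String>> buckets = new TreeMap<>();
        buckets.put("aaa", Arrays.asList("http://a.example/", "http://mittcom.com/"));
        buckets.put("bbb", Arrays.asList("http://b.example/"));
        Map<String, Map<String, String[]>> outer = run(buckets,
                Arrays.asList(Arrays.asList("http://mittcom.com/"), Arrays.asList("http://b.example/")));
        System.out.println("mittcom STILL HERE: " + stillHere(outer, "http://mittcom.com/"));
    }
}
```

The plain outer `TreeMap` is safe here only because nothing modifies it while the workers run. If buckets could be added concurrently, the outer map would need its own synchronization (for example, a `ConcurrentHashMap`).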
Sorry if my question seems unclear; I'll update the post if there are any follow-up questions. Any help would be greatly appreciated.
UPD: the log output:
```
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: routines.queue.CheckUnique$LinkVariation@3f89fc46
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] will try to fliter out [http://www.mittcom.com/]
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Filtered mittcom
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] was expecting mittcom and found
...
[2017-10-04 07:46:35,337] [INFO ] [CheckUnique] [main] STILL HERE in cd2
```
Answer:
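I fixed the bug; it was in a piece of code unrelated to this one. In short, the link-variation checking mechanism was broken. Can I delete this question?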