Java Comparator using two different criteria

Time: 2013-07-24 14:42:11

Tags: java comparator

I have the following comparator:

public static class WordComparator implements Comparator<Word> {
    @Override
    public int compare(Word word1, Word word2) {
        //TODO find a better way to determine threshold
        int threshold = 10; //allowed difference in height
        int word1y = (int)Math.round(word1.bbox.y1 * 1.0 / threshold);
        int word2y = (int)Math.round(word2.bbox.y1 * 1.0 / threshold);
        if (word1y == word2y) {
            return word1.bbox.x1 - word2.bbox.x1;
        }
        else {
            return word1y - word2y;
        }
    }
}

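You can use this comparator on any Collection<Word>, and it should sort the words first by y1 (the y coordinate, here word1.bbox.y1) and then by x1 (the x coordinate, here word1.bbox.x1). The current implementation also uses a mechanism to normalize all words whose y values lie within a range of 10 of each other.

However, I have come to the conclusion that my current code does not work. My question now is: how do I write a comparator that compares on two different fields? I already have the return values and so on; I just need to find the correct way to implement it.

I hope you can help me.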

Example output, as requested:

w = [word_50, [188, 1455, 280, 1482, 92, 27], false, Totaal]
w = [word_58, [1324, 1547, 1370, 1573, 46, 26], false, EU]
w = [word_59, [1465, 1546, 1568, 1577, 103, 31], false, 173,50]
w = [word_56, [300, 1558, 329, 1583, 29, 25], false, te]
w = [word_62, [381, 2082, 605, 2119, 224, 37], false, verkrijgbaar!]
w = [word_61, [305, 2093, 369, 2114, 64, 21], false, ons]
w = [word_65, [605, 2114, 650, 2166, 45, 52], false, ]
w = [word_68, [184, 2258, 319, 2382, 135, 124], false, ]
w = [word_72, [296, 2278, 349, 2319, 53, 41], false, J]
w = [word_73, [411, 2302, 470, 2322, 59, 20], false, ‚n.]
w = [word_74, [571, 2319, 602, 2320, 31, 1], false, ]
w = [word_76, [434, 2330, 635, 2357, 201, 27], false, Kerstkaarten]
w = [word_77, [338, 2367, 436, 2393, 98, 26], false, Bestel]
w = [word_69, [184, 2382, 338, 2409, 154, 27], false, ]
w = [word_80, [1805, 2392, 1979, 2413, 174, 21], false, 37.45.08.070]
w = [word_82, [1745, 2430, 1881, 2458, 136, 28], false, Groningen]
w = [word_84, [1666, 2470, 1741, 2492, 75, 22], false, B.T.W.]
w = [word_86, [1795, 2469, 1981, 2492, 186, 23], false, 821.82.468.501]
w = [word_88, [1741, 2547, 1873, 2575, 132, 28], false, Algemene]
w = [word_108, [841, 2584, 1018, 2624, 177, 40], false, Betaling:]
w = [word_111, [1295, 2582, 1336, 2613, 41, 31], false, 14]
w = [word_102, [203, 2590, 261, 2630, 58, 40], false, Wij]
w = [word_107, [640, 2585, 825, 2627, 185, 42], false, opdracht.]
w = [word_90, [1666, 2593, 1695, 2609, 29, 16], false, en]
w = [word_104, [431, 2597, 454, 2620, 23, 23], false, u]
w = [word_106, [570, 2595, 628, 2619, 58, 24], false, uw]
w = [word_92, [1666, 2625, 1709, 2654, 43, 29], false, zijn]
w = [word_96, [1875, 2664, 1933, 2686, 58, 22], false, 1181]
w = [word_116, [561, 2683, 751, 2715, 190, 32], false, factuurnr.]
w = [word_119, [1108, 2678, 1321, 2710, 213, 32], false, vermelden.]
w = [word_114, [265, 2685, 423, 2724, 158, 39], false, betaling]
w = [word_117, [769, 2690, 815, 2713, 46, 23], false, en]
w = [word_98, [1708, 2703, 1739, 2726, 31, 23], false, de]
w = [word_101, [1863, 2703, 1999, 2730, 136, 27], false, Groningen]
w = [word_125, [828, 2772, 1359, 2813, 531, 41], false, administratie@biuemule.nl]
w = [word_123, [555, 2778, 646, 2809, 91, 31], false, deze]
w = [word_121, [309, 2787, 441, 2819, 132, 32], false, vragen]
w = [word_122, [455, 2787, 544, 2809, 89, 22], false, over]
w = [word_124, [660, 2777, 814, 2808, 154, 31], false, factuur:]
w = [word_120, [204, 2782, 298, 2812, 94, 30], false, Voor]
w = [word_100, [1829, 2705, 1853, 2725, 24, 20], false, te]
w = [word_99, [1750, 2704, 1816, 2725, 66, 21], false, K.v‚K.]
w = [word_97, [1668, 2704, 1696, 2733, 28, 29], false, bij]
w = [word_115, [435, 2692, 548, 2724, 113, 32], false, graag]
w = [word_113, [200, 2687, 254, 2727, 54, 40], false, Bij]
w = [word_118, [830, 2682, 1090, 2713, 260, 31], false, debiteurennr.]
w = [word_95, [1754, 2670, 1863, 2687, 109, 17], false, nummer]
w = [word_94, [1666, 2664, 1744, 2687, 78, 23], false, onder]
w = [word_93, [1721, 2624, 1893, 2654, 172, 30], false, gedeponeerd]
w = [word_105, [469, 2595, 559, 2620, 90, 25], false, voor]
w = [word_91, [1709, 2585, 1998, 2614, 289, 29], false, betalingsvoorwaarden]
w = [word_109, [1031, 2585, 1130, 2615, 99, 30], false, netto]
w = [word_103, [274, 2589, 416, 2622, 142, 33], false, danken]
w = [word_112, [1350, 2580, 1481, 2622, 131, 42], false, dagen.]
w = [word_110, [1144, 2583, 1278, 2614, 134, 31], false, binnen]
w = [word_89, [1883, 2547, 2006, 2575, 123, 28], false, leverings-]
w = [word_87, [1666, 2549, 1733, 2570, 67, 21], false, Onze]
w = [word_85, [1754, 2470, 1786, 2492, 32, 22], false, NL]
w = [word_83, [1894, 2430, 2020, 2452, 126, 22], false, 02045251]
w = [word_81, [1666, 2432, 1733, 2453, 67, 21], false, K.v.K.]
w = [word_79, [1666, 2391, 1794, 2414, 128, 23], false, Rabobank]
w = [word_78, [449, 2365, 528, 2398, 79, 33], false, tijdig]
w = [word_70, [528, 2339, 685, 2409, 157, 70], false, ]
w = [word_75, [225, 2332, 420, 2359, 195, 27], false, INTERCARD]
w = [word_71, [224, 2323, 254, 2324, 30, 1], false, ]
w = [word_67, [635, 2290, 685, 2339, 50, 49], false, ]
w = [word_66, [349, 2258, 650, 2290, 301, 32], false, ]
w = [word_63, [425, 2123, 434, 2138, 9, 15], false, \I]
w = [word_64, [206, 2114, 650, 2258, 444, 144], false, ]
w = [word_60, [248, 2085, 290, 2120, 42, 35], false, Bij]
w = [word_57, [341, 1557, 458, 1583, 117, 26], false, betalen]
w = [word_55, [188, 1558, 288, 1584, 100, 26], false, Totaal]
w = [word_51, [294, 1455, 368, 1480, 74, 25], false, BTW]
w = [word_54, [1536, 1448, 1571, 1473, 35, 25], false, 70]

The input was the same list, but in arbitrary order. The current "encoding" used is:

w = [word.id, [word.bbox.x1, word.bbox.y1, word.bbox.x2, word.bbox.y2, word.bbox.width, word.bbox.height], word.isStrong, word.content]

So you should only look at the word.bbox.x1 and word.bbox.y1 values. As you can see, the order is clearly not random: the output is now arranged in a kind of parabola around the y values.

3 answers:

Answer 0: (score: 0)

You should definitely take a look at Apache's CompareToBuilder.

Then you can do something like this:

public int compare(Word word1, Word word2) {
    int threshold = 10; //allowed difference in height
    int word1y = (int)Math.round(word1.bbox.y1 * 1.0 / threshold);
    int word2y = (int)Math.round(word2.bbox.y1 * 1.0 / threshold);
    return new CompareToBuilder()
       .append(word1y, word2y)
       .append(word1.bbox.x1, word2.bbox.x1)
       .toComparison();
}
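
For readers who do not have Apache Commons Lang on the classpath, a similar chained comparison can be sketched with the JDK's own Comparator.comparingInt and thenComparingInt (Java 8 or later). This is an alternative technique, not what the answer above uses. The Word and BBox classes below are hypothetical stand-ins that model only the fields mentioned in the question, and the sample coordinates are taken from the output listed in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for the question's classes; only the fields used here are modeled.
class BBox {
    final int x1;
    final int y1;

    BBox(int x1, int y1) {
        this.x1 = x1;
        this.y1 = y1;
    }
}

class Word {
    final String content;
    final BBox bbox;

    Word(String content, int x1, int y1) {
        this.content = content;
        this.bbox = new BBox(x1, y1);
    }
}

class ChainedWordComparator {
    static final int THRESHOLD = 10; // allowed difference in height, as in the question

    // Normalized row index, computed the same way as word1y/word2y above.
    static int row(Word word) {
        return (int) Math.round(word.bbox.y1 * 1.0 / THRESHOLD);
    }

    // First criterion: normalized row. Second criterion (used only on ties): x1.
    static final Comparator<Word> BY_ROW_THEN_X =
            Comparator.<Word>comparingInt(ChainedWordComparator::row)
                      .thenComparingInt(word -> word.bbox.x1);

    // Returns the contents of the words in sorted order, separated by spaces.
    static String sortedContents(List<Word> words) {
        List<Word> copy = new ArrayList<>(words);
        copy.sort(BY_ROW_THEN_X);
        StringBuilder sb = new StringBuilder();
        for (Word word : copy) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word.content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(
                new Word("betalen", 341, 1557),
                new Word("BTW", 294, 1455),
                new Word("te", 300, 1558),
                new Word("Totaal", 188, 1558),
                new Word("Totaal", 188, 1455));
        System.out.println(sortedContents(words)); // prints "Totaal BTW Totaal te betalen"
    }
}
```

Both criteria are compared without subtraction, so the chain cannot overflow.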

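Answer 1: (score: 0)

You should be careful when using subtraction in a Comparator. If the true difference word1.bbox.x1 - word2.bbox.x1 or word1y - word2y falls outside the int range (above Integer.MAX_VALUE or below Integer.MIN_VALUE), the result overflows and you get a wrong answer.

For example, in this case res is -1 even though a > b:

int a = Integer.MAX_VALUE;
int b = Integer.MIN_VALUE;
int res = a - b; // overflows to -1, so res < 0 is true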

A safer option is Integer.compare.
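
The difference between the two approaches can be sketched side by side. The class and method names below are hypothetical and exist only for illustration; Integer.compare itself is the JDK method (available since Java 7).

```java
// Sketch: subtraction-based comparison versus Integer.compare at the extremes of the int range.
class SubtractionOverflowDemo {
    // Wraps around when the true difference does not fit in an int.
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-safe: only the sign of the result carries meaning.
    static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int a = Integer.MAX_VALUE;
        int b = Integer.MIN_VALUE;
        System.out.println(compareBySubtraction(a, b)); // prints -1, wrongly suggesting a < b
        System.out.println(compareSafely(a, b) > 0);    // prints true, since a > b
    }
}
```

For small non-negative values such as the pixel coordinates in the question, subtraction does not overflow in practice, but Integer.compare avoids having to reason about the value range at all.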

Answer 2: (score: 0)

I would write the comparison code as follows:

int diff;

if((diff = Integer.compare(word1.bbox.y1 / 10, word2.bbox.y1 / 10)) != 0)
    return diff;
return Integer.compare(word1.bbox.x1, word2.bbox.x1);

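That is, if you are running Java 1.7 or later; Integer#compare did not exist before that.

If you are running an earlier version, it is trivial to implement yourself: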

public int compare(int a, int b) {
    return a > b ? 1 : a == b ? 0 : -1;
}
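
Putting the pieces of this answer together, a complete comparator that avoids Integer.compare might look like the following sketch. The Word and BBox classes are hypothetical stand-ins modeling only the fields named in the question, the helper is renamed compareInts here so that it does not clash with Comparator#compare, and the sample coordinates are taken from the output listed in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for the question's classes; only the fields used here are modeled.
class BBox {
    final int x1;
    final int y1;

    BBox(int x1, int y1) {
        this.x1 = x1;
        this.y1 = y1;
    }
}

class Word {
    final String content;
    final BBox bbox;

    Word(String content, int x1, int y1) {
        this.content = content;
        this.bbox = new BBox(x1, y1);
    }
}

class LegacyWordComparator implements Comparator<Word> {
    static final int THRESHOLD = 10; // bucket height, as in the answer above

    // Hand-written replacement for Integer.compare on runtimes older than Java 7.
    static int compareInts(int a, int b) {
        return a > b ? 1 : a == b ? 0 : -1;
    }

    @Override
    public int compare(Word word1, Word word2) {
        // First criterion: the y1 bucket (integer division by THRESHOLD).
        int diff = compareInts(word1.bbox.y1 / THRESHOLD, word2.bbox.y1 / THRESHOLD);
        if (diff != 0) {
            return diff;
        }
        // Second criterion, used only when both words share a bucket: x1.
        return compareInts(word1.bbox.x1, word2.bbox.x1);
    }

    // Returns the contents of the words in sorted order, separated by spaces.
    static String sortedContents(List<Word> words) {
        List<Word> copy = new ArrayList<Word>(words);
        Collections.sort(copy, new LegacyWordComparator());
        StringBuilder sb = new StringBuilder();
        for (Word word : copy) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word.content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(
                new Word("Groningen", 1745, 2430),
                new Word("02045251", 1894, 2430),
                new Word("Rabobank", 1666, 2391),
                new Word("K.v.K.", 1666, 2432),
                new Word("37.45.08.070", 1805, 2392));
        // prints "Rabobank 37.45.08.070 K.v.K. Groningen 02045251"
        System.out.println(sortedContents(words));
    }
}
```

Note that fixed buckets can still separate two words whose y1 values differ by a single pixel when they fall on either side of a bucket boundary: 2469 / 10 is 246, while 2470 / 10 is 247.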