What is the most elegant way to convert a Lucene CharArraySet to a set of strings?

Asked: 2013-02-11 23:52:44

Tags: java lucene

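I am trying to retrieve the list of French stop words from Lucene 4.0. The only method available is FrenchAnalyzer.getDefaultStopSet(), which returns a CharArraySet. I need to convert it to a set of strings.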

My quick and dirty working code is:

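// Quick and dirty: parse the set's toString() output.
// The enclosing [ and ] stay attached to the first and last words.
Set<String> stopWords = new HashSet<String>();
for (String stopWord : FrenchAnalyzer.getDefaultStopSet().toString().split(", ")) {
    stopWords.add(stopWord);
}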

It returns:

[eues, serais, fûtes, serait, eussions, est, étant, pour, avez, on, avions, ceci, serez, avec, moi, ou, eue, mon, son, eussiez, aurez, notre, nos, avais, avait, soi, une, seraient, eûmes, aurais, aurait, ait, fûmes, du, eusse, étées, serions, des, aurions, [lui, fût, seront, sois, seriez, serons, soit, eût, aie, avons, ces, cet, de, eut, eus, ma, me, eusses, furent, eux, fus, fut, eu, leurs, d, ayez, les, aviez, c, n, auront, l, aurons, m, j, un, fussiez, elle, nous, t, eûtes, tu, s, soyez, ne, sans, en, et, es, y, étée, même, seras, cette, auraient, sommes, te, aux, quels, soyons, êtes, étais, quelles, était, étés, celà , leur, aies, ta, serai, fusse, fussions, auras, fussent, votre, se, auriez, aurai, le, étiez, sa, ce, tes, été, ses, toi, vous, la], sera, aient, par, étions, ici, pas, sur, avaient, ayant, ont, mes, quelle, étaient, ton, que, qu, eurent, vos, qui, fusses, mais, as, dans, il, à, au, je, ai, sont, quel, aura, soient, suis, ayons, ils, eussent]

I tried using an iterator:

Iterator iter = FrenchAnalyzer.getDefaultStopSet().iterator();
while(iter.hasNext()) {
    Object stopWord = iter.next();
    stopWords.add(stopWord.toString());
}

But it returns a cryptic set:

[[C@2fb3f8f6, [C@464c4975, [C@6fc5f743, [C@5705b99f, [C@26ee7a14, [C@5a9e29fb, [C@b41b571, [C@47315d34, [C@3e110003, [C@210a6ae2, [C@82a6f16, [C@70cb6009, [C@575fadcf, [C@1342a80d, [C@7f09fd93, [C@58ecb281, [C@1a84da23, [C@165973ea, [C@4ac9131c, [C@6fb000e7, [C@34fbb7cb, [C@603b1d04, [C@630045eb, [C@159b5217, [C@5975d6ab, [C@ac980c9, [C@a94884d, [C@5557c2bd, [C@16ba8602, [C@1b016632, [C@36b8bef7, [C@744a6cbf, [C@1c93d6bc, [C@509df6f1, [C@7d26f75b, [C@80d3d6f, [C@2b76e552, [C@7825d2b2, [C@3c1d332b, [C@38dda25b, [C@6cb8, [C@7f2ad19e, [C@5328f6ee, [C@51b48197, [C@4e17e4ca, [C@2acdb06e, [C@32ef2c60, [C@f01a1e, [C@5e7808b9, [C@2a9df354, [C@2b275d39, [C@b6e39f, [C@46b8c8e6, [C@36ff057f, [C@7290cb03, [C@16bdb503, [C@288051, [C@502cb49d, [C@5a5e179a, [C@50c4fe76, [C@4229ab3e, [C@266bade9, [C@45d64c37, [C@5ef4f44a, [C@29c56c60, [C@6719dc16, [C@35549f94, [C@44b01d43, [C@5ece2187, [C@3f77b3cd, [C@6766afb3, [C@596e1fb1, [C@76f8968f, [C@14d6a05e, [C@6ef137d, [C@2087c268, [C@67d225a7, [C@7b2be1bd, [C@79df8b99, [C@2dec8909, [C@3b835282, [C@4bbd7848, [C@423e5d1, [C@76497934, [C@48ee22f7, [C@4ed1e89e, [C@19e3118a, [C@851052d, [C@54281d4b, [C@1bbb60c3, [C@1f4384c2, [C@21a80a69, [C@2d5253d5, [C@4ce32802, [C@939b78e, [C@79a5f739, [C@1d807ca8, [C@2393385d, [C@79de256f, [C@c0b76fa, [C@2abe0e27, [C@604e280c, [C@2fcac6db, [C@e4ac00c, [C@23d256fa, [C@38b5dac4, [C@2b2d96f2, [C@3a6ac461, [C@6910fe28, [C@488e32e7, [C@2adb1d4, [C@676bd8ea, [C@32bf7190, [C@2c41d05d, [C@509f5011, [C@a39ab89, [C@39e87719, [C@332611a7, [C@1572e449, [C@418c56d, [C@78dc6a77, [C@25fa1bb6, [C@20c1f10e, [C@7ad81784, [C@6513cf0, [C@29e97f9f, [C@9c0ec97, [C@2b5ac3c9, [C@2d342ba4, [C@4a8c1dd9, [C@1d3c468a, [C@3782da3d, [C@25595f51, [C@4d865b28, [C@4c5e176f, [C@15a62c31, [C@1cb8deef, [C@d8d9850, [C@380e28b9, [C@7df17e77, [C@3da99561, [C@2df6df4c, [C@2efb56b1, [C@68e6ff0d, [C@33010058, [C@69945ce, [C@53ebd75b, [C@3d9360e2, [C@351e1e67, [C@2705d88a, [C@2993a66f, [C@e80d1ff, [C@52c05d3b, [C@3a64c34e, [C@6bdab91, [C@4ce2cb55, 
[C@77fddc31, [C@1be1a408, [C@20b9b538, [C@43462851, [C@30ec4a87, [C@4b0ab323, [C@74b23210]]

Javadoc for the class: FrenchAnalyzer

2 answers:

Answer 0 (score: 3)

Try this:

Iterator<?> iter = FrenchAnalyzer.getDefaultStopSet().iterator();
while (iter.hasNext()) {
    // Each element is a char[], so cast it and copy its characters into a String.
    char[] stopWord = (char[]) iter.next();
    stopWords.add(new String(stopWord));
}
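
To illustrate why the iterator attempt in the question printed values like [C@2fb3f8f6, here is a minimal sketch using only the standard library. The class and method names (CharArrayToStringDemo, viaToString, viaConstructor) are made up for this example and are not part of Lucene. Calling toString() on a char[] falls back to Object.toString(), which prints the array type tag and a hash code, while new String(char[]) copies the characters.

```java
final class CharArrayToStringDemo {

    // Arrays do not override Object.toString(), so this yields "[C@" plus a hex hash code.
    static String viaToString(char[] word) {
        return word.toString();
    }

    // The String(char[]) constructor copies the characters into a new String.
    static String viaConstructor(char[] word) {
        return new String(word);
    }

    public static void main(String[] args) {
        char[] word = {'a', 'v', 'e', 'c'};
        System.out.println(viaToString(word));    // a value of the form [C@<hex hash>
        System.out.println(viaConstructor(word)); // avec
    }
}
```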

Answer 1 (score: 2)

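It looks like the set holds char[] elements, so the first thing to do is treat each element as a char[], then build a String from the char array. Using a foreach loop also simplifies the code: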

A simple implementation would be:

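// Iterate the elements as Object and cast each one to char[].
// String has no (char[], String charsetName) constructor; charset names apply to byte[] only.
for (Object chars : FrenchAnalyzer.getDefaultStopSet()) {
    stopWords.add(new String((char[]) chars));
}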
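
The loop above can be wrapped in a small reusable helper. The following is only a sketch under stated assumptions: StopWordStrings and toStringSet are hypothetical names, and a plain HashSet<Object> filled with char[] values stands in for the CharArraySet so that the example runs without Lucene on the classpath.

```java
import java.util.HashSet;
import java.util.Set;

final class StopWordStrings {

    // Converts a collection whose elements are char[] values into a Set<String>.
    // Elements of any other type fall back to String.valueOf(Object).
    static Set<String> toStringSet(Iterable<?> rawEntries) {
        Set<String> result = new HashSet<>();
        for (Object entry : rawEntries) {
            if (entry instanceof char[]) {
                result.add(new String((char[]) entry));
            } else {
                result.add(String.valueOf(entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the stop set (assumption: the real set iterates over char[] elements).
        Set<Object> standIn = new HashSet<>();
        standIn.add("avec".toCharArray());
        standIn.add("pour".toCharArray());
        System.out.println(toStringSet(standIn));
    }
}
```

With Lucene available, the call would presumably be toStringSet(FrenchAnalyzer.getDefaultStopSet()), since the helper only requires its argument to be Iterable.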