Question

I have a data file in which each line represents one record. Each record may contain a list of keywords, and each keyword is preceded by a "+".

foo1 foofoo foo foo foo +key1 +key2 +key3
foo2 foo foo foofoo foo 
foo3 foo foofoo foo +key1 key1 key1 +key2

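A record can have anywhere from zero to a theoretically unlimited number of keywords. A keyword always starts with +. A single keyword can be one word or a phrase containing spaces. My strategy for identifying keywords:

I want to read these records into an array, String keywords[]. I am using lineBuffer to read the data, and this is what I have so far.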

// PSEUDOCODE
counter = [number of occurrences of + in the line];
for (int i = 0; i < counter; i++) {
    Pattern p = [regex representing + to the next occurrence of + -or- end of line];
    Match pattern;
    keyword[i] = Match.group(1);
}

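I may be overthinking this, but does Java know to move on to the next instance of my pattern within the same line? Looking at these lines of code, it seems my pattern matcher would read the line, find the first instance of a keyword, and write it to the array i times. It would never reach the second keyword.

Is there a better way to think about this? What would be a better strategy for building this array?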

Answer 1

If you know that the keywords themselves contain no +, you can simply split the string:

String[] ss = s.split(" \\+");

Discard the first entry (foo fooo ......).

Edit

Regarding the pattern/regex question, you can also do this:
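To make the split idea concrete, here is a minimal sketch. The class name SplitKeywords and the method extractKeywords are made up for illustration, and the sketch assumes, as stated above, that no keyword contains a + and that every keyword is preceded by a space.

```java
import java.util.Arrays;

class SplitKeywords {
    // Hypothetical helper: returns the keywords of one record.
    // Assumes no keyword contains '+' and each '+' follows a space.
    static String[] extractKeywords(String line) {
        // Split on a space followed by a literal plus sign.
        String[] parts = line.split(" \\+");
        // With no keywords there is nothing after the prefix.
        if (parts.length <= 1) {
            return new String[0];
        }
        // parts[0] is the non-keyword prefix, so skip it.
        String[] keywords = Arrays.copyOfRange(parts, 1, parts.length);
        for (int i = 0; i < keywords.length; i++) {
            keywords[i] = keywords[i].trim();
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [key1 key1 key1, key2]
        System.out.println(Arrays.toString(
            extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2")));
        // prints []
        System.out.println(Arrays.toString(
            extractKeywords("foo2 foo foo foofoo foo ")));
    }
}
```

Because the split happens only at " +", a phrase such as "key1 key1 key1" stays together as one keyword.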

Pattern p = Pattern.compile(" \\+\\w+");
Matcher m = p.matcher(s);
while (m.find()) {
    String key = m.group().trim().replaceAll("\\+", "");
    System.out.println(key);
}
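On the original worry about the matcher returning the first keyword every time: each call to Matcher.find() resumes after the end of the previous match, so a while loop walks through all keywords on the line. Note that \\w+ stops at the first space, so the pattern above only captures single-word keywords. The sketch below is one possible variation, not the pattern from this answer: it swaps in a capturing group that runs up to the next +, which is an assumption about what a phrase keyword should look like. The names RegexKeywords and extractKeywords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexKeywords {
    // A '+' followed by one or more non-'+' characters, captured as group 1.
    // Assumption: '+' never appears outside of a keyword marker.
    private static final Pattern KEYWORD = Pattern.compile("\\+([^+]+)");

    static List<String> extractKeywords(String line) {
        List<String> keywords = new ArrayList<>();
        Matcher m = KEYWORD.matcher(line);
        // Each find() call continues from where the previous match ended.
        while (m.find()) {
            // trim() drops the space that separates one keyword from the next '+'.
            keywords.add(m.group(1).trim());
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [key1, key2, key3]
        System.out.println(extractKeywords("foo1 foofoo foo foo foo +key1 +key2 +key3"));
        // prints [key1 key1 key1, key2]
        System.out.println(extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2"));
    }
}
```

If an array is needed rather than a List, keywords.toArray(new String[0]) converts the result.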

Answer 2

With a Scanner this can be quite easy:

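// keyword is assumed to be a String[] that is already large enough
Scanner s = new Scanner(line);
int i = 0;
while (s.hasNext()) {
    // next() returns one whitespace-delimited token at a time
    String token = s.next();
    if (token.startsWith("+")) {
        // the stored token still includes the leading "+"
        keyword[i] = token;
        i++;
    }
}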
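Since Scanner splits on whitespace by default, the loop above keeps only the first word of a phrase keyword. A rough sketch of one way to extend it follows: tokens that come after a + token are appended to the current keyword until the next + token appears. The class and method names are invented for this example, and it collects into a List first so the array does not have to be sized up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerKeywords {
    // Hypothetical helper: gathers multi-word keywords from one record.
    static String[] extractKeywords(String line) {
        List<String> keywords = new ArrayList<>();
        StringBuilder current = null; // stays null until the first '+' token
        try (Scanner s = new Scanner(line)) {
            while (s.hasNext()) {
                String token = s.next();
                if (token.startsWith("+")) {
                    // A new keyword starts; store the one in progress, if any.
                    if (current != null) {
                        keywords.add(current.toString());
                    }
                    current = new StringBuilder(token.substring(1));
                } else if (current != null) {
                    // A later word of the current phrase keyword.
                    current.append(' ').append(token);
                }
                // Tokens before the first '+' are not keywords and are skipped.
            }
        }
        if (current != null) {
            keywords.add(current.toString());
        }
        return keywords.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints key1 key1 key1 and key2 on separate lines
        for (String k : extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2")) {
            System.out.println(k);
        }
    }
}
```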

Java: I need a strategy for parsing data into an array
