Question

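I have a text file that contains descriptions of various software components. It mentions many software components along with their versions. For example, say my file contains the string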

"Stack Careers 2.0 is the new number 1 site with symbol ! and * and blablabla   
 replacing older Stack Careers."

It also contains some symbols and numbers.

I have split the string on any character other than A-Za-z; here is the code:

getMySoftwareDescription().split("[^a-zA-Z]");

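This gives me all the words (I do actually want all the words, but no symbols or numbers other than the software version numbers), such as

Stack, Careers, is, the, and so on, in an array.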
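To make the behaviour of that split concrete, here is a minimal sketch that runs it on a shortened version of the sample string. The class name `SplitDemo` and the helper `splitOnNonLetters` are hypothetical names used only for illustration; the regular expression is the one shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Splits on every single non-letter character, exactly as in the question.
    static List<String> splitOnNonLetters(String text) {
        return new ArrayList<>(Arrays.asList(text.split("[^a-zA-Z]")));
    }

    public static void main(String[] args) {
        String text = "Stack Careers 2.0 is the new number 1 site";
        // Each space, digit and dot is its own delimiter, so adjacent
        // delimiters leave empty strings in the result.
        System.out.println(splitOnNonLetters(text));
    }
}
```

Because every digit and dot is treated as a delimiter, the version `2.0` is lost and empty strings appear where several delimiters sit next to each other.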

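But I would like to get the string Stack Careers 2.0 as a single string, and likewise Stack Careers

(along with the other words from the example above, such as is and the).

I should mention that I am not good at regular expressions.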

Answer 1

You can start with this:

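    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Alternative 1: an uppercase letter followed by letters and spaces, optionally followed by digits such as 2.0.
    // Alternative 2: any other run of non-whitespace characters except '.'.
    Pattern p = Pattern.compile("(\\p{Lu}[\\p{L} ]+)(\\d+[\\.]?\\d+)*|[\\S&&[^.]]+");
    Matcher m = p.matcher("Stack Careers 2.0 is the new number 1 site with symbol ! and \n* and blablabla\n replacing older Stack Careers.");
    List<String> list = new ArrayList<String>();
    while (m.find()) {
        list.add(m.group());
    }
    System.out.println(list);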

Output:

[Stack Careers 2.0, is, the, new, number, 1, site, with, symbol, !, and, *, and, blablabla, replacing, older, Stack Careers]

It still needs to be refined to handle all the possible cases, though.
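One case worth refining: in the pattern above, `[\p{L} ]+` also consumes lowercase words and spaces that follow a capitalized word, so a name in the middle of a sentence can swallow the text after it. The sketch below is one possible tightening, not a definitive solution. It assumes that a software name is a run of capitalized words separated by single spaces and that a version is a space followed by dot-separated numbers. The class name `NameVersionTokenizer` and the method `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameVersionTokenizer {
    // Hypothetical refinement, built on the assumptions stated above:
    //   name    = capitalized words separated by single spaces
    //   version = optional " 2", " 2.0", " 2.0.1", ...
    //   other   = any run of non-whitespace characters except '.'
    private static final Pattern TOKEN = Pattern.compile(
            "\\p{Lu}\\p{L}*(?: \\p{Lu}\\p{L}*)*(?: \\d+(?:\\.\\d+)*)?|[^\\s.]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [We, moved, from, Stack Careers, to, Stack Careers 2.0, today]
        System.out.println(tokenize(
                "We moved from Stack Careers to Stack Careers 2.0 today."));
    }
}
```

Under these assumptions any capitalized word followed by a number (for example "Number 1") is also treated as a name with a version, and consecutive capitalized words are always merged, so the rule may still need adjusting for real data.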

Using a regular expression to find strings containing a software name and its version number