Question

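A question about Java regex:

I have a tokenizer, and I only want it to return tokens that are longer than a certain length.

For example, I need to return all tokens longer than 1 character from this text: "This is a text."

I need to get 3 tokens: "This", "is", and "text". The tokens "a" and "." are not needed. Note that the string can contain any characters (not only alphabetic ones).

I tried this code, but I don't know how to finish it: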

    String lines[]  = {"This is o n e l e tt e r $ % ! sentence"};


    for(String line : lines)
    {
        String orig = line;

        Pattern Whitespace = Pattern.compile("[\\s\\p{Zs}]+");
        line = Whitespace.matcher(orig).replaceAll(" ").trim();
        System.out.println("Test:\t'" + line + "'");

        Pattern SingleWord = Pattern.compile(".+{1}");  //HOW CAN I DO IT?
        SingleWord.matcher(line).replaceAll(" ").trim();
        System.out.println("Test:\t'" + line + "'");



    }

Thanks

Answer 1

Why not use \w{2,} like this:

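    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String line = "This is o n e l e tt e r $ % ! sentence";

    // \w{2,} matches runs of two or more word characters (letters, digits, underscore)
    Pattern pattern = Pattern.compile("\\w{2,}");
    Matcher matcher = pattern.matcher(line);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }

Output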

This
is
tt
sentence
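
Since the question says the string may contain any characters, it may be worth noting that Java's \w covers only ASCII letters, digits, and the underscore by default. The following is a minimal sketch, not part of the original answer: the class name `UnicodeWordTokens`, the helper `wordTokens`, and the sample text are made up for illustration. It contrasts the default behavior with the `Pattern.UNICODE_CHARACTER_CLASS` flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class UnicodeWordTokens {
    // Collects every run of two or more word characters, compiled with the given flags.
    static List<String> wordTokens(String text, int flags) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\w{2,}", flags).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text with an accented i and an accented e, written as escapes.
        String text = "na\u00efve caf\u00e9 x";
        // Default flags: word characters are ASCII only.
        System.out.println(wordTokens(text, 0));
        // Unicode-aware word characters.
        System.out.println(wordTokens(text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

With the default flags the accented letters split the words into the fragments na, ve, and caf; with the flag set, the two accented words are returned whole.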

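Edit

In that case you can use [A-Za-z0-9_@.-]{2,}, listing inside the brackets the special characters you want to keep, or you can use [^\s]{2,} or \S{2,}, where \S matches any non-whitespace character:

Input

This is o email@gmail.com n e l e tt e r $ % ! sentence

Output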

This is email@gmail.com tt sentence
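
To make the difference between the patterns above concrete, here is a small sketch that runs all three against the same input. The class name `TokenPatterns` and the helper `tokens` are assumptions introduced for this example, not something from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class TokenPatterns {
    // Returns all matches of the given regex in the line, in order of appearance.
    static List<String> tokens(String regex, String line) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This is o email@gmail.com n e l e tt e r $ % ! sentence";
        System.out.println(tokens("\\w{2,}", line));             // word characters only
        System.out.println(tokens("[A-Za-z0-9_@.-]{2,}", line)); // explicit list of allowed characters
        System.out.println(tokens("\\S{2,}", line));             // anything except whitespace
    }
}
```

\w{2,} breaks the address into email, gmail, and com, while the other two patterns keep email@gmail.com as one token. Note that \S{2,} also keeps punctuation attached to a word, so "text." would come back as a single token.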

Answer 2

If you are using Java 8, you can do it like this:


    import java.util.ArrayList;
    import java.util.Arrays;

    String line = "This is o n e l e tt e r $ % ! sentence";
    ArrayList<String> array = new ArrayList<>(Arrays.asList(line.split(" ")));
    array.removeIf(u -> u.length() == 1); // drop every one-character token
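
One caveat with splitting on a single space: consecutive spaces produce empty strings, which a length() == 1 check does not remove. A possible variant, sketched here with the Java 8 stream API, splits on runs of whitespace and takes the minimum length as a parameter. The class name `MinLengthTokens` and the method `tokensAtLeast` are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
class MinLengthTokens {
    // Splits on runs of whitespace and keeps tokens with at least minLength characters.
    static List<String> tokensAtLeast(String line, int minLength) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(token -> token.length() >= minLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "This is o n e l e tt e r $ % ! sentence";
        System.out.println(tokensAtLeast(line, 2));
    }
}
```

For the sample line this keeps This, is, tt, and sentence, the same tokens as the removeIf version, and it also copes with input such as "a  bb   c".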

Answer 3

I would use something as simple as this:

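    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // line is the input string to tokenize
    List<String> words = new LinkedList<String>();
    Matcher m = Pattern.compile("\\S{2,}").matcher(line);
    while (m.find()) {
        words.add(m.group(0));
    }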

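\\S (with an uppercase 'S') matches any non-whitespace character; the backslash is doubled because it sits inside a Java string literal.

Disclaimer: I haven't run this, but it should work (perhaps with a few minimal changes).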
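
For anyone who wants to try the idea, here is one way it could be wrapped into a runnable class with the minimum length as a parameter, since the question asks for tokens above "a certain length". This is a sketch under assumptions: the class name `LongTokens` and the method `longerThanOrEqual` are invented for illustration and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class LongTokens {
    // Finds runs of non-whitespace characters that are at least minLength long.
    static List<String> longerThanOrEqual(String line, int minLength) {
        // Builds a pattern such as \S{2,} from the requested minimum length.
        Pattern pattern = Pattern.compile("\\S{" + minLength + ",}");
        List<String> words = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "This is o n e l e tt e r $ % ! sentence";
        System.out.println(longerThanOrEqual(line, 2));
        System.out.println(longerThanOrEqual(line, 3));
    }
}
```

With a minimum of 2 the sample line yields This, is, tt, and sentence; with a minimum of 3 only This and sentence remain.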

Regex to (un)match all words with a length above a specific value
