Question

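I'm new to Java (I started taking classes less than six months ago) and I don't know how to accomplish this. I'm hoping it can be done with some kind of regular expression, although regex hasn't been covered in my course yet, so if anyone could explain their answer simply, it would be much appreciated.

Here is the code so far: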

import java.io.*;
import java.util.*;
import java.net.*;
public class definerNotOrganised

{
    public static void main(String[]args) throws Exception
    {
        System.out.println("\f\n\tWelcome to the word definer! (Input '*' to exit)");
        while (true)
        {
            System.out.print("\n\tEnter a word to Define: ");
            input();
        }
    }

private static  void input() throws Exception 
{
    Scanner sc = new Scanner(System.in);    
    String userWord = sc.nextLine();
    if (userWord.equalsIgnoreCase("*"))
    {
        System.out.println("Exiting...");
        System.exit(0);
    }
    else
    {
        System.out.print(define(userWord));
    }

}

private static String define(String word) throws Exception
{                
    String notFound = "I'm sorry, I can't find that word...";
    String line = "";
    BufferedReader br = new BufferedReader(new InputStreamReader(new URL("https://raw.githubusercontent.com/sujithps/Dictionary/master/Oxford%20English%20Dictionary.txt").openStream()));     
    try {
        while (line != null)
        {                             
            line = br.readLine();      
            String lineFirstWord = firstWord(line);
            if ((lineFirstWord.equalsIgnoreCase(word))&&(line.length() > 5))
            {
                cleanUp(line);       
            }
        }
    } catch (Exception E) 
    {
        return notFound;
    }
    return notFound;
}     

private static String firstWord(String line) {
    if (line.indexOf(' ') > -1)
    { 
        return line.substring(0, line.indexOf(' ')); 
    } else 
    {
        return line; 
    }
}

private static void cleanUp(String line) 
{
//Unsure what to put in here
}

}

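The code I'm writing defines words: it looks up the definition of the word the user enters by searching https://raw.githubusercontent.com/sujithps/Dictionary/master/Oxford%20English%20Dictionary.txt. It isn't optimal and takes a while to search, but that's not the problem I'm trying to solve right now.

I'm sure there are plenty of problems, but for now what I want to know is what to put in the cleanUp method to make the output nicer.

The main problem with my code is that if a word has several definitions, the output can be very messy.

For example, the output for the word "nice" would be: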

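Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. 3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). 5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]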

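This is printed on a single line in the console and looks messy. I'd like the output to look more like this:

Nice adj.

1 pleasant, satisfactory.

2 (of a person) kind, good-natured.

3 iron. Bad or awkward (nice mess).

etc.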

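Initially I thought the solution would be to have the code find a number in the string and add \n in front of it.

However, some definitions contain numbers themselves, so that won't work.

Each new definition comes right after the end of a sentence, so ideally the code needs to look for . [a number] and then break the line before the number.

It also needs to accommodate numbers of up to two digits, since some words have a lot of definitions.

As a further safeguard (in case the condition is met somewhere unexpected), it would be useful if the line break were applied only when the number is one higher than the previous one. (If the code finds ". 1" and then for some reason ". 7", it should not break the line, but if it finds ". 2" it should.)

Sorry if something similar has been posted before, but I didn't even know where to start searching. Someone I know who is more capable than me tried to provide a regex solution, but it didn't work, so I'm hoping someone here can help.

Not all of the criteria above need to be met, and it doesn't have to be perfect; I just want to know how to approach this. Sorry for the long read, and thanks in advance.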

Answer 1

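Because of the dictionary format, this is going to be harder than you think. Print (as opposed to online) dictionaries use many formatting techniques to shorten the text, and with it the book itself.

Going by your requirements, looking for a period followed by a number (. #) is not enough. Look at what you would get for definition 6 in your example:

6. (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]

But that is incorrect, because the dictionary format lists the different parts of speech one after another. What you probably want is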

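Nice adj.

...

6. (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm).

nicely adv.

Niceness n.

Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]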

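And that is leaving aside any other formatting conventions. You will have to consult the first pages of the dictionary, which explain all the abbreviations and the format of the definitions.

For now, I suggest you write a list of keywords, such as adj, adv, n, etc., and search for those in addition to searching for . #. Here is an incomplete attempt: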

public class Main {

    public static void main(String[] args) {

        // Part-of-speech abbreviations, written as regex fragments (the period is escaped).
        final String[] KEYWORDS = {" adj\\. ", " n\\. ", " adv\\. "};

        String s = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. 3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). 5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]";
        String r = s;

        // Break after each keyword, unless a ')' follows before any '(' (i.e. it sits inside parentheses).
        for (String kw : KEYWORDS)
            r = r.replaceAll(kw + "(?![^(]+\\))", kw + "\n");
        // Break before each definition number that follows a period.
        r = r.replaceAll("\\.\\s+(\\d+)", ".\n $1.");
        System.out.println(r);
    }
}

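Output

Nice adj.
 1. pleasant, satisfactory.
 2. (of a person) kind, good-natured.
 3. iron. Bad or awkward (nice mess).
 4. fine or subtle (nice distinction).
 5. fastidious; delicately sensitive.
 6. (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv.
Niceness n.
Nicish adj.
(also niceish). [originally = foolish, from latin nescius ignorant]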

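Note that an arbitrary-length lookbehind would be needed to fix nicely adv. in definition 6. Also, in the Nicish adj. entry, the additional information should not be separated by a line break.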
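One possible way around both remaining issues, offered only as a sketch: instead of breaking after a keyword, break before any word that is directly followed by a part-of-speech abbreviation, using a lookahead. The class name PartOfSpeechBreaks and the helper format are made up for illustration, the abbreviation list (adj, adv, n) is as incomplete as the one above, and the pattern has only been checked against the "nice" entry.

```java
class PartOfSpeechBreaks {

    // Hypothetical helper: puts each numbered sense and each derived form
    // (a word directly followed by adj., adv. or n.) on its own line.
    static String format(String entry) {
        // Replace the whitespace after a period with a newline when the next
        // two tokens look like "<word> adj." / "<word> adv." / "<word> n.".
        String r = entry.replaceAll("(?<=\\.)\\s+(?=\\w+ (?:adj|adv|n)\\.)", "\n");
        // Break before each sense number that follows a period.
        return r.replaceAll("\\.\\s+(\\d+)", ".\n $1.");
    }

    public static void main(String[] args) {
        String s = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. "
                + "3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). "
                + "5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) "
                + "satisfactory in terms of the quality described (a nice long time; nice and warm). "
                + "nicely adv. Niceness n. Nicish adj. (also niceish). "
                + "[originally = foolish, from latin nescius ignorant]";
        System.out.println(format(s));
    }
}
```

Because the break goes before the derived word rather than after the abbreviation, "(also niceish)" stays on the same line as "Nicish adj.". A sense whose text itself starts with a part-of-speech abbreviation would still confuse this pattern.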

Answer 2

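I'm new to Java myself, but I thought I'd give it a try. I added "fake" numbers to the enumeration to make sure it works correctly. I encourage more experienced Java programmers to comment on this post so it can be improved further.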

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String str = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind,"
                + " good-natured. 3 iron. 2 A fake number. Bad or awkward (nice mess). 4 fine or"
                + " subtle (nice distinction). 5 fastidious; delicately "
                + "sensitive. 8 Another fake number. 6 (foll. By an adj., often with and) satisfactory"
                + " in terms of the quality described (a nice long time; nice"
                + " and warm). nicely adv. Niceness n. Nicish adj. (also "
                + "niceish). [originally = foolish, from latin nescius ignorant]";

        String strClean = cleanUp(str);

        System.out.println(strClean);
    }

    private static String cleanUp(String str) {
        StringBuilder cleaned = new StringBuilder();
        int currentLevel = 0;

        /* The initial pre-digit information */
        Matcher initialMatcher = Pattern.compile("(.*?)(?=\\. 1)").matcher(str);
        // We must initialise the matcher before grouping
        boolean initialMatchBool = initialMatcher.find();
        cleaned.append(initialMatcher.group(1) + ".");

        /* Digit listing */
        List<String> startDigitList = new ArrayList<String>();
        Matcher startDigitMatcher = Pattern.compile("(?<=\\. )(\\d[^\\d]*)").matcher(str);

        while (startDigitMatcher.find()) {
            startDigitList.add(startDigitMatcher.group());
        }

        for (String match: startDigitList) {
            /* The first digit of a match */
            Matcher digitMatcher = Pattern.compile("(^\\d+)").matcher(match);
            // We must initialise the matcher before grouping
            boolean digitMatchBool = digitMatcher.find();
            int precedingDigit = Integer.parseInt(digitMatcher.group(1));

            if (precedingDigit == currentLevel+1) {
                cleaned.append("\n\t");
                currentLevel++;
            }
            cleaned.append(match);
        }

        return cleaned.toString();
    }
}

Output:

Nice adj.
    1 pleasant, satisfactory. 
    2 (of a person) kind, good-natured. 
    3 iron. 2 A fake number. Bad or awkward (nice mess). 
    4 fine or subtle (nice distinction). 
    5 fastidious; delicately sensitive. 8 Another fake number. 
    6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]
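
The same "only break when the number is one higher than the last" idea can also be sketched in a single pass, which additionally copes with two-digit numbers (the pattern \d[^\d]* above appears to split a number such as 10 after its first digit). This is an illustrative variant, not a drop-in replacement: the class name SequentialBreaks is made up, and it assumes every sense number is preceded by ". " and followed by a space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialBreaks {

    // A space that follows a period, then a one- or two-digit number, then a space.
    private static final Pattern NUMBERED = Pattern.compile("(?<=\\.) (\\d{1,2})(?= )");

    // Inserts "\n\t" before a number only when it is exactly one higher
    // than the last number that was accepted; other numbers are left alone.
    static String cleanUp(String text) {
        Matcher m = NUMBERED.matcher(text);
        StringBuilder out = new StringBuilder();
        int expected = 1; // next sense number we are willing to break on
        int last = 0;     // end of the previously copied region
        while (m.find()) {
            out.append(text, last, m.start());
            if (Integer.parseInt(m.group(1)) == expected) {
                out.append("\n\t").append(m.group(1));
                expected++;
            } else {
                out.append(m.group()); // a "fake" number: keep the text unchanged
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanUp("Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, "
                + "good-natured. 3 iron. 2 A fake number. Bad or awkward (nice mess). 4 fine or subtle."));
    }
}
```

Unlike the list-based version, this never drops text between matches, because everything outside a match is copied through unchanged.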

Answer 3

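I believe your concern about possible "fake" numbers in the text is unfounded, and in any case you can't realistically guard against a "fake" number that happens to match the next expected number in the sequence.

So this is enough: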

String formatted = definition.replaceAll("\\. (\\d+)", ".\n\t$1");
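
To see what that one-liner does, here is a minimal runnable sketch. The class name NumberedBreaks and the method format are made up for illustration; in the question's program the same call could sit in cleanUp if that method were changed to return a String.

```java
class NumberedBreaks {

    // Hypothetical wrapper around the one-line replacement above.
    static String format(String definition) {
        // "\\. (\\d+)" matches a period, one space and a run of digits;
        // $1 puts the captured digits back after the newline and tab.
        return definition.replaceAll("\\. (\\d+)", ".\n\t$1");
    }

    public static void main(String[] args) {
        System.out.println(format(
                "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured."));
    }
}
```

Since \d+ is greedy, a two-digit number such as 10 is captured whole, so entries with many definitions need no special handling.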

Java: add a newline to a string after certain character combinations?
