How do I find the longest word in a text file that may contain any characters?

Time: 2015-02-26 21:13:40

Tags: java unicode

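My task is simply to retrieve the longest word from a text document. How can I adapt this so that it works for any language, such as Russian or Arabic? Words containing the digits 0-9 are to be ignored, and any punctuation inside a word is removed before the word is stored.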

E.g. 53-летний Ленин?

e.g. آعَامَّة؎عََٔىآمَ

My code:

public Collection<String> getLongestWords() {

    String longestWord = "";
    String current;
    Scanner scan = new Scanner(new File("file.txt"));


    while (scan.hasNext()) {
        current = scan.next();
        if (current.length() > longestWord.length()) {
            longestWord = current;

        }
        return longestWord;

    }

}

Note: I have never worked with Unicode before :/

1 Answer:

Answer 0: (score: 1)

I believe you already have it working (this finds and returns the longest word in the text file):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class hello {

    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    // Returns the longest whitespace-delimited token in file.txt.
    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;

        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner scan = new Scanner(new File("file.txt"))) {
            while (scan.hasNext()) {
                current = scan.next();
                // strictly longer, so the first of several equally long words wins
                if (current.length() > longestWord.length()) {
                    longestWord = current;
                }
            }
        }

        System.out.println(longestWord);
        return longestWord;
    }
}

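To strip punctuation:

    longestWord = longestWord.replaceAll("[^\\p{L}\\p{M}]", "");

right before returning! The result has to be assigned back because Java strings are immutable. `\p{L}` matches letters of any script and `\p{M}` keeps combining marks such as Arabic diacritics; a pattern like `[^a-zA-Z ]` would delete every non-Latin letter.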
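A quick way to check a punctuation-stripping pattern against non-Latin text is sketched below. It is only an illustration: the class name `PunctuationStripDemo` and the method `stripPunctuation` are hypothetical, and the Cyrillic sample is written with Unicode escapes so the source compiles regardless of file encoding.

```java
class PunctuationStripDemo {

    // Removes every character that is neither a letter (\p{L})
    // nor a combining mark (\p{M}); works for any script.
    static String stripPunctuation(String word) {
        return word.replaceAll("[^\\p{L}\\p{M}]", "");
    }

    public static void main(String[] args) {
        // Cyrillic word followed by '?': the letters survive, the '?' is dropped
        String russian = "\u041B\u0435\u043D\u0438\u043D?";
        System.out.println(stripPunctuation(russian));

        // Latin word with apostrophe and exclamation mark
        System.out.println(stripPunctuation("don't!")); // prints "dont"
    }
}
```

Note that this also removes apostrophes and hyphens inside words; whether that is wanted depends on how "word" is defined for the task.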

If you don't want to consider words that contain digits:

if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {
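The digit check can be tried in isolation, as in the following sketch. In Java, `\d` matches only the ASCII digits 0-9 by default, which fits the requirement in the question; if digits from other scripts should also disqualify a word, the Unicode category `\p{Nd}` is one option. The class and method names here are hypothetical.

```java
class DigitFilterDemo {

    // True if the word contains an ASCII digit 0-9 (Java's default meaning of \d).
    static boolean hasAsciiDigit(String word) {
        return word.matches(".*\\d.*");
    }

    // True if the word contains a decimal digit from any script (Unicode category Nd).
    static boolean hasAnyDigit(String word) {
        return word.matches(".*\\p{Nd}.*");
    }

    public static void main(String[] args) {
        // "53-" followed by a Cyrillic word
        String mixed = "53-\u043B\u0435\u0442\u043D\u0438\u0439";
        System.out.println(hasAsciiDigit(mixed));       // true

        // U+0663 is ARABIC-INDIC DIGIT THREE
        System.out.println(hasAsciiDigit("\u0663"));    // false
        System.out.println(hasAnyDigit("\u0663"));      // true
    }
}
```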

All together:

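import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class hello {

    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    // Returns the longest digit-free word in file.txt with punctuation removed.
    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;

        try (Scanner scan = new Scanner(new File("file.txt"))) {
            while (scan.hasNext()) {
                current = scan.next();
                // skip any token that contains a digit 0-9
                if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {
                    longestWord = current;
                }
            }
        }

        // Strings are immutable, so the stripped result must be assigned back.
        // \p{L} = letters of any script, \p{M} = combining marks (e.g. Arabic diacritics).
        longestWord = longestWord.replaceAll("[^\\p{L}\\p{M}]", "");
        System.out.println(longestWord);
        return longestWord;
    }
}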
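For non-Latin text, two further details may matter: the charset used to read the file, and how length is measured. The sketch below is one possible variant, not the answer's own method. It assumes the file is UTF-8, strips punctuation from each word before comparing lengths, and counts code points rather than `char`s so that characters outside the Basic Multilingual Plane count once. The names `UnicodeLongestWord`, `longestWord` and `length` are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class UnicodeLongestWord {

    // Scans whitespace-delimited tokens and returns the longest cleaned word.
    static String longestWord(Scanner scan) {
        String longest = "";
        while (scan.hasNext()) {
            String token = scan.next();
            // ignore tokens containing an ASCII digit 0-9
            if (token.matches(".*\\d.*")) {
                continue;
            }
            // keep only letters and combining marks of any script
            String word = token.replaceAll("[^\\p{L}\\p{M}]", "");
            // strictly longer, so the first of several equally long words wins
            if (length(word) > length(longest)) {
                longest = word;
            }
        }
        return longest;
    }

    // Length in Unicode code points, not UTF-16 chars.
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumption: the input file is encoded as UTF-8.
        String path = args.length > 0 ? args[0] : "file.txt";
        try (Scanner scan = new Scanner(new File(path), "UTF-8")) {
            System.out.println(longestWord(scan));
        }
    }
}
```

Combining marks such as Arabic diacritics still count as separate code points here; measuring user-perceived characters instead would need grapheme segmentation, for example with `java.text.BreakIterator`.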