Comparing sentences read in from a file - Java

Asked: 2012-04-27 14:12:43

Tags: java file-io string-comparison

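I need to compare 2 sentences contained in a file and return a number between 0 and 1. If the sentences are exactly the same, it should return 1 for true, and if they are exact opposites it should return 0 for false. If the sentences are similar but words have been changed to synonyms or something close, it should return .25, .5 or .75. The text file is formatted like this: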

______________________________________
Text: Sample 

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
// Should score high point but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
// Should score lower than text20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
// Should score lower than text21 but NOT 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats.
// Should score a 0!
________________________________________________

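I have a file reader, but I am not sure of the best way to store each line so that I can compare them. Right now I am just reading the file and printing it to the screen. What is the best way to store these lines and then compare them to get the number I want?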

import java.io.*;

public class implement 
{


    public static void main(String[] args)
    {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("textfile.txt")))
        {
            String strLine;

            // Read the file line by line and echo each line
            while ((strLine = br.readLine()) != null)
            {
                System.out.println(strLine);
            }
        }
        catch (IOException e)
        {
            System.err.println("Error: " + e.getMessage());
        }

    }

}

1 Answer:

Answer 0: (score: 1)

Save them in an ArrayList.

// requires: import java.util.ArrayList; import java.util.List;
List<String> list = new ArrayList<>();
// Read the file
// Inside the while loop:
list.add(strLine);
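
Spelled out in full, that could look something like the sketch below. The class name `LineStore` and the helper `readLines` are made up for illustration, and skipping blank lines is an assumption about the file format shown in the question. A `StringReader` stands in for the real file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineStore {
    // Hypothetical helper: collects every non-blank line from the reader.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                // Assumption: blank separator lines carry no sentence
                if (!strLine.trim().isEmpty()) {
                    lines.add(strLine);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // For a real file, pass new java.io.FileReader("textfile.txt") instead
        String sample = "Text 1: It was a dark and stormy night.\n\n"
                      + "Text 20: It was a murky and stormy night.\n";
        for (String line : readLines(new StringReader(sample))) {
            System.out.println(line);
        }
    }
}
```

Once the lines are in the list, `list.get(0)` can serve as the reference text and the remaining entries can be compared against it.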

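To check each word in the sentences, simply remove the punctuation, split on spaces, and search for each word in the sentence you are comparing against. I would suggest ignoring words of 2 or 3 characters. That is at your discretion.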
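
A rough sketch of that word-by-word check follows. The names `WordOverlap`, `keyWords` and `score` are invented for this example, the length cutoff is passed in as a parameter because the right value is a judgment call, and the punctuation filter only handles ASCII text. The score is simply the fraction of key words from the first sentence that also appear in the second, which is one possible choice and not the only one.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class WordOverlap {
    // Lower-cases the sentence, drops everything except ASCII letters,
    // digits and whitespace, splits on whitespace, and keeps only words
    // longer than maxIgnoredLength.
    static Set<String> keyWords(String sentence, int maxIgnoredLength) {
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                                 .replaceAll("[^a-z0-9\\s]", "");
        Set<String> words = new LinkedHashSet<>();
        for (String word : cleaned.trim().split("\\s+")) {
            if (word.length() > maxIgnoredLength) {
                words.add(word);
            }
        }
        return words;
    }

    // Fraction of the key words of a that also occur in b (0.0 to 1.0).
    static double score(String a, String b, int maxIgnoredLength) {
        Set<String> wordsA = keyWords(a, maxIgnoredLength);
        Set<String> wordsB = keyWords(b, maxIgnoredLength);
        if (wordsA.isEmpty()) {
            return 0.0;
        }
        int found = 0;
        for (String word : wordsA) {
            if (wordsB.contains(word)) {
                found++;
            }
        }
        return (double) found / wordsA.size();
    }

    public static void main(String[] args) {
        String first = "It was a dark and stormy night.";
        String second = "It was a murky and stormy night.";
        System.out.println(score(first, second, 3));
    }
}
```

Note that plain word overlap cannot see negation: Text 24 in the question shares almost all of its longer words with Text 1, so it would still score high with this approach even though it is supposed to score 0.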

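Then save the lines into an array and compare them however you want. To compare similar words, you will need a database that lets you check words efficiently, in other words a hash table. Once you have that, you can look words up in the database fairly quickly. Next, this hash table of words needs a thesaurus that links each word to its similar words. Then take the similar words for the key words in each sentence and search for them in the sentence you are comparing against. Obviously, you should compare the two actual sentences directly before you search for similar words. In the end, to go beyond direct comparison you will need a more advanced data structure that you will have to build yourself.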
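
To make the hash table plus thesaurus idea concrete, here is a small sketch. Everything in it is an assumption for illustration: the class `SynonymMatcher`, the three hand-written thesaurus entries, and the choice to count a synonym match as 0.5 and an exact match as 1.0. A real solution would load a proper thesaurus and would take word lists produced by whatever tokenizing step you settle on.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynonymMatcher {
    // Hypothetical thesaurus: each word maps to the set of its synonyms.
    static final Map<String, Set<String>> THESAURUS = new HashMap<>();

    static {
        link("dark", "murky");
        link("red", "crimson");
        link("cats", "felines");
    }

    // Registers a and b as synonyms of each other.
    static void link(String a, String b) {
        THESAURUS.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        THESAURUS.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // True if the words are equal or listed as synonyms.
    static boolean similar(String a, String b) {
        return a.equals(b)
            || THESAURUS.getOrDefault(a, Collections.emptySet()).contains(b);
    }

    // Assumed weighting: exact match = 1.0, synonym match = 0.5, no match = 0.
    // The result is the average over the words of the first sentence.
    static double score(List<String> wordsA, List<String> wordsB) {
        if (wordsA.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (String word : wordsA) {
            if (wordsB.contains(word)) {
                total += 1.0;
            } else {
                for (String other : wordsB) {
                    if (similar(word, other)) {
                        total += 0.5;
                        break;
                    }
                }
            }
        }
        return total / wordsA.size();
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("dark", "stormy", "night");
        List<String> second = Arrays.asList("murky", "stormy", "night");
        System.out.println(score(first, second));
    }
}
```

With this weighting, a sentence whose words were swapped for synonyms scores below 1 but well above 0, which matches the ordering the question asks for in Text 20 and Text 21. It does not handle reordered sentences or negation; those need the extra structure mentioned above.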