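I am writing a Java program that searches for a word in a text file containing a list of dictionary words. As you may know, this file contains about 300,000 words. I came up with a program that iterates over the words, comparing each one with the input word (the word I am searching for). The problem is that this takes a long time to find a word, especially when the word starts with one of the last letters of the alphabet, such as x, y or z. I want something more efficient that can find a word almost instantly. Here is my code: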
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadFile
{
    public static void main(String[] args) throws IOException
    {
        ReadFile rf = new ReadFile();
        rf.searchWord(args[0]);
    }

    private void searchWord(String token) throws IOException
    {
        InputStream in = getClass().getResourceAsStream("sowpods.txt");
        if (in == null)
        {
            // getResourceAsStream returns null instead of throwing when the resource is missing.
            throw new IOException("resource sowpods.txt not found");
        }
        InputStreamReader reader = new InputStreamReader(in);
        String line = null;
        // Read one line at a time; null signals end of file. This is a linear scan,
        // so words near the end of the file (x, y, z) take the longest to reach.
        while ((line = readLine(reader)) != null && !line.equals(token))
        {
            // Prints every word scanned before the match; console output for
            // hundreds of thousands of lines adds considerable time on its own.
            System.out.println(line);
        }
        // The loop stops either at a match or at end of file (line == null),
        // so a non-null line here always equals token.
        if (line != null)
        {
            System.out.println(token + " WAS FOUND.");
        }
        else
        {
            System.out.println(token + " WAS NOT FOUND.");
        }
        reader.close();
    }

    private String readLine(InputStreamReader reader) throws IOException
    {
        // Test whether the end of file has been reached. If so, return null.
        int readChar = reader.read();
        if (readChar == -1)
        {
            return null;
        }
        StringBuffer string = new StringBuffer();
        // Read until end of file or newline.
        while (readChar != -1 && readChar != '\n')
        {
            // Windows line endings are "\r\n": the carriage return ('\r') is part
            // of the line terminator, so it is not appended to the string.
            if (readChar != '\r')
            {
                string.append((char) readChar);
            }
            // Read the next character.
            readChar = reader.read();
        }
        return string.toString();
    }
}
Note that I want to use this program in a Java ME environment. Any help would be greatly appreciated. Thanks - Jevison7x.
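Since the slow case described above comes from scanning the file linearly, one common alternative is to read the list into memory once and then use binary search for each lookup. This is a minimal sketch under stated assumptions, not the asker's code: it assumes sowpods.txt is sorted in String.compareTo order (true for a purely alphabetical list written in a single letter case), and the class SortedWordList with its load and contains methods are hypothetical names. It sticks to classes found in CLDC (Vector, StringBuffer, InputStreamReader) and avoids generics, but it has not been checked on a Java ME device, and holding about 300,000 strings in memory may exceed what some devices allow.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Vector;

// Hypothetical helper: an in-memory, sorted word list searched with binary search.
// Assumes the input file is already sorted in String.compareTo order.
public class SortedWordList {
    private final String[] words;

    public SortedWordList(String[] sortedWords) {
        this.words = sortedWords;
    }

    // Reads one word per line; '\r' is skipped so Windows line endings work.
    // Uses a raw Vector (no generics) to stay close to what CLDC supports.
    public static SortedWordList load(InputStream in) throws IOException {
        InputStreamReader reader = new InputStreamReader(in);
        Vector lines = new Vector();
        StringBuffer current = new StringBuffer();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n') {
                lines.addElement(current.toString());
                current = new StringBuffer();
            } else if (c != '\r') {
                current.append((char) c);
            }
        }
        // Keep the last word even if the file does not end with a newline.
        if (current.length() > 0) {
            lines.addElement(current.toString());
        }
        reader.close();
        String[] words = new String[lines.size()];
        lines.copyInto(words);
        return new SortedWordList(words);
    }

    // Binary search: halves the candidate range on every comparison.
    public boolean contains(String token) {
        int low = 0;
        int high = words.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = words[mid].compareTo(token);
            if (cmp == 0) {
                return true;
            } else if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return false;
    }
}
```

Typical use would be calling SortedWordList.load(getClass().getResourceAsStream("sowpods.txt")) once at startup and contains(token) for each query. With about 300,000 words a lookup needs at most 19 comparisons (2^18 ≤ 300,000 < 2^19), whichever letter the word starts with; the one-time cost moves to reading the file.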
Answer 0 (score: 1)
You can use fgrep (fgrep is the same as grep invoked with -F) (Linux man page of fgrep):
grep -F -f dictionary.txt inputfile.txt
The dictionary file should contain one word per line.
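For reference, grep -F -f prints every line of inputfile.txt that contains at least one dictionary word as a plain substring. A naive Java equivalent makes the cost of doing this directly visible: it compares every dictionary word against every line, so the work grows with the product of the two sizes. This is an illustrative sketch only; NaiveFixedStringGrep and matchingLines are hypothetical names, and it uses Java SE collections rather than Java ME ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of what `grep -F -f dictionary.txt inputfile.txt` computes,
// done the slow way: every line is checked against every dictionary word.
public class NaiveFixedStringGrep {
    public static List<String> matchingLines(List<String> dictionary, List<String> inputLines) {
        List<String> result = new ArrayList<>();
        for (String line : inputLines) {
            for (String word : dictionary) {
                // Plain substring match, as with -F (no regular expressions, not whole-word).
                if (line.contains(word)) {
                    result.add(line);
                    break; // one matching word is enough to print the line
                }
            }
        }
        return result;
    }
}
```

An automaton-based approach such as Aho-Corasick removes the inner loop over dictionary words by scanning each line once.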
I'm not sure whether it is still accurate, but the Wikipedia article on grep mentions that fgrep uses the Aho-Corasick algorithm, which builds an automaton from a fixed dictionary for fast string matching.
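The idea behind Aho-Corasick can be sketched in Java: build a trie of all dictionary words, add failure links in a breadth-first pass, then scan the text once, following a failure link whenever the next character has no edge. This is a minimal illustration, not fgrep's actual implementation; the class AhoCorasick, its search method and the "position:pattern" output format are hypothetical choices. It relies on Java SE 8 collections and lambdas, so it would need reworking (for example with Hashtable and no generics) to run on Java ME.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical minimal Aho-Corasick automaton over a fixed set of patterns.
public class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();   // trie (goto) edges
        Node fail;                                            // failure link
        final List<String> outputs = new ArrayList<>();      // patterns ending here, incl. via failure links
    }

    private final Node root = new Node();

    public AhoCorasick(List<String> patterns) {
        // 1. Build a trie of all non-empty patterns.
        for (String p : patterns) {
            if (p.isEmpty()) {
                continue; // empty patterns are skipped in this sketch
            }
            Node node = root;
            for (char c : p.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(p);
        }
        // 2. Breadth-first pass: a node's failure link points to the longest proper
        //    suffix of its path that is also a path in the trie.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // The failure target is shallower, so its outputs are already complete.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scans the text once and returns "start:pattern" for every occurrence.
    public List<String> search(String text) {
        List<String> matches = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String p : node.outputs) {
                matches.add((i - p.length() + 1) + ":" + p);
            }
        }
        return matches;
    }
}
```

For example, with the patterns he, she, his and hers, searching "ushers" reports she at 1, he at 2 and hers at 2 in a single pass over the text.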
Either way, you can look at Wikipedia's list of string searching algorithms on a finite set of patterns. These are more efficient approaches for searching text for dictionary words.