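I am currently working on an independent project, but I am having trouble converting my text file to the correct format. At the moment my program reads one line at a time and assumes one line = one sentence, which is a problem because someone can insert a whole paragraph with punctuation all over the place. What I want to do is put each sentence on its own line and then read from that file. I did not want to come empty-handed, so I tried the only approach I could get working. It works with short strings, but once I moved on to longer text files and had to use streams, I ran into a problem: (File name too long)
Example:
Input: This is a dummy sentence. Hello, this is also one. And this one too.
Output:
This is a dummy sentence.
Hello, this is also one.
And this one too.
This works: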
public static void main(String args[])
{
String text = "Joanne had one requirement: Her child must be" +
" adopted by college graduates. So the doctor arranged" +
" for the baby to be placed with a lawyer and his wife." +
" Paul and Clara named their new baby Steven Paul Jobs.";
Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
Matcher matcher = pattern.matcher(text);
StringBuilder text_fixed = new StringBuilder();
String withline = "";
int starter = 0;
String overall = "";
String blankspace = " ";
while (matcher.find())
{
int holder = matcher.start();
System.out.println("=========> " + holder);
/***/
withline = text.substring(starter, holder + 1);
withline = withline + "\r\n";
overall = overall + withline;
System.out.println(withline);
starter = holder + 2;
}
System.out.println(overall);
//return overall;
}
This causes the problem:
public static void main(String[] args) throws IOException
{
final String INPUT_FILE = "practice.txt";
InputStream in = new FileInputStream(INPUT_FILE);
String fixread = getStringFromInputStream(in);
String fixedspace = fixme(fixread);
File ins = new File(fixedspace);
BufferedReader reader = new BufferedReader(new FileReader(ins));
Pattern p = Pattern.compile("\n");
String line, sentence;
String[] t;
while ((line = reader.readLine()) != null )
{
t = p.split(line); /**hold curr sentence and remove it from OG txt file since you will reread.*/
sentence = t[0];
indiv_sentences.add(sentence);
}
//putSentencestoTrie(indiv_sentences);
//runAutocompletealt();
}
private static String fixme(String fixread)
{
Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
String actString = fixread.toString();
Matcher matcher = pattern.matcher(actString);
String withline = "";
int starter = 0;
String overall = "";
while (matcher.find())
{
int holder = matcher.start();
withline = actString.substring(starter, holder + 1);
withline = withline + "\r\n";
overall = overall + withline;
starter = holder + 2;
}
return overall;
}
/**this is not my code, this was provided by an outside source, I do not take credit*/
/**http://www.mkyong.com/java/how-to-convert-inputstream-to-string-in-java/*/
private static String getStringFromInputStream(InputStream is) {
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
sb.append(line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return sb.toString();
}
https://github.com/ChristianCSE/Phrase-Finder
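Answer 0 (score: 1)
The problem is that you are creating a File whose name is the text content itself, and that content is far too long to be a file name.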
String fixedspace = fixme(fixread);
File ins = new File(fixedspace);//this is the issue, you gave the content as its name
Try giving the file a simple name and writing the output into it. An example is below.
String fixedspace = fixme(fixread);
File out = new File("output.txt");
FileWriter fr = new FileWriter(out);
fr.write(fixedspace);
fr.close(); // flush the buffered text to disk before the file is read back
Then read that file back and continue as before.
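For reference, here is a minimal sketch of the whole write-then-read round trip. The class name SentenceFileDemo and the helpers splitIntoLines and writeAndReadBack are made up for illustration and are not from the question's code. It also assumes only '.', '?' and '!' end a sentence, treats a run such as "..." as one terminator, and keeps any trailing text that has no terminator, so it is a simplification of fixme() rather than a drop-in replacement.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceFileDemo {

    // Hypothetical helper: puts each sentence on its own line.
    // Assumption: a sentence ends at a run of '.', '?' or '!'.
    static String splitIntoLines(String text) {
        Matcher matcher = Pattern.compile("[.?!]+").matcher(text);
        StringBuilder overall = new StringBuilder();
        int starter = 0;
        while (matcher.find()) {
            String sentence = text.substring(starter, matcher.end()).trim();
            if (!sentence.isEmpty()) {
                overall.append(sentence).append(System.lineSeparator());
            }
            starter = matcher.end();
        }
        // Keep any trailing text that has no closing punctuation.
        String rest = text.substring(starter).trim();
        if (!rest.isEmpty()) {
            overall.append(rest).append(System.lineSeparator());
        }
        return overall.toString();
    }

    // Hypothetical helper: writes the content INTO the file (the file name
    // stays short), then reads it back one line, i.e. one sentence, at a time.
    static List<String> writeAndReadBack(String content, File out) throws IOException {
        // try-with-resources closes the writer, which flushes the text to disk.
        try (FileWriter writer = new FileWriter(out)) {
            writer.write(content);
        }
        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(out))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sentences.add(line);
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        String text = "This is a dummy sentence. Hello, this is also one. And this one too.";
        File out = new File("output.txt"); // a short, fixed file name
        for (String sentence : writeAndReadBack(splitIntoLines(text), out)) {
            System.out.println(sentence);
        }
    }
}
```

If the sentences are only needed in memory, the intermediate file could probably be skipped by splitting the fixed string on line breaks directly. The file version is shown because it matches the approach described in the question.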