How do I fix a text file so it is split according to punctuation?

Time: 2015-12-28 03:16:11

Tags: java text stream

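I'm currently working on an independent project, but I can't get a text file into the right format. At the moment my program reads one line at a time and assumes one line = one sentence. That is a problem, because someone could insert a paragraph with punctuation all over the place. What I want to do is put each sentence on its own line and then read from that file. I didn't want to come empty-handed, so I tried the only approach I could think of. It works with short strings, but once I moved on to longer text files I had to use streams, and I ran into a problem: (File name too long)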

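Example:

Input: This is a dummy sentence. Hello, this is also one. And so is this one.

Output:

This is a dummy sentence.

Hello, this is also one.

And so is this one.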

This works:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static void main(String[] args)
    {
        String text = "Joanne had one requirement: Her child must be" +
                      " adopted by college graduates. So the doctor arranged" +
                      " for the baby to be placed with a lawyer and his wife." +
                      " Paul and Clara named their new baby Steven Paul Jobs.";
        // Sentence terminators (the inverted marks are treated as terminators too).
        Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
        Matcher matcher = pattern.matcher(text);
        String withline = "";
        int starter = 0;
        String overall = "";

        while (matcher.find())
        {
            int holder = matcher.start();
            System.out.println("=========> " + holder);

            // Take the sentence up to and including the punctuation mark.
            withline = text.substring(starter, holder + 1);
            withline = withline + "\r\n";
            overall = overall + withline;
            System.out.println(withline);
            // Skip the punctuation mark and the single space assumed to follow it.
            starter = holder + 2;
        }
        System.out.println(overall);
        //return overall;
    }
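
The loop above assumes exactly one space after each terminator (`starter = holder + 2`) and drops any trailing text that has no terminator. A minimal sketch of a more tolerant variant follows. It assumes that only `.`, `?` and `!` end a sentence, and the class and method names (`SentenceSplitter`, `split`) are made up for illustration. It will still split on abbreviations such as "Dr.", so treat it as a starting point rather than a finished solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // One sentence: a run of non-terminator characters followed by one or more terminators.
    // Assumption: only '.', '?' and '!' terminate a sentence.
    private static final Pattern SENTENCE = Pattern.compile("[^.?!]+[.?!]+");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        int end = 0;
        while (m.find()) {
            // trim() removes whatever whitespace separated this sentence from the previous one
            sentences.add(m.group().trim());
            end = m.end();
        }
        // Keep trailing text that has no terminator instead of dropping it
        String rest = text.substring(end).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is a dummy sentence.  Hello, this is also one. And so is this one";
        for (String s : split(text)) {
            System.out.println(s);
        }
    }
}
```

Joining the returned list with a line separator gives the same one-sentence-per-line text that the loop builds by hand.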

This causes the problem:

    public static void main(String[] args) throws IOException
    {
        final String INPUT_FILE = "practice.txt";
        InputStream in = new FileInputStream(INPUT_FILE);
        String fixread = getStringFromInputStream(in);
        String fixedspace = fixme(fixread);
        File ins = new File(fixedspace);
        BufferedReader reader = new BufferedReader(new FileReader(ins));
        Pattern p = Pattern.compile("\n");
        String line, sentence;
        String[] t;
        while ((line = reader.readLine()) != null)
        {
            // Hold the current sentence and remove it from the original text file, since it will be reread.
            t = p.split(line);
            sentence = t[0];
            indiv_sentences.add(sentence);
        }
        //putSentencestoTrie(indiv_sentences);
        //runAutocompletealt();
    }

    private static String fixme(String fixread)
    {
        Pattern pattern = Pattern.compile("\\?|\\.|\\!|\\¡|\\¿");
        String actString = fixread.toString();
        Matcher matcher = pattern.matcher(actString);
        String withline = "";
        int starter = 0;
        String overall = "";
        while (matcher.find())
        {
            int holder = matcher.start();
            withline = actString.substring(starter, holder + 1);
            withline = withline + "\r\n";
            overall = overall + withline;
            starter = holder + 2;
        }
        return overall;
    }

    /** This is not my code; it was provided by an outside source and I do not take credit. */
    /** http://www.mkyong.com/java/how-to-convert-inputstream-to-string-in-java/ */
    private static String getStringFromInputStream(InputStream is) {
        BufferedReader br = null;
        StringBuilder sb = new StringBuilder();
        String line;
        try {
            br = new BufferedReader(new InputStreamReader(is));
            // readLine() strips line terminators, so lines are joined with no separator.
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return sb.toString();
    }


https://github.com/ChristianCSE/Phrase-Finder

I'm fairly sure this is all the code I use in this section, but in case you need to see the rest of my code, I have provided a link to my repository. Thank you!

1 answer:

Answer 0 (score: 1)

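The problem is that you are creating a File whose name is the text that should be its content, and that text is far too long to be a file name.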

    String fixedspace = fixme(fixread);
    File ins = new File(fixedspace); // this is the issue: you gave the content as its name
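
To see the failure in isolation, here is a small sketch. The class and method names (`FileNameDemo`, `openFails`) are made up for illustration. `new File(String)` only records a path and never reads or creates anything, so the error surfaces when `FileReader` tries to open it. The exact message depends on the platform; with long text it can be the "File name too long" from the question.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class FileNameDemo {
    /** Returns true if opening a File whose *name* is the given text fails. */
    static boolean openFails(String content) throws IOException {
        // The string is interpreted as a path; nothing is read or created here.
        File f = new File(content);
        try (FileReader reader = new FileReader(f)) {
            return false; // only reached if a file with that exact name exists
        } catch (FileNotFoundException e) {
            System.out.println("Could not open: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        String content = "This is a dummy sentence.\r\nHello, this is also one.\r\n";
        System.out.println(openFails(content));
    }
}
```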

Try giving the file a simple name and writing the output to it. Here is an example.

    String fixedspace = fixme(fixread);
    File out = new File("output.txt");
    // try-with-resources closes the writer, which flushes the text to disk
    try (FileWriter fr = new FileWriter(out)) {
        fr.write(fixedspace);
    }

Then read the file back and continue.
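
Put together, the write-then-read round trip might look like the sketch below. It uses `java.nio.file.Files` instead of `FileWriter`/`FileReader`, which is a substitution on my part, and it assumes UTF-8 text. The names `WriteThenRead` and `writeThenRead` are hypothetical, and `fixedText` stands for whatever `fixme` returned.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WriteThenRead {
    /** Writes the one-sentence-per-line text to 'out', then reads it back line by line. */
    static List<String> writeThenRead(String fixedText, Path out) throws IOException {
        // The path names the file; the text is its content.
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write(fixedText);
        }
        List<String> sentences = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String sentence = line.trim();
                if (!sentence.isEmpty()) { // skip blank lines
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        String fixed = "This is a dummy sentence.\r\nHello, this is also one.\r\n";
        Path out = Paths.get("output.txt");
        for (String sentence : writeThenRead(fixed, out)) {
            System.out.println(sentence);
        }
    }
}
```

Because `readLine()` already returns one line at a time without its terminator, the extra `Pattern.compile("\n")` split from the question is not needed here.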