How is it possible that the same file path is found in only one test function and not in the other?

Asked: 2018-04-04 12:17:31

Tags: java junit

I am running these JUnit tests in Eclipse:

class KGramIndexTestTest {

    @Test
    void testGetTerms() {
        KGramIndex kgIndex = new KGramIndex(2);

        File f = new File("../test/text.txt");
        try {
            Reader reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );
            Tokenizer tok = new Tokenizer( reader, true, false, true, "../patterns.txt" );
            while ( tok.hasMoreTokens() ) {
                String token = tok.nextToken();
                kgIndex.insert(token);
            }
        }catch (IOException e) {
            e.printStackTrace();
        }
        String wildcard = "hel*o";
        List<String> terms = kgIndex.getTerms(wildcard);
        assertEquals(1, terms.size());
        assertEquals("hello", terms.get(0));
    }
    @Test
    void testGetKGrams() {
        KGramIndex kgIndex = new KGramIndex(2);

        File f = new File("../test/text.txt");
        try {
            Reader reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );
            Tokenizer tok = new Tokenizer( reader, true, false, true, "../patterns.txt" );
            while ( tok.hasMoreTokens() ) {
                String token = tok.nextToken();
                kgIndex.insert(token);
            }
        }catch (IOException e) {
            e.printStackTrace();
        }
        String wildcard = "hel*o";
        List<String> kGrams = kgIndex.getKGrams(wildcard);
        System.out.println("The k-grams");
        System.out.println(kGrams);
        assertEquals(4, kGrams.size());

        List<String> bigrams = new ArrayList<String>(); 
        bigrams.add("^h");
        bigrams.add("he");
        bigrams.add("el");
        bigrams.add("o$");
        assertTrue(kGrams.containsAll(bigrams));
    }


}

The second test function succeeds, but the first one fails and produces a FileNotFoundException.


Here is the stack trace:

java.io.FileNotFoundException: ../test/text.txt (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:196)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:139)
    at ir.KGramIndexTestTest.testGetTerms(KGramIndexTestTest.java:25)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:389)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:167)
    at org.junit.jupiter.engine.execution.ThrowableCollector.execute(ThrowableCollector.java:40)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:163)

This confuses me, because both functions use the same file path. How can the file be missing in one function while the other works fine?
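A relative path such as "../test/text.txt" is resolved against the JVM's working directory (the user.dir system property), not against the location of the test source file. In Eclipse the working directory is set in the run configuration and, as far as I know, normally defaults to the project folder. A minimal sketch for checking where the path actually points; the class name PathCheck and the helper describe are made up for illustration:

```java
import java.io.File;
import java.io.IOException;

class PathCheck {

    /** Reports where a relative path resolves to and whether a regular file exists there. */
    static String describe(String relativePath) throws IOException {
        File f = new File(relativePath);
        return "user.dir=" + System.getProperty("user.dir")
                + " resolved=" + f.getCanonicalPath()
                + " exists=" + f.exists()
                + " isFile=" + f.isFile();
    }

    public static void main(String[] args) throws IOException {
        // The path from the question; the output depends on the run configuration.
        System.out.println(describe("../test/text.txt"));
    }
}
```

Calling describe at the top of each test method would show whether the two tests really see different results for the same path.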

Edit: I tried closing the reader, but it did not help.

Edit 2: I tried a third variant, but I still get the FileNotFoundException. The offending line is this one in testGetTerms:

reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );

Third variant:

class KGramIndexTestTest {

    @Test
    void testGetTerms() {
        KGramIndex kgIndex = new KGramIndex(2);

        File f = new File("../test/text.txt");
        Reader reader = null;
        try {
            reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );
            Tokenizer tok = new Tokenizer( reader, true, false, true, "../patterns.txt" );
            while ( tok.hasMoreTokens() ) {
                String token = tok.nextToken();
                kgIndex.insert(token);
            }
        }catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                    if (reader != null) {
                        reader.close();
                    }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        String wildcard = "hel*o";
        List<String> terms = kgIndex.getTerms(wildcard);
        assertEquals(1, terms.size());
        assertEquals("hello", terms.get(0));
    }
    @Test
    void testGetKGrams() {
        KGramIndex kgIndex = new KGramIndex(2);

        File f = new File("../test/text.txt");
        Reader reader = null;
        try {
            reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );
            Tokenizer tok = new Tokenizer( reader, true, false, true, "../patterns.txt" );
            while ( tok.hasMoreTokens() ) {
                String token = tok.nextToken();
                kgIndex.insert(token);
            }
        }catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                if(reader != null) {
                    reader.close();
                }
            }catch(IOException e) {
                e.printStackTrace();
            }
        }
        String wildcard = "hel*o";
        List<String> kGrams = kgIndex.getKGrams(wildcard);
        System.out.println("The k-grams");
        System.out.println(kGrams);
        assertEquals(4, kGrams.size());

        List<String> bigrams = new ArrayList<String>(); 
        bigrams.add("^h");
        bigrams.add("he");
        bigrams.add("el");
        bigrams.add("o$");
        assertTrue(kGrams.containsAll(bigrams));
    }    
}

Edit 3: I tried using @BeforeAll and @AfterAll, but I got exactly the same result.

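Edit 4: I also tried putting all the assertions into the working test case, testGetKGrams. What happened is that the InputStreamReader initialization that had previously worked stopped working and threw a FileNotFoundException.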
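One explanation consistent with the behaviour described just above, offered as a sketch rather than a confirmed diagnosis: both tests catch the IOException and only print it, and getKGrams derives its result from the wildcard string alone, so a test that asserts only on getKGrams can pass even when the file was never opened. The class below re-implements just the k-gram expansion from KGramIndex to show the effect; SwallowedExceptionDemo, wildcardKGrams and fileWasRead are hypothetical names:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class SwallowedExceptionDemo {

    /** Same sliding-window logic as getTokenKGrams in the question. */
    static List<String> tokenKGrams(String token, int k) {
        List<String> kGrams = new ArrayList<>();
        for (int i = 0; i + k <= token.length(); i++) {
            kGrams.add(token.substring(i, i + k));
        }
        return kGrams;
    }

    /** Same wildcard handling as getKGrams: split at '*', pad with ^ and $. */
    static List<String> wildcardKGrams(String wildcard, int k) {
        int star = wildcard.indexOf('*');
        if (star == -1) {
            return tokenKGrams(wildcard, k);
        }
        List<String> result = tokenKGrams("^" + wildcard.substring(0, star), k);
        result.addAll(tokenKGrams(wildcard.substring(star + 1) + "$", k));
        return result;
    }

    /** Mimics the test setup: tries to open the file and swallows the failure. */
    static boolean fileWasRead(String path) {
        try (InputStream in = new FileInputStream(path)) {
            return true;
        } catch (IOException e) {
            // Swallowed, as in the tests (they print the stack trace and carry on).
            return false;
        }
    }

    public static void main(String[] args) {
        boolean read = fileWasRead("../test/text.txt");
        // The k-grams come out the same whether or not the file was found.
        System.out.println("file read: " + read + ", k-grams: " + wildcardKGrams("hel*o", 2));
    }
}
```

Declaring the test methods with throws IOException instead of catching the exception would make a missing file fail every test at the same line, which removes the ambiguity.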

Update 2:

KGramIndex.java:

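package ir;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.ListIterator;
import java.util.regex.Pattern;

public class KGramIndex {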

    HashMap<Integer,String> id2term = new HashMap<Integer,String>();

    HashMap<String,Integer> term2id = new HashMap<String,Integer>();

    HashMap<String,List<KGramPostingsEntry>> index = new HashMap<String,List<KGramPostingsEntry>>();

    int lastTermID = -1;

    int K = 3;

    public KGramIndex(int k) {
        K = k;
        if (k <= 0) {
            System.err.println("The K-gram index can't be constructed for a non-positive K value");
            System.exit(1);
        }
    }

    private int generateTermID() {
        return ++lastTermID;
    }

    public int getK() {
        return K;
    }

    public int size() {
            return index.size();
    }

    public ArrayList<String> getTokenKGrams(String token){
            ArrayList<String> kGrams = new ArrayList<String>();
            int start = 0;
        int end = start+K;
        int noOfKGrams = token.length() - end + 1; 
        String kGram;
        int startCurr, endCurr;
        for (int i=0; i<noOfKGrams; i++) {
            startCurr = start + i;
            endCurr = end + i;
            kGram = token.substring(startCurr, endCurr);
            kGrams.add(kGram);  
        }
        return kGrams;
    }

    public List<String> getKGrams(String token) {

            int index = token.indexOf("*");
            if(index != -1) {
                String left = "^"+token.substring(0, index);
                String right = token.substring(index+1, token.length())+"$";
                List<String> leftKGrams = getTokenKGrams(left);
                List<String> rightKGrams = getTokenKGrams(right);
                leftKGrams.addAll(rightKGrams);
                return leftKGrams;
            }
            return getTokenKGrams(token);
    }

    public List<String> getTerms(String wildcard){
            int index = wildcard.indexOf("*");
            if(index == -1) {
                throw new IllegalArgumentException("wildcard must contain a '*'.");
            }

            List<String> kGrams = getKGrams(wildcard);
            List<String> kGramContainers = new ArrayList<String>();

            List<KGramPostingsEntry> intersection = getPostings(kGrams.get(0));
            List<KGramPostingsEntry> newPostings;
            for(int i = 1; i<kGrams.size(); i++) {
                newPostings = getPostings(kGrams.get(i));
                intersection = intersect(intersection, newPostings);
            }
            String term;
            for(KGramPostingsEntry entry : intersection) {
                term = id2term.get(entry.tokenID);
                kGramContainers.add(term);
            }

        StringBuilder regexBuilder = new StringBuilder(wildcard);
        regexBuilder.insert(index, ".");
        String regex = regexBuilder.toString();
        List<String> result = new ArrayList<String>();
            for (int i=0; i<kGramContainers.size(); i++) {
                term = kGramContainers.get(i);
                boolean matches = Pattern.matches(regex, term);
                if(matches) {
                    result.add(term);
                }
            }
            return result;
    }

    private List<KGramPostingsEntry> intersect(List<KGramPostingsEntry> pA, List<KGramPostingsEntry> pB) {
        ListIterator<KGramPostingsEntry> iterA = pA.listIterator();
        ListIterator<KGramPostingsEntry> iterB = pB.listIterator();

        List<KGramPostingsEntry> result = new ArrayList<KGramPostingsEntry>();
        KGramPostingsEntry entryA = iterA.next();
        KGramPostingsEntry entryB = iterB.next();

        while(true) {

                if(entryA.tokenID == entryB.tokenID) {
                    result.add(entryA);
                    if(iterA.hasNext() && iterB.hasNext()) {
                        entryA = iterA.next();
                        entryB = iterB.next();
                    }else {
                        break;
                    }
                }else if(entryA.tokenID > entryB.tokenID) { 
                    if(iterB.hasNext()) 
                        entryB = iterB.next();
                    else 
                        break;
                }
                else {
                    if(iterA.hasNext()) 
                        entryA = iterA.next();
                    else 
                        break;
                }
        }
        return result;
    }


    public void insert( String token ) {

            if (term2id.get(token) != null) {
                return;
            }

        id2term.put(++lastTermID, token);
        term2id.put(token, lastTermID);

            // is word long enough? for example, "a" can be bigrammed and trigrammed but not four-grammed.
            // K must be <= token.length + 2. "ab". K must be <= 4
            List<KGramPostingsEntry> postings = null;
            if(K > token.length() + 2) {
                return;
            }else if(K == token.length() + 2) {
                // insert the one K-gram "^<String token>$" into index
                postings = index.get("^"+token+"$");
                if (postings == null) {
                    KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
                    ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
                    newList.add(newEntry);
                    index.put("^"+token+"$", newList);
                }
                // No need to do anything if the posting already exists, so no else clause. There is only one possible term in this case
                // Return since we are done
                return;
            }else {
                // We get here if there is more than one k-gram in our term
                // insert all k-grams in token into index
                int start = 0;
                int end = start+K;
                //add ^ and $ to token.
                token = "^"+token+"$";
                int noOfKGrams = token.length() - end + 1; 
                // get K-Grams
                String kGram;
                int startCurr, endCurr;
                for (int i=0; i<noOfKGrams; i++) {

                    startCurr = start + i;
                    endCurr = end + i;


                    kGram = token.substring(startCurr, endCurr);

                    postings = index.get(kGram);
                KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
                    // if this k-gram has been seen before
                    if (postings != null) {
                        // Add this token to the existing postingsList.
                        // We can be sure that the list doesn't contain the token
                        // already, else we would previously have terminated the 
                        // execution of this function.
                        int lastTermInPostings = postings.get(postings.size()-1).tokenID;
                        if (lastTermID == lastTermInPostings) {
                            continue;
                        }
                        postings.add(newEntry);
                        index.put(kGram, postings);
                    }
                    // if this k-gram has not been seen before 
                    else {
                        ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
                        newList.add(newEntry);
                        index.put(kGram, newList);
                    }
                }
            }

    }

    /** Get postings for the given k-gram */
    public List<KGramPostingsEntry> getPostings(String kgram) {
        return index.get(kgram);
    }

    /** Get id of a term */
    public Integer getIDByTerm(String term) {
        return term2id.get(term);
    }

    /** Get a term by the given id */
    public String getTermByID(Integer id) {
        return id2term.get(id);
    }

    private static HashMap<String,String> decodeArgs( String[] args ) {
        HashMap<String,String> decodedArgs = new HashMap<String,String>();
        int i=0, j=0;
        while ( i < args.length ) {
            if ( "-p".equals( args[i] )) {
                i++;
                if ( i < args.length ) {
                    decodedArgs.put("patterns_file", args[i++]);
                }
            }
            else if ( "-f".equals( args[i] )) {
                i++;
                if ( i < args.length ) {
                    decodedArgs.put("file", args[i++]);
                }
            }
            else if ( "-k".equals( args[i] )) {
                i++;
                if ( i < args.length ) {
                    decodedArgs.put("k", args[i++]);
                }
            }
            else if ( "-kg".equals( args[i] )) {
                i++;
                if ( i < args.length ) {
                    decodedArgs.put("kgram", args[i++]);
                }
            }
            else {
                System.err.println( "Unknown option: " + args[i] );
                break;
            }
        }
        return decodedArgs;
    }

    public static void main(String[] arguments) throws FileNotFoundException, IOException {
        HashMap<String,String> args = decodeArgs(arguments);

        int k = Integer.parseInt(args.getOrDefault("k", "3"));
        KGramIndex kgIndex = new KGramIndex(k);

        File f = new File(args.get("file"));
        Reader reader = new InputStreamReader( new FileInputStream(f), StandardCharsets.UTF_8 );
        Tokenizer tok = new Tokenizer( reader, true, false, true, args.get("patterns_file") );
        while ( tok.hasMoreTokens() ) {
            String token = tok.nextToken();
            kgIndex.insert(token);
        }
        System.out.printf("Done with indexing. %d k-grams in index\n", kgIndex.size());

        String[] kgrams = args.get("kgram").split(" ");
        List<KGramPostingsEntry> postings = null;
        for (String kgram : kgrams) {
            if (kgram.length() != k) {
                System.err.println("Cannot search k-gram index: " + kgram.length() + "-gram provided instead of " + k + "-gram");
                System.exit(1);
            }

            if (postings == null) {
                    System.out.println("getting postings...");
                postings = kgIndex.getPostings(kgram);
            } else {
                postings = kgIndex.intersect(postings, kgIndex.getPostings(kgram));
            }
        }
        if (postings == null) {
            System.err.println("Found 0 posting(s)");
        } else {
            int resNum = postings.size();
            System.err.println("Found " + resNum + " posting(s)");

            if (resNum > 10) {
                System.err.println("The first 10 of them are:");
                resNum = 10;
            }
            for (int i = 0; i < resNum; i++) {
                System.err.println(kgIndex.getTermByID(postings.get(i).tokenID));
                System.out.println();
            }

        }
    }
}
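The intersect method above calls next() on both iterators before checking that they have any elements, and getPostings returns null for a k-gram that was never indexed, so getTerms on an empty index fails with a NullPointerException before any assertion is reached. A minimal sketch of a merge-style intersection that tolerates null and empty lists; it works on plain Integer IDs rather than KGramPostingsEntry (whose source is not shown) and assumes both lists are sorted in ascending order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PostingsIntersection {

    /** Intersects two ascending ID lists; a null or empty input yields an empty result. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        if (a == null || b == null) {
            return result;
        }
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                // Present in both lists: keep it and advance both cursors.
                result.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList(1, 3, 5), Arrays.asList(3, 5, 7)));
        System.out.println(intersect(null, Arrays.asList(1, 2)));
    }
}
```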

Tokenizer.java:

package ir;

import java.io.Reader;
import java.io.IOException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.regex.PatternSyntaxException;
import java.lang.System;



/** 
 *  This class performs tokenization of UTF-8 encoded text files. 
 */
public class Tokenizer {

    /**
     *  This flag should be set to 'true' if all letters should be
     *  turned into lowercase.
     */
    public boolean case_folding = true; 

    /**
     *  This flag should be set to 'true' if all diacritics (accents etc.)
     *  should be removed.
     */
    public boolean remove_diacritics = true; 

    /**
     *  This flag should be set to 'true' if all punctuation (full stops etc.)
     *  should be removed.
     */
    public boolean remove_punctuation = true; 

    /** 
     *  The size of the buffer should be considerably larger than
     *  the anticipated length of the longest token.
     */
    public static final int BUFFER_LENGTH = 100001;

    /** The reader from where tokens are read. */
    Reader reader;

    /** 
     *  Characters are read @code{BUFFER_LENGTH} characters at a
     *  time into @code{buf}.
     */
    char[] buf = new char[BUFFER_LENGTH];

    /** The current position in the buffer. */
    int ptr = 0;

    /** Starting position of current token, or -1 if we're between tokens. */
    int token_start = -1;

    /** The next tokens to emit. */
    ArrayList<String> token_queue = new ArrayList<String>();

    /** @code{true} if we've started reading tokens. */
    boolean started_reading = false;

    /** The patterns matching non-standard words (e-mail addresses, etc.) */
    ArrayList<Pattern> patterns = null;

    /** Special characters (with diacritics) can be translated into these characters. */
    public static final char[] SPECIAL_CHAR_MAPPING = {
    'A', 'A', 'A', 'A', 'A', 'A', 'E', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', '*', 'O', 'U', 'U', 'U', 'U', 'Y', 'T', 'S', 'a', 'a', 'a', 'a', 'a', 'a', 'e', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'd', 'n', 'o', 'o', 'o', 'o', 'o', '/', 'o', 'u', 'u', 'u', 'u', 'y', 't', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'J', 'j', 'J', 'j', 'K', 'k', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'N', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'O', 'o', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z' }; 


    /* ------------------------------ */


    /**
     *  Constructor
     *  @param reader The reader from which to read the text to be tokenized. 
     *  @param case_folding Should be set to <code>true</code> if every character
     *         should be translated into its lowercase counterpart.
     *  @param remove_diacritics Should be set to <code>true</code> if diacritics 
     *         should be removed (e.g. é will be e).
     *  @param remove_punctuation Should be set to <code>true</code> if punctuation 
     *         should be removed (useful in some applications).
     *  @param pattern_file The name of the file containing regular expressions
     *         for non-standard words (like dates, mail addresses, etc.).
     */
    public Tokenizer( Reader reader, boolean case_folding, boolean remove_diacritics, boolean remove_punctuation, String pattern_file ) {
    this.reader = reader;
    this.case_folding = case_folding;
    this.remove_diacritics = remove_diacritics;
    this.remove_punctuation = remove_punctuation;
    if ( pattern_file != null ) {
        readPatterns( pattern_file );
    }
    }


    /** 
     *  Returns true if the character is a punctuation character.
     */
    public boolean punctuation( char c ) {
    if ( c >= 32 && c <= 47 )
        return true;
    if ( c >= 58 && c <= 64 )
        return true;
    if ( c >= 91 && c <= 96 )
        return true;
    if ( c >= 123 && c <= 126 ) 
        return true;
    return false;
    }


    /**
     *  Read the patterns that match non-standard words  
     */
    private void readPatterns( String filename ) {
    patterns = new ArrayList<Pattern>();
    String line = null;
    // try-with-resources closes the pattern file even if a line fails to compile
    try ( BufferedReader in = new BufferedReader( new FileReader( filename )) ) {
        while (( line = in.readLine()) != null ) {
        line = line.trim();
        if ( !line.startsWith( "//" ) && line.length() > 0 ) {
            patterns.add( Pattern.compile( line ));
        }
        }
    }
    catch ( IOException e ) {
        System.err.println( "Warning: IOException reading the regular expressions from file" );
    }
    catch ( PatternSyntaxException e ) {
        System.err.println( "ERROR: Malformed regular expression: " + line );
    }
    }


    /** 
     *  Normalizes letters by converting to lower-case and possibly
     *  removing diacritics. This method is also used for checking
     *  whether a character can occur in a token or not.
     *
     *  @return code{true} if the (normalized counterpart of the) character
     *   can occur within a token, and @code{false} otherwise.
     */
    public boolean normalize( char[] buf, int ptr ) {
    char c = buf[ptr];
    if ( Character.isLetter( c )) {
        if ( remove_diacritics ) {
        // Remove diacritics by mapping to the closest character 
        // without diacritics.
        if ( c >= '\u00c0' && c <= '\u017e' ) {
            buf[ptr] = SPECIAL_CHAR_MAPPING[(int)(c-'\u00c0')];
        }
        }
        if ( case_folding ) {
        buf[ptr] = Character.toLowerCase( buf[ptr] );
        }
        return true;
    }
    if ( c >= '!' && c <= '~' ) {
        return true;
    }
    // This is not a character that can occur in a token.
    return false;
    }



    /**
     *  @return the @code{true} if there are more tokens to be
     *  read, and @code{false} otherwise.
     */
    public boolean hasMoreTokens() throws IOException {
    if ( !started_reading ) {
        readTokens();
        started_reading = true;
    }
    if ( token_queue.size() == 0 ) 
        return readTokens();
    else 
        return true;
    }


    /**
     *  @return a String containing the next token, or @code{null} if there
     *  are no more tokens.
     */
    public String nextToken() throws IOException { 
    if ( token_queue.size() == 0 ) {
        if ( readTokens() )
        return token_queue.remove( 0 );
        else
        return null;
    }
    else {
        return token_queue.remove( 0 );
    }
    }


    /**
     *  Reads the next token. 
     */ 
    private boolean readTokens() throws IOException {
    if ( !started_reading ) {
        refillBuffer( 0 );
        started_reading = true;
    }
    boolean token_added_to_queue = false;
    while ( buf[ptr] != 0 ) {
        if ( token_start < 0 ) {
        if ( normalize( buf, ptr )) {
            // A token starts here
            token_start = ptr;
        }
        ptr++;
        }
        else {
        if ( normalize( buf, ptr )) {
            // We're in the middle of a token
            ptr++;
        }
        else {
            // Check for non-standard words
            token_added_to_queue = addTokensToQueue();
            token_start = -1;
            ptr++;
        }
        }
        if ( ptr == BUFFER_LENGTH ) {
        // The buffer has been read, so refill it
        if ( token_start >= 0 ) {
            // We're in the middle of a token. Copy the parts
            // of the token we have read already into the 
            // beginning of the buffer.
            System.arraycopy( buf, token_start, buf, 0, BUFFER_LENGTH-token_start );
            refillBuffer( BUFFER_LENGTH-token_start );
            ptr = BUFFER_LENGTH-token_start;
            token_start = 0;
        }
        else {
            refillBuffer( 0 );
            ptr = 0;
        }
        }
        if ( token_added_to_queue ) {
        return true;
        }
    }
    // We have reached end of input. 
    return false; 
    }


    /**
     *  Adds token to the queue
     */
    private boolean addTokensToQueue() {
    if ( token_start < 0 ) {
        return false;
    }
    String s = new String( buf, token_start, ptr-token_start );
    if ( patterns != null ) {
        // Now let's see if the string s matches one of the patterns 
        // for non-standard words
        for ( Pattern p : patterns ) {
        Matcher m = p.matcher( s );
        if ( m.find() ) {
            // The string contains a non-standard word. First check the prefix 
            // before the matching substring, then add the non-standard word  
            // to the token queue, then check the remainder of the string.
            addStandardTokensToQueue( s.substring(0, m.start() ));
            token_queue.add( m.group() );
            token_start += m.end();
            addTokensToQueue();
            return true;
        }
        }
    }
    // This string contains only standard words
    return addStandardTokensToQueue( s );
    }


    /**
     *  Adds standard tokens (i.e. tokens not matching any regular
     *  expression) to the queue.
     */
    private boolean addStandardTokensToQueue( String s ) {
    // This string s does not match any specific pattern.
    // Then split it, considering all punctuation symbols
    // to be separators.
    boolean tokens_found = false;
    StringBuffer smallbuf = new StringBuffer();
    for ( int i=0; i<s.length(); i++ ) {
        if ( punctuation( s.charAt( i ))) {
        // The string before the punctuation sign is a token
        // unless it is empty
        String t = smallbuf.toString();
        if ( t.length()>0 ) {
            token_queue.add( t );
            smallbuf = new StringBuffer();
            tokens_found = true;
        }
        if ( !remove_punctuation ) {
            token_queue.add( "" + s.charAt( i ));
            tokens_found = true;
        }
        }
        else {
        smallbuf.append( s.charAt( i ));
        }
    }
    // The string after the last punctuation sign is a token
    // unless it is empty
    String t = smallbuf.toString();
    if ( t.length()>0 ) {
        token_queue.add( t );
        tokens_found = true;
    }   
    return tokens_found;
    }


    /**
     *  Refills the buffer and adds end_of_file "\0" at the appropriate place.
     */
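    private void refillBuffer( int start ) throws IOException {
    int chars_read = reader.read( buf, start, BUFFER_LENGTH-start );
    if ( chars_read < 0 ) {
        // End of stream: terminate right after the characters carried over.
        buf[start] = 0;
    }
    else if ( chars_read < BUFFER_LENGTH-start ) {
        // The new characters begin at 'start', so the terminator goes after them.
        buf[start+chars_read] = 0;
    }
    }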

}

2 Answers:

Answer 0: (score: 0)

Close the FileInputStream in a finally block in both methods.
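Since Java 7 the same cleanup can be written with try-with-resources, which closes the reader (and the stream it wraps) automatically. A sketch; the class name ReadWithResources and the method countChars are illustrative and not part of the question's code:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReadWithResources {

    /** Counts the characters in a UTF-8 file; the reader is closed even if read() throws. */
    static long countChars(String path) throws IOException {
        try (Reader reader = new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8)) {
            long count = 0;
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                count += n;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; a FileNotFoundException propagates if it does not resolve.
        System.out.println(countChars("../test/text.txt"));
    }
}
```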

Answer 1: (score: 0)

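When the first test case, testGetTerms, opens the file for reading, the second test case cannot open the file for some security reason. That is why it throws the FileNotFoundException. From the Oracle documentation: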

**FileInputStream**
public FileInputStream(String name)
                throws FileNotFoundException
Creates a FileInputStream by opening a connection to an actual file, the file named by the path name name in the file system. 

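A new FileDescriptor object is created to represent this file connection. First, if there is a security manager, its checkRead method is called with the name argument as its argument.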

If the named file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading then a `FileNotFoundException` is thrown.

Parameters:
name - the system-dependent file name.
Throws:
FileNotFoundException - if the file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading.
SecurityException - if a security manager exists and its checkRead method denies read access to the file.
See Also:
SecurityManager.checkRead(java.lang.String)
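The two concrete failure cases named in the quoted documentation (a missing file and a directory) can be reproduced with a short program. A sketch; OpenFailureDemo and tryOpen are hypothetical names, and the exact message text is platform-dependent:

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

class OpenFailureDemo {

    /** Returns null if the path could be opened for reading, otherwise the exception message. */
    static String tryOpen(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return null;
        } catch (FileNotFoundException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // A path that does not exist.
        System.out.println("missing file: " + tryOpen("no-such-file.txt"));
        // A path that exists but is a directory rather than a regular file.
        System.out.println("directory:    " + tryOpen(System.getProperty("java.io.tmpdir")));
    }
}
```

The message after the path (for example "No such file or directory" in the stack trace from the question) indicates which of the documented cases applied.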