String tokenizer

Asked: 2010-01-10 20:08:19

Tags: java string comments tokenize

Could someone help me understand how this string tokenizer works by adding some comments to the code? I would really appreciate any help, thanks!

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        } else {
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                stringStack.addElement(buffer.toString());
            }
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}

6 answers:

Answer 0 (score: 3)

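Not the best-written method in the world! But the comments are below. Overall, it splits a string into "words", using the character delim to separate them. If ignoreEmpty is true, empty words are not counted (i.e. two consecutive delimiters act as one).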

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Buffer to construct words
    StringBuffer buffer = new StringBuffer();
    // Stack to store complete words
    Stack stringStack = new Stack();

    // Go through input string one character at a time
    for (int i = 0; i < toSplit.length(); i++) {
        // If next character is not the delimiter,
        // add it to the buffer
        if (toSplit.charAt(i) != delim) {
            buffer.append((char) toSplit.charAt(i));
        // Else it is the delimiter, so process the
        // complete word
        } else {
            // If the word is empty (0 characters) we
            // have the choice of ignoring it
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            // Otherwise, we push it onto the stack
            } else {
                stringStack.addElement(buffer.toString());
            }
            // Clear the buffer ready for the next word
            buffer = new StringBuffer();
        }
    }

    // If there are remaining characters in the buffer,
    // then a word rather than the delimiter ends the
    // string, so we push that onto the stack as well
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }

    // We set up a new array to store the contents of
    // the stack
    String[] split = new String[stringStack.size()];

    // Then we pop each element from the stack into an
    // indexed position in the array, starting at the
    // end as the last word was last on the stack
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

    // Then return the array
//        System.out.println("There are " + split.length + " Words");
    return split;
}

You could write a more efficient method using String.split, turning the delimiter into a suitable regular expression (ending with + if ignoreEmpty is true).
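A minimal sketch of that suggestion, assuming a hypothetical helper named splitWithRegex (the name and the class are not from the original code). The delimiter is escaped with Pattern.quote so that characters such as '.' or '|' are treated literally. This is not an exact drop-in replacement: String.split drops all trailing empty strings, and whitespace-only tokens are not filtered here the way the original trim() check does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexSplitSketch {

    // Hypothetical helper illustrating the String.split approach.
    static String[] splitWithRegex(String toSplit, char delim, boolean ignoreEmpty) {
        // Quote the delimiter so regex metacharacters are matched literally.
        String regex = "(?:" + Pattern.quote(String.valueOf(delim)) + ")";
        if (ignoreEmpty) {
            regex += "+"; // a run of delimiters counts as one separator
        }

        String[] parts = toSplit.split(regex);
        if (!ignoreEmpty) {
            return parts;
        }

        // A leading delimiter still produces an empty first element,
        // so drop any empty strings that remain.
        List<String> kept = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                kept.add(part);
            }
        }
        return kept.toArray(new String[0]);
    }
}
```

With this sketch, splitting ",a,,b" on ',' with ignoreEmpty set to true gives the two tokens "a" and "b", while splitting "a,,b" with ignoreEmpty set to false keeps the empty middle token.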

Answer 1 (score: 1)

public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    // Holds each character efficiently while parsing the string
    // in a temporary buffer
    StringBuffer buffer = new StringBuffer();
    // Collection for holding the intermediate result
    Stack stringStack = new Stack();

    // for each character in the string to split
    for (int i = 0; i < toSplit.length(); i++) 
    {
        // if the character is NOT the delimiter
        if (toSplit.charAt(i) != delim) 
        {
            // add this character to the temporary buffer
            buffer.append((char) toSplit.charAt(i));
        } else { // we are at a delimiter!
            // if the buffer is empty and we are ignoring empty
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
              // do nothing
            } else { // if the buffer is not empty or if ignoreEmpty is not true
                // add the buffer to the intermediate result collection
                stringStack.addElement(buffer.toString());
            }
            // reset the buffer 
            buffer = new StringBuffer();
        }

    }
    // we might have extra characters left in the buffer from the last loop
    // if so, add it to the intermediate result
    // IMHO, this might contain a bug
    // what happens when the buffer contains a space at the end and 
    // ignoreEmpty is true?  Seems like it would still be added
    if (buffer.length() !=0) {
        stringStack.addElement(buffer.toString());
    }
    // we are going to convert the intermediate result to an array
    // we create a result array the size of the stack
    String[] split = new String[stringStack.size()];
    // and copy each item in the stack to the return array
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // release our temp vars
    // (to let the GC collect at the earliest possible moment)
    stringStack = null;
    buffer = null;

    // and return it
    return split;
}

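Is this taken straight from String.split, or is it something else? It looks to me as if the code has a bug: when ignoreEmpty is true, a blank result left over at the end is still added.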
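One way to make the trailing check consistent is sketched below, as an assumption about the intended behavior rather than the original author's fix. The class name TailCheckSketch and the helper isBlank are hypothetical. The only behavioral change is that the last token goes through the same whitespace test as every other token; the sketch also uses StringBuilder and a List in place of StringBuffer and Stack.

```java
import java.util.ArrayList;
import java.util.List;

class TailCheckSketch {

    // Hypothetical helper: true if the text is empty or whitespace only.
    static boolean isBlank(CharSequence text) {
        return text.toString().trim().isEmpty();
    }

    static String[] split(String toSplit, char delim, boolean ignoreEmpty) {
        StringBuilder buffer = new StringBuilder();
        List<String> tokens = new ArrayList<>();

        for (int i = 0; i < toSplit.length(); i++) {
            char c = toSplit.charAt(i);
            if (c != delim) {
                buffer.append(c);
            } else {
                // Keep the token unless it is blank and blanks are ignored.
                if (!(ignoreEmpty && isBlank(buffer))) {
                    tokens.add(buffer.toString());
                }
                buffer.setLength(0); // reuse the same buffer
            }
        }

        // Apply the same blank test to whatever is left after the last delimiter.
        if (buffer.length() != 0 && !(ignoreEmpty && isBlank(buffer))) {
            tokens.add(buffer.toString());
        }
        return tokens.toArray(new String[0]);
    }
}
```

With this version, splitting "a, " on ',' with ignoreEmpty set to true returns only "a", whereas the code in the question would also return the trailing " ".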

Answer 2 (score: 0)

This code loops over a string, splits it into words by looking for the delimiter, and returns a string array containing all the words found.

In C#, you can write the same thing as:

toSplit.Split(
    new char[]{ delim }, !ignoreEmpty ? 
        StringSplitOptions.None:
        StringSplitOptions.RemoveEmptyEntries);

Answer 3 (score: 0)

public String[] split(String toSplit, char delim, boolean ignoreEmpty) { 

    StringBuffer buffer = new StringBuffer(); //Make a StringBuffer
    Stack stringStack = new Stack();          //Make a set of elements, a stack

    for (int i = 0; i < toSplit.length(); i++) { //For how many characters are in the string, run this loop
        if (toSplit.charAt(i) != delim) { //If the current character (while in the loop) is NOT equal to the specified delimiter (passed into the function), add it to a buffer
            buffer.append((char) toSplit.charAt(i));
        } else { //Otherwise...
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) { //If it's whitespace, do nothing (only if ignoreEmpty is true)
            } else { //otherwise...
                stringStack.addElement(buffer.toString()); //Add the previously found characters to the output stack
            }
            buffer = new StringBuffer(); //Make another buffer.
        }
    }

    if (buffer.length() !=0) { //If characters are left over in the buffer
        stringStack.addElement(buffer.toString()); //Add them as the last token
    }

    String[] split = new String[stringStack.size()]; //Result array, one slot per token
    for (int i = 0; i < split.length; i++) {
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}

Answer 4 (score: 0)

This code splits a string into substrings based on a given delimiter. For example, the string:

String str = "foo,bar,foobar";
String[] strArray = split(str, ',', true);

would be returned as this string array:

strArray ==> [ "foo", "bar", "foobar" ];


public String[] split(String toSplit, char delim, boolean ignoreEmpty) {

    StringBuffer buffer = new StringBuffer();
    Stack stringStack = new Stack();

    // Loop through each char in the string (so 'f', then 'o', then 'o' etc).
    for (int i = 0; i < toSplit.length(); i++) {
        if (toSplit.charAt(i) != delim) {
            // If the char at the current position in the string does not equal 
            // the delimiter, add this char to the string buffer (so we're 
            // building up another string that consists of letters between two 
            // of the 'delim' characters).
            buffer.append((char) toSplit.charAt(i));
        } else {
            // If the string is just whitespace or has length 0 and we are 
            // removing empty strings, do not include this substring
            if (buffer.toString().trim().length() == 0 && ignoreEmpty) {
            } else {
                // It's not empty, add this substring to a stack of substrings.
                stringStack.addElement(buffer.toString());
            }
            // Reset the buffer for the next substring.
            buffer = new StringBuffer();
        }
    }

    if (buffer.length() !=0) {
        // Make sure to add the last buffer/substring to the stack!
        stringStack.addElement(buffer.toString());
    }

    // Make an array of string the size of the stack (the number of substrings found)
    String[] split = new String[stringStack.size()];
    for (int i = 0; i < split.length; i++) {
        // Pop off each substring we found and add it into the array we are returning.
        // Fill up the array backwards, as we are taking values off a stack.
        split[split.length - 1 - i] = (String) stringStack.pop();
    }

    // Unnecessary, but clears the variables
    stringStack = null;
    buffer = null;

//        System.out.println("There are " + split.length + " Words");
    return split;
}

Answer 5 (score: 0)

OK, before going on to answer, I should point out that this code has several problems. Here goes:

/**
*
*/
public String[] split(   
    String toSplit       //string to split in tokens, delimited by delim
,   char delim           //character that delimits tokens
,   boolean ignoreEmpty  //if true, tokens consisting of only whitespace are ignored
) {

StringBuffer buffer = new StringBuffer();
Stack stringStack = new Stack();

for (int i = 0; i < toSplit.length(); i++) {     //examine each character
    if (toSplit.charAt(i) != delim) {            //no delimiter: this char is part of a token, so add it to the current (partial) token.
        buffer.append((char) toSplit.charAt(i)); 
    } else {
        if (buffer.toString().trim().length() == 0 && ignoreEmpty) {   //'token' consists only of whitespace, and ignoreEmpty was set: do nothing
        } else {
            stringStack.addElement(buffer.toString());  //found a token, so save it.
        }
        buffer = new StringBuffer();                    //reset the buffer so we can store the next token.
    }
}

if (buffer.length() !=0) {                              //save the last (partial) token (if it contains at least one character)
    stringStack.addElement(buffer.toString());
}

String[] split = new String[stringStack.size()];        //copy the stack of tokens to an array
for (int i = 0; i < split.length; i++) {
    split[split.length - 1 - i] = (String) stringStack.pop();
}

stringStack = null;                                     //uhm?...
buffer = null;

//        System.out.println("There are " + split.length + " Words");
return split;                                           //return the array of tokens.

}

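Problems:

  1. There is a perfectly good built-in string tokenizer, java.util.StringTokenizer.
  2. The code allocates a new StringBuffer for each token! It should simply reset the length of the existing StringBuffer.
  3. The nested ifs inside the loop could be written more efficiently, or at least more readably.
  4. The tokens are copied into an array to be returned. Any caller should be content with being handed some structure it can iterate over; if an array is needed, it can be copied outside this function. This can save considerable memory and CPU resources.

Most likely, all of these problems can be solved simply by using the built-in java.util.StringTokenizer.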
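For reference, a minimal sketch of the StringTokenizer approach, assuming a hypothetical wrapper named tokens that returns a List instead of an array. StringTokenizer always skips empty tokens between consecutive delimiters, so it roughly corresponds to ignoreEmpty being true, except that whitespace-only tokens are still returned. Note also that its Javadoc describes it as a legacy class whose use is discouraged in new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSketch {

    // Hypothetical wrapper around java.util.StringTokenizer.
    static List<String> tokens(String toSplit, char delim) {
        List<String> result = new ArrayList<>();
        // The second argument is the set of delimiter characters.
        StringTokenizer tokenizer = new StringTokenizer(toSplit, String.valueOf(delim));
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }
}
```

Calling tokens("foo,,bar,foobar", ',') yields the three tokens "foo", "bar" and "foobar"; the caller can iterate over the list directly or copy it to an array if one is really needed.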