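How do I understand this Python code that splits a line into an array?

Time: 2011-11-03 00:57:14

Tags: python string

I'm a bit confused by one line of Python:

We use Python and a custom function to split a line: we want whatever is between quotes to be a single entry in the array.

The line is, for example: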

"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10

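So "La Jolla Bank, FSB" should be a single entry in the array.

I'm not sure I understand this code:

  1. The first character is a quote ("), so the variable "quote" is set to its inverse, i.e. to TRUE.

  2. Then we check for a comma, and whether quote is set to its inverse, so if quote is TRUE, which is the case when we are inside the quotes.

  3. We cut it with current = "", and this is what I don't understand: we are still between the quotes, so normally we should not cut it now! Edit: so "not quote" means "false", not "the opposite of". Thanks!

  4. The code: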

    def mysplit (string):
        quote = False
        retval = []
        current = ""
        for char in string:
            if char == '"':
                quote = not quote
            elif char == ',' and not quote:  # the first comma is still inside the quotes, and quote is set to TRUE, so we should not cut current here...
                retval.append(current) 
                current = "" 
            else:
                current += char
        retval.append(current)
        return retval
    

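6 answers:

Answer 0: (score: 3)

You are reading it as if both `if char == '"'` and `elif char == ',' and not quote` were run for the same character.

However, the if/elif statement explicitly ensures that only one of them runs.

Either `quote` is inverted, or the `current` value is cut.

When the current char is `"`, the logic that inverts the `quote` flag runs, but the logic that cuts the string does not.

When the current char is `,`, the logic that inverts the flag does not run, but the logic that cuts the string does, provided the `quote` flag is not set.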
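
As a small illustration of that point, the sketch below isolates the branch selection from the question's loop. The helper name `branch_taken` is made up for this example and is not part of the original code; it only reports which single branch would run for a given character and flag value.

```python
def branch_taken(char, quote):
    """Return the name of the one branch mysplit would take for char."""
    if char == '"':
        return "toggle quote"
    elif char == ',' and not quote:
        return "cut current"
    else:
        return "append char"


print(branch_taken('"', False))  # toggle quote
print(branch_taken(',', True))   # append char (a comma inside quotes is kept)
print(branch_taken(',', False))  # cut current
```

Exactly one label comes back per character, which is the behaviour the if/elif/else chain guarantees.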

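Answer 1: (score: 1)

That line resets `current` to an empty string, discarding whatever it held before.

As long as you are not inside quotes (i.e. `quote` is False), seeing a `,` means you have reached the end of a field. Whatever you have accumulated in `current` is the content of that field, so append it to `retval` and reset `current` to an empty string, ready for the next field.

That said, this looks like it is processing .csv input. There is a csv module that can take care of this for you.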
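
For comparison, here is a minimal sketch of the same split using the standard library's `csv` module, assuming the input really is ordinary comma-separated data with double-quoted fields. The variable names are only for illustration.

```python
import csv

line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'

# csv.reader accepts any iterable of lines, so a one-element list works.
fields = next(csv.reader([line]))
print(fields)
# ['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', '19-Feb-10', '24-Feb-10']
```

When reading from a file, the open file object can be passed to `csv.reader` directly instead of a list.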

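Answer 2: (score: 1)

`current` is reset to empty because if you encounter a ',' and you are not inside "" quotes, you should interpret it as the end of a "token".

This is definitely not pythonic; `for char in string` makes me cringe, and whoever wrote this code should have used a regular expression.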
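
The answer does not say which regular expression it has in mind, so the following is only one possible sketch. It assumes quotes are balanced and never escaped, and the names `COMMA_OUTSIDE_QUOTES` and `regex_split` are invented for this example.

```python
import re

# Match a comma only when the rest of the line contains an even number of
# double quotes, i.e. when the comma is not inside a quoted field.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def regex_split(line):
    # Remove the quote characters afterwards, as mysplit does.
    return [field.replace('"', '') for field in COMMA_OUTSIDE_QUOTES.split(line)]


line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'
print(regex_split(line))
# ['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', '19-Feb-10', '24-Feb-10']
```

Whether this is easier to read than the character loop is debatable; the lookahead rescans the remainder of the line for every comma.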

Answer 3: (score: 1)

What you are looking at is a stripped-down version of the Finite State Machine that most language parsers use.

Let's see if I can annotate it:

    def mysplit (string):
        # We start out at the beginning of the string, NOT in between quotes
        quote = False
        # Hold each element that we split out
        retval = []
        # This variable holds whatever the current item we're interested in is
        # e.g: If we're in a quote, then it's everything (including commas)
        # otherwise it's everything UP UNTIL the next comma
        current = ""
        # Scan the string character by character
        for char in string:
            # We hit a quote, so turn on QUOTE SCANNING MODE!!!
            # If we're already in quote scanning mode, turn it off
            if char == '"':
                quote = not quote
            # We hit a comma, and we're not in quote scanning mode
            elif char == ',' and not quote:
                # We got what we want, let's put it in the return value
                # and then reset our current item to nothing so we can prepare for the next item.
                retval.append(current)
                current = ""
            else:
                # Nothing special, let's just keep building up our current item
                current += char
        # We're done with all the characters, so store whatever we were working on when we ran out of characters
        retval.append(current)
        # Return it!
        return retval
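
To see the two states in action, the function can be run on the sample line from the question. The block below repeats the function without comments so that it is self-contained.

```python
def mysplit(string):
    quote = False
    retval = []
    current = ""
    for char in string:
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            retval.append(current)
            current = ""
        else:
            current += char
    retval.append(current)
    return retval


line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'
print(mysplit(line))
# ['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', '19-Feb-10', '24-Feb-10']
```

The comma after "Bank" falls through to the else branch because `quote` is True at that point, so it stays inside the first field.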

Answer 4: (score: 1)

This isn't the best code for splitting, but it is very straightforward.

    def mysplit(string):
        quote = False
        retval = []

        # First, current is set to the empty string.
        current = ""

        # The loop walks through the string to be split and pulls characters
        # out of it one by one, setting 'char' to the next character each time.
        for char in string:

            # A quote character toggles whether we are inside quotes.
            if char == '"':
                quote = not quote

            # If we see a comma outside quotes, append whatever has accumulated
            # in current to the return result, then reset current so the next
            # word can accumulate.
            elif char == ',' and not quote:
                retval.append(current)
                current = ""  # why do we cut current here?

            # Otherwise, add the current character to the 'current' variable.
            else:
                current += char

        # After the last char is seen, there are still leftover characters in
        # current, which we can just shove into the final result.
        retval.append(current)
        return retval


    Here is an example run:

    Let string be 'a,bbb,ccc'

    Step  char  current   retval

     1     a      a        []
     2     ,               [a]        ### current is reset
     3     b      b        [a]
     4     b      bb       [a]
     5     b      bbb      [a]
     6     ,               [a, bbb]   ### current is reset

    and so on
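
The table above can also be generated rather than written by hand. This is a sketch only: `trace_split` is a hypothetical helper that copies the loop from the question and yields the state after each character.

```python
def trace_split(string):
    """Yield (step, char, current, retval) after each character is processed."""
    quote = False
    retval = []
    current = ""
    for step, char in enumerate(string, start=1):
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            retval.append(current)
            current = ""
        else:
            current += char
        # Copy retval so later appends do not change rows already yielded.
        yield step, char, current, list(retval)


for step, char, current, retval in trace_split('a,bbb,ccc'):
    print(f"{step:>4}  {char!r:<5} {current!r:<7} {retval}")
```

The trace stops after the last character, so the final `retval.append(current)` that adds the last field is not shown in it.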

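Answer 5: (score: 1)

OK, you're nearly there!

  1. The first character is a quote ("), so the variable "quote" is set to its inverse, i.e. to TRUE.

Right! So `quote` is set to the inverse of what it was before. At the start of the program it is False, so when a `"` is seen it becomes True. But the reverse also holds: if it is True and a quote is seen, it becomes False.

In other words, this line of the program changes `quote` to the opposite of whatever it was before the line ran. This is called a "toggle".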
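
A short sketch of the toggle on its own, using a made-up input string, shows the flag flipping each time a quote character is seen:

```python
quote = False
states = []
for char in 'ab"cd"ef':
    if char == '"':
        quote = not quote  # toggle: False -> True, True -> False
    states.append((char, quote))

for char, flag in states:
    print(char, flag)
```

The flag is False for `a` and `b`, becomes True at the first `"`, stays True for `c` and `d`, and goes back to False at the second `"`.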

  
      
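  2. Then we check for a comma, and whether quote is set to its inverse, so if quote is TRUE, which is the case when we are inside the quotes.

That's not quite right. `not quote` means "only if quote is False". It has nothing to do with being "set to the inverse". No variable can be equal to its own inverse! That would be like saying X=True and X=False, which is obviously nonsense.

`quote` is always either True or False, and nothing else!

  3. We cut it with current = "", and this is what I don't understand: we are still between the quotes, so we should not cut it now!

So hopefully you can now see that if you reach this line, you are not between quotes. `not quote` ensures that you never cut inside a quoted section, because `not quote` really just means: not inside a quote!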