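I'm a bit confused by one line of Python.
We use Python and a custom function to split a line: we want whatever is between quotes to be a single entry in the array.
The line is, for example: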
"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10
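So "La Jolla Bank, FSB" should be a single entry in the array.
I'm not sure I understand this code:
The first character is a quote `"`, so the variable `quote` is set to its inverse, i.e. to True.
Then we check for a comma, and whether `quote` is set to its inverse, so whether `quote` is True, which is the case when we are inside the quotes.
We cut it with `current = ""`, and this is what I don't understand: we are still between the quotes, so we should not cut it now!
Edit: so `and not quote` means "and quote is False", not "the opposite of quote". Thanks!
Code: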
def mysplit(string):
    quote = False
    retval = []
    current = ""
    for char in string:
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:  # the first comma is still inside the quotes, and quote is set to TRUE, so we should not cut current here...
            retval.append(current)
            current = ""
        else:
            current += char
    retval.append(current)
    return retval
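Answer 0 (score: 3)
You're reading it as if both `if char == '"'` and `elif char == ',' and not quote` would run for the same character.
However, the if/elif structure guarantees that only one of them runs.
Either the `quote` flag is inverted, or the `current` value is cut, never both.
When the current char is `"`, the logic that inverts the `quote` flag runs, but the logic that cuts the string does not.
When the current char is `,`, the logic that inverts the flag does not run, and the logic that cuts the string runs only if the `quote` flag is not set.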
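To make the "only one branch runs" point concrete, here is a minimal sketch that records which branch handled each character. The function name `trace_branches` and the labels "toggle", "split" and "append" are made up for illustration; the branching itself mirrors the code in the question.

```python
def trace_branches(string):
    """Return (char, branch) pairs showing the single branch taken per character."""
    quote = False
    steps = []
    for char in string:
        if char == '"':
            # A quote character only toggles the flag; nothing is cut here.
            quote = not quote
            steps.append((char, "toggle"))
        elif char == ',' and not quote:
            # A comma outside quotes ends the current field.
            steps.append((char, "split"))
        else:
            # Everything else, including a comma inside quotes, is kept.
            steps.append((char, "append"))
    return steps


for char, branch in trace_branches('"a,b",c'):
    print(repr(char), branch)
```

In this run the comma between `a` and `b` should be reported as "append", because the flag is still True at that point, while the comma after the closing quote should be reported as "split".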
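Answer 1 (score: 1)
This re-initializes `current` to the empty string, wiping out whatever it held before.
As long as you're not inside quotes (i.e. `quote` is False), seeing a `,` means you've reached the end of a field. Whatever you've accumulated in `current` is the content of that field, so append it to `retval` and reset `current` to the empty string, ready for the next field.
That said, this looks like it is processing .csv input. There is a csv module that can handle this for you.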
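As a minimal sketch of that suggestion, assuming the input really is comma-separated with double-quoted fields, the standard library's `csv.reader` can parse the sample line from the question. The variable names `line` and `fields` are only illustrative.

```python
import csv

line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'

# csv.reader accepts any iterable of lines, so a one-element list is enough.
fields = next(csv.reader([line]))
print(fields)
```

With the default dialect the quoted first field should come back as the single entry `La Jolla Bank, FSB`, without the surrounding quotes.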
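Answer 2 (score: 1)
`current` is reset to empty because if you encounter a ',' and you're not inside "" quotes, you should interpret it as the end of a "token".
This is definitely not Pythonic: `for char in string` makes me cringe, and whoever wrote this code should have used regular expressions.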
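For comparison, one possible regular-expression version might look like the sketch below. The pattern and the name `regex_split` are my own assumptions, not something from the original code, and the sketch only covers well-formed input: a field is either entirely quoted or contains no quotes, and escaped quotes inside a field are not handled.

```python
import re

# A field starts at the beginning of the line or right after a comma.
# It is either a fully quoted string (group 1) or a run of non-comma characters (group 2).
FIELD = re.compile(r'(?:^|,)(?:"([^"]*)"|([^,]*))')


def regex_split(line):
    """Split a simple comma-separated line, keeping quoted fields whole."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, plain = match.groups()
        # Exactly one of the two groups participates in each match.
        fields.append(quoted if quoted is not None else plain)
    return fields


print(regex_split('"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'))
```

Whether this is clearer than the character loop is a matter of taste; for real CSV data a dedicated parser is usually the safer choice.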
Answer 3 (score: 1)
What you're looking at is a pared-down version of the Finite State Machine that most language parsers use.
Let's see if I can annotate it:
def mysplit(string):
    # We start out at the beginning of the string NOT in between quotes
    quote = False
    # Hold each element that we split out
    retval = []
    # This variable holds whatever the current item we're interested in is
    # e.g: If we're in a quote, then it's everything (including commas)
    # otherwise it's everything UP UNTIL the next comma
    current = ""
    # Scan the string character by character
    for char in string:
        # We hit a quote, so turn on QUOTE SCANNING MODE!!!
        # If we're in quote scanning mode, turn it off
        if char == '"':
            quote = not quote
        # We hit a comma, and we're not in quote scanning mode
        elif char == ',' and not quote:
            # We got what we want, let's put it in the return value
            # and then reset our current item to nothing so we can prepare for the next item.
            retval.append(current)
            current = ""
        else:
            # Nothing special, let's just keep building up our current item
            current += char
    # We're done with all the characters, let's put together whatever we were working on when we ran out of characters
    retval.append(current)
    # Return it!
    return retval
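To make the state-machine reading explicit, the same logic can be sketched with named states instead of a boolean. The `State` enum and the function name `fsm_split` are hypothetical names introduced here; the behavior is intended to match the annotated function above.

```python
from enum import Enum


class State(Enum):
    OUTSIDE = "outside quotes"
    INSIDE = "inside quotes"


def fsm_split(string):
    """Same splitting logic, with the boolean flag replaced by two named states."""
    state = State.OUTSIDE
    fields = []
    current = ""
    for char in string:
        if char == '"':
            # A quote character is the only input that changes state.
            state = State.INSIDE if state is State.OUTSIDE else State.OUTSIDE
        elif char == ',' and state is State.OUTSIDE:
            # A comma ends the field only when we are outside quotes.
            fields.append(current)
            current = ""
        else:
            current += char
    # Flush the last field once the input is exhausted.
    fields.append(current)
    return fields


print(fsm_split('"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'))
```

With only two states a boolean is arguably simpler, but the named states show why a comma inside quotes never triggers a split.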
Answer 4 (score: 1)
This isn't the best code for splitting, but it's pretty straightforward.
current = ""
# First, current is set to the empty string. The following line
# loops through the string to be split and pulls characters out of it
# one by one, setting 'char' to the value of the next character
for char in string:
    # the following code tracks whether we are currently inside quotes;
    # for any ordinary character it adds the current character to the 'current' variable
    if char == '"':
        quote = not quote
    elif char == ',' and not quote:
        retval.append(current)
        ### if we see the comma, append whatever has accumulated in current to the
        ### return result,
        ### then reset the value in current to let the next word accumulate
        current = ""  # why do we cut current here?
    else:
        current += char
### after the last char is seen, we still have leftover characters in current, which
### we can just shove into the final result
retval.append(current)
return retval
Here is an example run:
Let string be 'a,bbb,ccc'
Step  char  current  retval
1     a     a        {}
2     ,              {a}      ### current is reset
3     b     b        {a}
4     b     bb       {a}
5     b     bbb      {a}
6     ,              {a,bbb}  ### current is reset
and so on
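The example run can also be produced by code rather than by hand. The sketch below uses the same loop and records a snapshot after each character; the function name `run_steps` and the row layout are assumptions made for illustration.

```python
def run_steps(string):
    """Run the split loop and record (step, char, current, retval) after each character."""
    quote = False
    retval = []
    current = ""
    rows = []
    for step, char in enumerate(string, start=1):
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            retval.append(current)
            current = ""  # reset so the next field starts empty
        else:
            current += char
        # Copy retval so later appends do not change earlier snapshots.
        rows.append((step, char, current, list(retval)))
    retval.append(current)
    return rows, retval


rows, result = run_steps('a,bbb,ccc')
for step, char, current, retval in rows:
    print(step, char, repr(current), retval)
print(result)
```

Steps 2 and 6 are the two places where `current` becomes empty again, which matches the rows marked as reset in the hand-written run.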
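Answer 5 (score: 1)
OK, you're not quite there!
1. The first character is a quote '"', so the variable "quote" is set to its inverse, so to "TRUE".
Good! So `quote` is set to the inverse of whatever it was before. At the start of the program it is False, so when a `"` is seen, it becomes True. But the reverse also holds: if it is True and a quote is seen, it becomes False.
In other words, this line of the program flips `quote` from whatever it was before that line. This is called "toggling".
2. Then we check the comma, and if quote is set to its inverse, so if quote is TRUE, which is the case when we are inside the quotes.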
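This isn't quite right. `not quote` means "only if quote is False". It has nothing to do with being "set to its inverse". No variable can be equal to its own inverse! That would be like saying `X=True and X=False`, which is obviously nonsense.
`quote` is always either `True` or `False`, and nothing else!
3. We cut it with current = "", and this is what I don't understand: we are still between the quotes, so we should not cut it now!
So hopefully you can now see that if you reach this line, you are not between quotes. `not quote` ensures that you don't cut inside a quoted section, because `not quote` really just means: not inside quotes!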
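As a small check of this reasoning, the sketch below records the value of `quote` at every comma in the sample line from the question. The helper name `quote_state_at_commas` is made up for illustration.

```python
def quote_state_at_commas(string):
    """Return the value of the quote flag at each comma in the string."""
    quote = False
    states = []
    for char in string:
        if char == '"':
            quote = not quote  # toggle: False -> True, True -> False
        elif char == ',':
            states.append(quote)
    return states


line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'
print(quote_state_at_commas(line))
```

Only the first comma, the one inside "La Jolla Bank, FSB", should see `quote` as True, so it is the only comma for which `not quote` is False and the cut is skipped.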