
Question

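I want to split the following string:

Quantity [*,'EXTRA 05',*]

The desired result is:

["Quantity", "[*,'EXTRA 05',*]"]

The closest I have found is shlex.split, but it removes the inner quotes and gives the following result:

['Quantity', '[*,EXTRA 05,*]']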
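For reference, the behaviour described above can be reproduced in a few lines. This is only a minimal sketch; the variable names are illustrative and not part of the original post.

```python
import shlex

text = "Quantity [*,'EXTRA 05',*]"

# shlex.split follows POSIX shell rules by default: the space inside the
# single quotes is protected, but the quote characters themselves are dropped.
parts = shlex.split(text)
print(parts)
```

The space inside 'EXTRA 05' survives, which is the wanted grouping, but the quotes around it are gone.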

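Any suggestions would be greatly appreciated.

Edit:

It also needs to handle multiple groups, for example:

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

should become:

["Quantity", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]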

Answer 1

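For processing strings, the basic tool is regular expressions (module re).

Given the information you provided (which may not be sufficient to cover every case), the following code does the job: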

import re

# Raw strings keep the backslashes intact for the regex engine.
r = re.compile(r'(?! )[^[]+?(?= *\[)'
               '|'
               r'\[.+?\]')


s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')

s2 = "'zug hug'Quantity boondoggle 'fish face monkey "\
     "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s2))

Result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]  
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

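The regex pattern should be read as follows:

'|' means OR

So the pattern is made of two partial REs:
(?! )[^[]+?(?= *\[)
and
\[.+?\]

First partial RE:

The core is [^[]+
Square brackets define a set of characters. The symbol ^ right after the opening bracket [ means the set is defined as every character that is NOT listed after the ^.
So [^[] means any character that is not an opening bracket [, and because a + follows this set, [^[]+ means a sequence of one or more characters, none of which is an opening bracket.

Next, there is a question mark after [^[]+: it makes the + non-greedy, so the captured sequence stops as soon as what is written after the question mark can match.
Here, what follows the ? is (?= *\[), a lookahead assertion. (?=...) marks it as a positive lookahead, and  *\[ is the sequence that must come right after the captured sequence without being part of it.  *\[ means: zero, one or more spaces followed by an opening bracket (the backslash \ is needed to remove the special meaning of [ as the start of a character set).

In front of the core there is also (?! ), a negative lookahead: it is needed so that this partial RE only captures sequences that do not begin with a space, which avoids capturing the separating spaces on their own. Remove this (?! ) and you will see the effect.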
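The effect of the negative lookahead can be checked directly. The sketch below compares the pattern from this answer with a copy that has (?! ) removed; the names with_guard and without_guard are illustrative only.

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Pattern from the answer, written as one raw string.
with_guard = re.compile(r"(?! )[^[]+?(?= *\[)|\[.+?\]")
# Same pattern with the negative lookahead (?! ) removed.
without_guard = re.compile(r"[^[]+?(?= *\[)|\[.+?\]")

kept = with_guard.findall(s)
leaky = without_guard.findall(s)

print(kept)   # ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
print(leaky)  # ['Quantity', ' ', "[*,'EXTRA 05',*]", ' ', "[*,'EXTRA 09',*]"]
```

Without the guard, each separating space satisfies the first partial RE by itself, so it shows up as an extra item in the result.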

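Second partial RE:

\[.+?\] means: an opening bracket [, then a sequence of characters captured by .+? (the dot matches any character except \n). Because of the ?, this sequence stops as early as possible, at the first closing bracket ], which is the last character captured.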
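To see why the question mark matters in the second partial RE, here is a small comparison of the non-greedy form with a greedy one. It is only a sketch; the variable names are made up for the illustration.

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Non-greedy: each match ends at the first closing bracket it meets.
lazy = re.findall(r"\[.+?\]", s)
# Greedy: the match runs on to the last closing bracket in the string.
greedy = re.findall(r"\[.+\]", s)

print(lazy)    # ["[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
print(greedy)  # ["[*,'EXTRA 05',*] [*,'EXTRA 09',*]"]
```

The greedy version merges both bracket groups into a single item, which is not what the question asks for.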

Edit

import re
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
# Split on a space only when it is immediately followed by an opening bracket.
print(re.split(r' (?=\[)', string))

Result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

!!

Answer 2

A word of warning for the picky: this algorithm will not correctly split every string you pass to it, only strings shaped like these:

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

"Quantity [*,'EXTRA 05',*]"

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 10',*] [*,'EXTRA 07',*] [*,'EXTRA 09',*]"

string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
words = string.split(" ")
splitted_string = []

# This puts "Quantity" at position 0 of splitted_string
splitted_string.append(words[0])

# The loop goes from 1 to the length of words, increasing x by 2:
# in the first iteration x is 1 and x+1 is 2, in the second x=3 and x+1=4, etc.
# The first iteration joins "[*,'EXTRA" and "05',*]" into one string
# The second iteration joins "[*,'EXTRA" and "09',*]" into one string
# A longer string works too, as long as every bracket group contains exactly one space
for x in range(1, len(words), 2):
    splitted_string.append("%s %s" % (words[x], words[x+1]))

When I run the code, splitted_string ends up containing:

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
splitted_string[0] = 'Quantity'
splitted_string[1] = "[*,'EXTRA 05',*]"
splitted_string[2] = "[*,'EXTRA 09',*]"

I think this is exactly what you are looking for. If I am wrong, or if you need some explanation of the code, let me know. I hope it helps.
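The same pairing idea can be wrapped in a function and run on the three string shapes listed at the top of this answer. This is a sketch under the same assumption (each bracket group contains exactly one space); split_in_pairs is a hypothetical name, not something from the original answer.

```python
def split_in_pairs(text):
    """Hypothetical helper: first word alone, then the remaining words two by two."""
    words = text.split(" ")
    result = [words[0]]
    # Pair words 1+2, 3+4, ...; zip silently drops a trailing unpaired word.
    for left, right in zip(words[1::2], words[2::2]):
        result.append("%s %s" % (left, right))
    return result


samples = [
    "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]",
    "Quantity [*,'EXTRA 05',*]",
    "Quantity [*,'EXTRA 05',*] [*,'EXTRA 10',*] [*,'EXTRA 07',*] [*,'EXTRA 09',*]",
]
for sample in samples:
    print(split_in_pairs(sample))
```

Any input where a group holds zero or several spaces will be paired incorrectly, so this only suits strings of the shapes shown.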

Answer 3

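Assuming you want a general solution that splits on spaces except those inside quotes: I don't know of any Python library that does this, but that doesn't mean there isn't one.

In the absence of a known pre-rolled solution, I would simply roll my own. It is relatively easy to scan a string looking for spaces and then use Python's slicing to cut the string into the parts you want. To ignore spaces inside quotes, you can simply include a flag that flips whenever a quote character is encountered, switching the space detection off and on.

Here is some code I put together to do this; it has not been extensively tested: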

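def spaceSplit(string):
    last = 0          # index where the current piece starts
    splits = []
    inQuote = None    # the quote character we are inside of, or None
    for i, letter in enumerate(string):
        if inQuote:
            # A matching quote character closes the quoted section.
            if letter == inQuote:
                inQuote = None
        else:
            # Either kind of quote opens a quoted section.
            if letter == '"' or letter == "'":
                inQuote = letter

        # Split only on spaces that are outside quotes.
        if not inQuote and letter == ' ':
            splits.append(string[last:i])
            last = i + 1

    # Keep whatever follows the last split point.
    if last < len(string):
        splits.append(string[last:])

    return splits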
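For comparison, the same rule (split on spaces, but not inside quotes, and keep the quotes) can also be sketched with a regular expression instead of a hand-written scanner. This is an alternative technique, not the method of this answer; TOKEN and split_outside_quotes are hypothetical names.

```python
import re

# One token is a run of: non-space, non-quote characters, or a complete
# single-quoted or double-quoted section (which may contain spaces).
TOKEN = re.compile(r"""(?:[^\s'"]|'[^']*'|"[^"]*")+""")


def split_outside_quotes(text):
    """Hypothetical helper: split on whitespace that is not inside quotes."""
    return TOKEN.findall(text)


print(split_outside_quotes("Quantity [*,'EXTRA 05',*]"))
print(split_outside_quotes("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"))
```

Unlike the scanner above, this version treats any whitespace as a separator and never returns empty strings for consecutive spaces.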

Answer 4

Try this:

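def parseString(inputString):
    output = inputString.split()
    res = []
    count = 0   # quote marks seen so far; an odd value means "inside quotes"
    temp = []   # words of the quoted phrase currently being collected
    for word in output:
        if (count % 2 == 0 and len(word) > 1
                and word.startswith('"') and word.endswith('"')):
            # A single word quoted on both sides: keep it as it is.
            res.append(word)
        elif (word.startswith('"')) and count % 2 == 0:
            # Opening word of a quoted phrase.
            temp.append(word)
            count += 1
        elif count % 2 == 1 and not word.endswith('"'):
            # Word in the middle of a quoted phrase.
            temp.append(word)
        elif word.endswith('"'):
            # Closing word: join the collected words back together.
            temp.append(word)
            count += 1
            tempWord = ' '.join(temp)
            res.append(tempWord)
            temp = []
        else:
            res.append(word)


    print(res)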

Input:

parseString('This is "a test" to your split "string with quotes"')

Output: ['This', 'is', '"a test"', 'to', 'your', 'split', '"string with quotes"']

Python: split a string on spaces, except inside quotes, but keep the quotes
