Question

Suppose the string in my text file is as follows

Line 1:

asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something

I want to split the line and form a list as follows

[asfdsafadfa, "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'.", is something]

I tried using shlex.split, but it raised an exception. The code is below:

import shlex
fil = open("./demoshlex.txt", 'r')
line = fil.readline()
print(line)
print(shlex.split(line))
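
For reference, here is a minimal sketch of why shlex.split fails on this input. The sample line contains five double quotes, so shell-style parsing opens a quoted token at the last one and never finds its partner. The line is inlined here instead of being read from demoshlex.txt, and the names tokens and error are only for illustration.

```python
import shlex

line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''

try:
    tokens = shlex.split(line)
    error = None
except ValueError as exc:
    # the fifth " opens a quoted token that is still open at end of line
    tokens = None
    error = exc

print(line.count('"'))  # number of double quotes in the line
print(tokens, repr(error))
```

Because the quoting in the line is unbalanced, a shell-style tokenizer is probably not the right tool here; plain string splitting or a regular expression is easier to control.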

Answer 1

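It looks to me like you only want to split on the first occurrence of " and keep all the remaining " characters in the second element of the output list.

Here is an example that uses only built-ins, with no imports needed: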

result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on "
        my_list = line.split('"')
        # only append first element of the list to the result
        result.append(my_list[0].strip())
        # rebuild the second part, adding back in the "
        remainder = '"' + '"'.join(my_list[1:])
        # append the second part to the result
        result.append(remainder)
print(result)

Output (with the input as originally posted, without the trailing is something):

['asfdsafadfa', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."']

Or, if you print the individual elements of the output list:

for e in result:
    print(e)

Output:

asfdsafadfa
"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."

[Edit based on comments]

As suggested in the comments, you can use .split('"', 1), for example:

with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add in the " for the second element
        result[1] = '"' + result[1]

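[Edit based on the updated question and comments]

OP's comment:

I only want the quoted part, i.e. remove "is something" from the resulting list's element and make it the [2] element

Since the question was updated with a trailing "is something" string on the input, which needs to be omitted from the output, the example now becomes: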

with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add in the " for the second element, remove trailing string
        result[1] = '"{}"'.format(result[1].rsplit('"', 1)[0])

However, the file may contain multiple lines. In that case you need to build up an output list with one entry per line. The example now becomes:

result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        if '"' in line:
            # we can split the line on "
            line = line.strip().split('"', 1)
            if line[1].endswith('"'):
                # no trailing string to remove
                # pre-fix second element with "
                line[1] = '"{}'.format(line[1])
            elif '"' in line[1]:
                # trailing string to be removed with .rsplit()[0]
                # post- and pre-fix " for second element 
                line[1] = '"{}"'.format(line[1].rsplit('"', 1)[0])
        else:
            # no " in line, return line as one element list
            line = [line.strip()]
        result.append(line)

# result is now a list of lists
for line in result:
    for e in line:
        print(e)
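
The per-line logic above can also be packaged as a small helper. The following is only a sketch: the name split_outer_quotes is hypothetical, it swaps in str.partition and str.rpartition in place of split and rsplit, and it assumes you want surrounding whitespace stripped from the first element. When a line contains only one ", this sketch keeps everything after it, which is a choice made here and not something the question specifies.

```python
def split_outer_quotes(line):
    """Return [prefix, quoted part], dropping any text after the last quote."""
    line = line.strip()
    # split once on the first " in the line
    head, quote, rest = line.partition('"')
    if not quote:
        # no " at all: return the line as a one-element list
        return [line]
    # split once on the last " of what remains
    body, closing, _trailing = rest.rpartition('"')
    if not closing:
        # only one " in the line: keep everything after it
        return [head.strip(), '"' + rest]
    return [head.strip(), '"' + body + '"']


sample = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''
for element in split_outer_quotes(sample):
    print(element)
```

To process a whole file, you could call the helper on each line, for example [split_outer_quotes(line) for line in openfile].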

Answer 2

In my opinion, the best approach is to use re

import re

s = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''

pat = re.compile(
    r'''
    ^      # beginning of a line
    (.*?)  # first part. the *? means non-greedy
    (".*") # part between the outermost ", ("-included)
    (.*?)  # last part
    $      # end of a line
    ''', re.DOTALL|re.VERBOSE)

pat.match(s).groups()

('asfdsafadfa ',
 '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."',
 ' is something')

Putting it all together, this becomes:

import re
from io import StringIO

test_str = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
'''
def split_lines(filehandle):
    pat = re.compile(r'''^(.*?)(".*")(.*?)$''', re.DOTALL)
    for line in filehandle:
        match = pat.match(line)
        if match:
            yield match.groups()
        else:
            yield line

with StringIO(test_str) as openfile:
    for line in split_lines(openfile):
        print(line)

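The generator iterates over the open file handle line by line and tries to split each line. If the pattern matches, it yields a tuple containing the different parts; otherwise it yields the original string.

In a real program, you can replace StringIO(test_str) with open(filename, 'r')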

('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', ' is something')
('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', '')
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
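
The middle group of the pattern is deliberately greedy, which is what makes it run from the first " to the last one. The following sketch contrasts it with a non-greedy middle group on a made-up string; the string and the names greedy and lazy are illustrative only and do not come from the question.

```python
import re

s = 'head "one" mid "two" tail'

# greedy middle group: spans from the first " to the last "
greedy = re.match(r'^(.*?)(".*")(.*?)$', s).groups()

# non-greedy middle group: stops at the first closing "
lazy = re.match(r'^(.*?)(".*?")(.*?)$', s).groups()

print(greedy)  # ('head ', '"one" mid "two"', ' tail')
print(lazy)    # ('head ', '"one"', ' mid "two" tail')
```

With the non-greedy variant, the nested quotes in the question's input would end up in the last group, so the greedy form is the one that matches the requested output.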

Answer 3

Your original string seems to be badly quoted. You can escape the quotes by putting a backslash before them, like this:

my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."

Then you can split it like this:

my_var.split('"')
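
As a quick check of what this call returns, here is a short sketch using the same my_var. Note that split drops the separators, so the " characters themselves do not appear in the resulting list; the name parts is only for illustration.

```python
my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."

# three " characters in the string give four pieces
parts = my_var.split('"')
print(len(parts))
for part in parts:
    print(repr(part))

# joining the pieces with " restores the original string
print('"'.join(parts) == my_var)
```

This gives four pieces with the quotes removed, which differs from the list asked for in the question, so it may need to be combined with one of the approaches from the other answers.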

Splitting on double quotes in Python

3 answers: