How to split a string like the shell does in Python?

Date: 2017-07-06 10:03:13

Tags: python argparse

I have command-line arguments in a string, and I need to split them so I can pass them to argparse.ArgumentParser.parse_args.

I see that the documentation makes heavy use of string.split(). However, in more complex cases this does not work, for example:

--foo "spaces in brakets"  --bar escaped\ spaces
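A quick illustration of what str.split() does with that command line. The string is written as a Python literal, so the backslash is doubled:

```python
# The example command line, as a Python string literal
# (the doubled backslash yields a single backslash in the string).
cmdline = '--foo "spaces in brakets"  --bar escaped\\ spaces'

# str.split() breaks only on whitespace: the quotes and the backslash are
# kept, and the quoted phrase is split into separate words.
print(cmdline.split())
# ['--foo', '"spaces', 'in', 'brakets"', '--bar', 'escaped\\', 'spaces']
```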

Is there a function in Python that does this?

A similar question for Java was asked here.

3 answers:

Answer 0 (score: 10)
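A minimal sketch of the standard-library approach, shlex.split, which tokenizes a string using POSIX shell quoting rules by default. Here it is applied to the example from the question and the result is handed to argparse; the parser definition is illustrative, built only from the --foo and --bar option names in the question:

```python
import argparse
import shlex

# The command line from the question: a quoted argument and an escaped space.
cmdline = '--foo "spaces in brakets"  --bar escaped\\ spaces'

# shlex.split uses POSIX shell rules (posix=True is the default).
argv = shlex.split(cmdline)
print(argv)  # ['--foo', 'spaces in brakets', '--bar', 'escaped spaces']

# Illustrative parser: the option names come from the question.
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--bar')
args = parser.parse_args(argv)
print(args.foo, '|', args.bar)  # spaces in brakets | escaped spaces
```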

Answer 1 (score: 1)

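If you are parsing a Windows-style command line, shlex.split does not work correctly: calling subprocess functions on its result does not behave the same as passing the string directly to the shell.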

In that case, the simplest way to split a string the way command-line arguments are split is... to pass it to Python as command-line arguments:

import os  # needed for the os.name check below
import sys
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python

def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.

    On Windows, this is the inverse of subprocess.list2cmdline.
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)
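The docstring says that on Windows shell_split is the inverse of subprocess.list2cmdline. A minimal sketch of the forward direction: list2cmdline is the helper subprocess uses to join an argument list into one Windows command line. It is not formally documented, but it is a plain module-level function, so this runs on any platform:

```python
import subprocess

# Join an argv list into a single Windows-style command line string.
args = [r'C:\Users\me\some_file.txt', 'file with spaces']
cmd = subprocess.list2cmdline(args)

# Backslashes not followed by a quote are left alone, and an argument
# containing a space is wrapped in double quotes.
print(cmd)  # C:\Users\me\some_file.txt "file with spaces"

# Embedded double quotes are escaped with a backslash.
print(subprocess.list2cmdline(['say "hi"']))  # "say \"hi\""
```

The first result is exactly the input string used in the examples that follow, which is why shell_split on Windows recovers the original list.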

An example of these differences:

# Windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:\\Users\\me\\some_file.txt', 'file with spaces']

# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']

# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']  

Answer 2 (score: 0)

You can use the split_arg_string helper function from the click package:

import re

def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
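    # Each match is a single-quoted string or a double-quoted string (either
    # may contain backslash escapes), or a run of non-whitespace characters.
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        # Drop the surrounding quotes and interpret backslash escapes inside.
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        # Convert back to the input's string type (matters on Python 2).
        try:
            arg = type(string)(arg)
        except UnicodeError:
            pass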
        rv.append(arg)
    return rv

For example:

>>> print(split_arg_string('"this is a test" 1 2 "1 \\" 2"'))
['this is a test', '1', '2', '1 " 2']

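The click package is becoming the dominant choice for command-line argument parsing, but I don't think it supports parsing arguments from a string (only from argv). The helper function above is used only for bash completion.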

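Edit: I can only recommend using shlex.split(), as suggested in @ShadowRanger's answer. The only reason I haven't deleted this answer is that it splits somewhat faster than the full pure-Python tokenizer used by shlex (about 3.5 times faster for the example above: 5.9us vs 20.5us). However, that should not be a reason to prefer it over shlex.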
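A minimal sketch of how a timing comparison like the one above could be reproduced with timeit. It copies split_arg_string from above but drops the `type(string)(arg)` step, which is a no-op on Python 3. The call count is arbitrary, and the absolute numbers will differ by machine and Python version:

```python
import re
import shlex
import timeit


def split_arg_string(string):
    """Regex-based splitter copied from the click helper above (Python 3 only)."""
    rv = []
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        rv.append(arg)
    return rv


SAMPLE = '"this is a test" 1 2 "1 \\" 2"'

# Both splitters agree on this input.
assert split_arg_string(SAMPLE) == shlex.split(SAMPLE)

# Time the same number of calls for each splitter.
n = 20_000
regex_s = timeit.timeit(lambda: split_arg_string(SAMPLE), number=n)
shlex_s = timeit.timeit(lambda: shlex.split(SAMPLE), number=n)
print(f"split_arg_string: {regex_s / n * 1e6:.1f} us per call")
print(f"shlex.split:      {shlex_s / n * 1e6:.1f} us per call")
```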