How do I write a small tokenizer in Python?

Date: 2016-07-28 09:08:08

Tags: python parsing interpreter

Normally, a function is called in Python like this:

func(arg0, arg1)

But I want to call it like this instead:

func arg0 arg1

For example:

#Something...
cmd = input()
interpret(cmd)
#Something...

If I type 'func arg0 arg1', I want Python to execute func(arg0, arg1).

The args will include strings, so I can't simply split the input into words.
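To illustrate the problem, here is what a plain str.split() does with a quoted argument (the command string is just an example):

```python
cmd = 'func "hello world" 42'

# str.split() knows nothing about quotes: the quoted argument is broken
# into two pieces and the quote characters stay attached to them
words = cmd.split()
print(words)  # ['func', '"hello', 'world"', '42']
```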

I actually want to write some scripts to use on my phone, where typing parentheses is somewhat annoying.

4 answers:

Answer 0 (score: 0)

If none of the args contain spaces, you can do:

fn_args = cmd.split()
# e.g. 'func 1 2' becomes the source string 'func(1, 2)'
python_code = "%s(%s)" % (fn_args[0], ", ".join(fn_args[1:]))
eval(python_code)
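Building source code from the input and passing it to eval() runs whatever the user types, and bare words such as hello are treated as variable names. A sketch of an alternative, assuming the callable commands can be listed up front: look the function up in a dict and convert each word with ast.literal_eval, keeping the raw text when a word is not a Python literal. COMMANDS, add, to_value and interpret are illustrative names, not part of the answer, and like the snippet above this still assumes no argument contains spaces.

```python
import ast


def add(a, b):
    return a + b


# Hypothetical whitelist: only these names can be called from the input
COMMANDS = {'add': add}


def to_value(word):
    """Convert a literal like 42, 1.5 or 'x' to a Python value; else keep the raw string."""
    try:
        return ast.literal_eval(word)
    except (ValueError, SyntaxError):
        return word


def interpret(cmd):
    name, *words = cmd.split()
    func = COMMANDS[name]  # KeyError for commands not in the whitelist
    return func(*(to_value(w) for w in words))


print(interpret('add 1 2.5'))  # 3.5
```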

Edit:

If it's not that simple, you should look at the cmd module (https://docs.python.org/3/library/cmd.html), but it takes some setup before it can execute arbitrary code.
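A minimal cmd.Cmd sketch, assuming the commands are known in advance: each do_<name> method becomes a command, and cmd.Cmd passes the rest of the line to it as a single string. The Shell class and the greet command are illustrative names.

```python
import cmd
import shlex


class Shell(cmd.Cmd):
    prompt = '> '

    def do_greet(self, arg):
        """greet NAME [NAME ...]: print a greeting for each name."""
        # cmd.Cmd hands over everything after the command name as one
        # string; shlex.split() splits it while respecting quotes
        for name in shlex.split(arg):
            print('Hello,', name)

    def do_quit(self, arg):
        """quit: leave the shell."""
        # Returning a true value tells cmdloop() to stop
        return True
```

Calling Shell().cmdloop() starts an interactive prompt where greet alice "bob smith" prints one greeting per name; Shell().onecmd(line) processes a single line without entering the loop.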

EDIT2:

If you don't need your args to be exact Python, you can parse them as JSON: https://docs.python.org/3/library/argparse.html

For example:

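import json

def fname(*args):  # stand-in for the function you want to call
    print(args)

cmd = 'fname "a" "b" 1'
fn, sep, args = cmd.strip().partition(" ")
largs = []
d = json.JSONDecoder()
args = args.strip()
while args:
    # raw_decode() parses one JSON value from the start of the string
    # and returns (value, index just past it)
    arg, end = d.raw_decode(args)
    largs.append(arg)
    args = args[end:].strip()
eval(fn)(*largs)  # eval, not exec: exec() always returns None; prints ('a', 'b', 1)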

Answer 1 (score: 0)

You can do it like this:

class tryClass:
    def callFunction(self, arg, arg2):
        print("In call")
        print(arg)
        print(arg2)

# Read the line once; don't rebind the built-in name input
words = input().split()
funcName = words[0]
my_cls = tryClass()

# Look the method up by name and pass the remaining words as arguments
method = getattr(my_cls, funcName)
method(*words[1:])

If I type callFunction hello world, it works :)
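Splitting on spaces means an argument like "hello world" arrives as two words. A sketch that combines the same getattr lookup with shlex.split() so quoted arguments stay together, and that refuses names starting with an underscore or attributes that are not callable; Commands and dispatch are illustrative names, not part of the answer.

```python
import shlex


class Commands:
    def callFunction(self, arg, arg2):
        return (arg, arg2)


def dispatch(obj, line):
    # shlex.split keeps quoted arguments together and removes the quotes
    name, *args = shlex.split(line)
    method = getattr(obj, name, None)
    # Refuse private/special names such as __init__ and non-callables
    if name.startswith('_') or not callable(method):
        raise ValueError('unknown command: %r' % name)
    return method(*args)


print(dispatch(Commands(), 'callFunction "hello world" again'))
# ('hello world', 'again')
```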

Answer 2 (score: 0)

All I wanted was a simple tokenizer; I then run the function by calling eval(). This is what I did for my project.

The results look like this:

>>> tokenizer('func 123 abc')
[('func', 'func'), ('arg', '123'), ('arg', 'abc')]
>>> tokenizer('func 123.5 abc')
[('func', 'func'), ('arg', '123.5'), ('arg', 'abc')]
>>> tokenizer('func 123.5 abc "Hello, World!"')
[('func', 'func'), ('arg', '123.5'), ('arg', 'abc'), ('arg', 'Hello, World!')]
>>> tokenizer("func 123.5 abc 'Hello, World!'")
[('func', 'func'), ('arg', '123.5'), ('arg', 'abc'), ('arg', 'Hello, World!')]

Note: this may not suit everyone; it is not a complete parser or tokenizer.

Code:

def isNumber(cmd):
    # True if the text parses as an int or a float
    try:
        int(cmd)
        return True
    except ValueError:
        try:
            float(cmd)
            return True
        except ValueError:
            return False

def isWord(cmd):
    # A word starts with a letter; later characters may be letters, '_' or '-'
    if not cmd:
        return False
    return cmd[0].isalpha() and all(c.isalpha() or c in '_-' for c in cmd[1:])

def spaceParser(cmd):
    # Skip leading spaces; returns '' when the string is all spaces
    # (the original loop left the last space in place in that case)
    return cmd.lstrip(' ')

def funcNameParser(cmd):
    # The function name is the first space-delimited word;
    # the remainder is returned for the argument parser
    cmd = spaceParser(cmd)
    word, _, rest = cmd.partition(' ')
    return (word, rest)

def argumentParser(cmd):
    cmd = spaceParser(cmd)
    if cmd[0] in ('\'', '"'):
        # Quoted string: everything up to the matching closing quote
        quote = cmd[0]
        end = cmd.find(quote, 1)
        if end == -1:
            raise ValueError('Fatal exception: String not finished.')
        return (cmd[1:end], cmd[end+1:])
    # Bare word or number: everything up to the next space
    word, _, rest = cmd.partition(' ')
    # Raise instead of assert: asserts are skipped under python -O
    if not (isWord(word) or isNumber(word)):
        raise ValueError('Fatal exception: Not a valid name.')
    return (word, rest)

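def tokenizer(cmd):
    token = []
    name, rest = funcNameParser(cmd)
    token.append(('func', name))
    # Keep parsing arguments until only whitespace is left, so a
    # trailing space does not produce an empty, invalid argument
    while rest.strip():
        arg, rest = argumentParser(rest)
        token.append(('arg', arg))
    return token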
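A more compact way to get the same (kind, value) pairs is a regular expression with one alternative per token form. This is a sketch using re.finditer-style matching with Pattern.match(string, pos), not the answer's code: TOKEN_RE and regex_tokenizer are illustrative names, it does not handle escaped quotes, and it does not validate bare words the way isWord/isNumber do.

```python
import re

# Hypothetical pattern: optional leading whitespace, then a '...' string,
# a "..." string, or a bare word that does not start with a quote
TOKEN_RE = re.compile(r"""\s*(?:'([^']*)'|"([^"]*)"|([^\s'"]\S*))""")


def regex_tokenizer(cmd):
    tokens = []
    cmd = cmd.rstrip()
    pos = 0
    while pos < len(cmd):
        m = TOKEN_RE.match(cmd, pos)
        if m is None:
            # e.g. an opening quote with no closing quote
            raise ValueError('cannot tokenize at position %d' % pos)
        # Exactly one of the three groups matched; take its text
        value = next(g for g in m.groups() if g is not None)
        # The first token is the function name, the rest are arguments
        tokens.append(('arg' if tokens else 'func', value))
        pos = m.end()
    return tokens


print(regex_tokenizer('func 123.5 abc "Hello, World!"'))
# [('func', 'func'), ('arg', '123.5'), ('arg', 'abc'), ('arg', 'Hello, World!')]
```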

Answer 3 (score: 0)

The built-in shlex module may be what you want:

>>> import shlex
>>> cmd = "func arg0 arg1 'arg2 has spaces'"
>>> list(shlex.shlex(cmd))
['func', 'arg0', 'arg1', "'arg2 has spaces'"]
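shlex.shlex in its default non-POSIX mode keeps the quote characters in the token, as the output above shows. If the quotes should be removed, shlex.split(), which uses POSIX mode, is a shorter alternative:

```python
import shlex

cmd = "func arg0 arg1 'arg2 has spaces'"
# shlex.split() uses POSIX rules: quotes group words and are stripped
tokens = shlex.split(cmd)
print(tokens)  # ['func', 'arg0', 'arg1', 'arg2 has spaces']
```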

If you can trust the input, actually calling the function looks like this:

>>> tokens = list(shlex.shlex(cmd))
>>> # here is a stupid func function that reverses its input args
>>> func = lambda *args: print(*reversed(args))
>>> eval(tokens[0])(*tokens[1:])
'arg2 has spaces' arg1 arg0