Question

I'm new to Python and I've run into a problem. I have an input file that contains data like the following:

12345    67890     afghe
abcde    23456     0abcd
34567    __fred01  45678
123.456  12345a    123.
.456     ab00cd    00ab00

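Using regular expressions, I need to parse each literal and classify it as a string, an integer or a floating-point number. My code snippet is shown below: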

import re

def processtoken(token):
    #Replace the following line with your code to classify
    # the string in 'token' according to your three Regular
    # Expressions and print the appropriate message.
    print('Inside Process Token')

    match = re.search(r'(0|[1-9][0-9]*|0[oO]?[0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+)[lL]?', token)
    matchfp = re.search(r'^[0-9]+\.?[0-9]+$',token)
    if match:
        print(match.group(),'matches INT')
    elif matchfp:
        print(matchfp.group(),'matches FP')

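My question is how to structure the code so that each token passed in is checked against several regular-expression conditions. As it stands, when a condition isn't satisfied, the token ends up being treated as floating point. I want to check each token in turn: does it match the integer regex, does it match the floating-point regex, or does it match the string regex? Any help would be much appreciated.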

Answer 1

I would approach the problem as follows:

import re

# Placeholders: replace each ... with your own pattern.
integer_regex = r"^...$"
float_regex = r"^...$"
string_regex = r"^...$"

def processToken(token):

    if re.search(integer_regex, token):
        print(token, 'matches INT')
    elif re.search(float_regex, token):
        print(token, 'matches FLOAT')
    elif re.search(string_regex, token):
        print(token, 'matches STR')
    else:
        print(token, 'unknown')

Fill your patterns into the *_regex variables above.
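A minimal sketch of the skeleton with the placeholders filled in. The three patterns are assumptions chosen to fit the sample data, not patterns given in the answer: the integer pattern is the question's own, anchored with ^...$ and with the octal prefix made mandatory; the float pattern lists the three decimal-point cases; and any run of word characters (\w) counts as a string. The hypothetical helper classify returns a label so the logic can be reused apart from the printing.

```python
import re

# Assumed patterns (illustrative only; adjust to your own definitions).
# Integer: the question's pattern, anchored, with a mandatory octal 'o'/'O'.
integer_regex = r"^(?:0|[1-9][0-9]*|0[oO][0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+)[lL]?$"
# Float: digits.digits, digits. or .digits (the decimal point is required).
float_regex = r"^(?:[0-9]+\.[0-9]+|[0-9]+\.|\.[0-9]+)$"
# String: any run of letters, digits or underscores.
string_regex = r"^\w+$"


def classify(token):
    """Return 'INT', 'FLOAT', 'STR' or 'unknown' for a single token."""
    # Order matters: every integer token also matches string_regex.
    if re.search(integer_regex, token):
        return 'INT'
    elif re.search(float_regex, token):
        return 'FLOAT'
    elif re.search(string_regex, token):
        return 'STR'
    return 'unknown'


def processToken(token):
    print(token, 'matches', classify(token))


data = """\
12345    67890     afghe
abcde    23456     0abcd
34567    __fred01  45678
123.456  12345a    123.
.456     ab00cd    00ab00"""

for token in data.split():
    processToken(token)
```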

Also note that your float pattern is unsuitable, because it also matches integers:

r'^[0-9]+\.?[0-9]+$'

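That happens because the decimal point is optional. You would probably do better to break the pattern into an alternation of three options: one that starts with '.', one that ends with '.', and one that contains '.' between digits. Also, the '?' in the octal part of your integer pattern is incorrect: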

0[oO]?[0-7]+

At that point we have committed to octal, so the prefix is not optional:

0[oO][0-7]+

You have it right for the hex and binary parts.
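A quick sketch that checks the points above, using re.fullmatch (Python 3.4+) so the whole token must match. The sample tokens and the helper name full are illustrative. The last loop also shows why anchoring matters: the question's integer pattern, passed unanchored to re.search, returns the first match found anywhere in the token, so '123.456' and '0abcd' both produce an "integer".

```python
import re

# Float patterns: the question's, and the three-way alternation suggested above.
old_float = r'[0-9]+\.?[0-9]+'
new_float = r'(?:[0-9]+\.[0-9]+|[0-9]+\.|\.[0-9]+)'

# Octal patterns: optional vs. mandatory 'o'/'O' prefix.
old_octal = r'0[oO]?[0-7]+'
new_octal = r'0[oO][0-7]+'

# The question's integer pattern, used without ^...$ anchors.
question_int = r'(0|[1-9][0-9]*|0[oO]?[0-7]+|0[xX][0-9a-fA-F]+|0[bB][01]+)[lL]?'


def full(pattern, token):
    """True if the pattern matches the entire token."""
    return re.fullmatch(pattern, token) is not None


for token in ['12345', '123.456', '123.', '.456']:
    print(token, 'old float:', full(old_float, token),
          'new float:', full(new_float, token))

for token in ['017', '0o17']:
    print(token, 'old octal:', full(old_octal, token),
          'new octal:', full(new_octal, token))

# Unanchored re.search reports the first match anywhere in the token.
for token in ['123.456', '0abcd']:
    print(token, '->', re.search(question_int, token).group())
```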

Answer 2

Split the text, test for an int with the isdigit() method, then try converting to float and catch ValueError for strings.

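data = """\
12345    67890     afghe
abcde    23456     0abcd
34567    __fred01  45678
123.456  12345a    123.
.456     ab00cd    00ab00"""

# isdigit() -> Int; float() succeeds -> Float; otherwise STR
for m in data.split():
    if m.isdigit():
        print(m, 'Int')
    else:
        try:
            float(m)
            print(m, 'Float')
        except ValueError:
            print(m, 'STR')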

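Output (as printed by Python 2, where print(m, 'Int') prints a tuple; under Python 3 each line reads 12345 Int and so on):

('12345', 'Int')
('67890', 'Int')
('afghe', 'STR')
('abcde', 'STR')
('23456', 'Int')
('0abcd', 'STR')
('34567', 'Int')
('__fred01', 'STR')
('45678', 'Int')
('123.456', 'Float')
('12345a', 'STR')
('123.', 'Float')
('.456', 'Float')
('ab00cd', 'STR')
('00ab00', 'STR')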
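A caveat worth knowing about this approach, shown as a small sketch (the function names and the extra tokens are illustrative, not from the answer): float() also accepts spellings such as nan, inf and 1e3, so those tokens come out as Float, and str.isdigit() is true for some Unicode characters, such as the superscript '²', that int() rejects. If the input may contain such tokens, str.isdecimal() is a stricter test for the Int branch.

```python
def classify_isdigit(token):
    """Answer 2's logic, returning the label instead of printing it."""
    if token.isdigit():
        return 'Int'
    try:
        float(token)
        return 'Float'
    except ValueError:
        return 'STR'


def classify_isdecimal(token):
    """Same idea, but isdecimal() rejects characters such as '\u00b2'."""
    if token.isdecimal():
        return 'Int'
    try:
        float(token)
        return 'Float'
    except ValueError:
        return 'STR'


for token in ['nan', 'inf', '1e3', '\u00b2']:
    print(repr(token), classify_isdigit(token), classify_isdecimal(token))
```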


Answer 3

>>> test = """\
... 12345    67890     afghe
... abcde    23456     0abcd
... 34567    __fred01  45678
... 123.456  12345a    123.
... .456     ab00cd    00ab00"""
>>> def what_is_it(s):
...     print("'{}'".format(s), end=' ')
...     try:
...         as_float = float(s)
...     except ValueError:
...         return 'matches STRING'
...     else:
...         if as_float.is_integer():
...             return 'matches INT'
...         return 'matches FP'
... 
>>> for line in test.splitlines():
...     for token in line.split():
...         print(what_is_it(token))
...     print()
... 
'12345' matches INT
'67890' matches INT
'afghe' matches STRING

'abcde' matches STRING
'23456' matches INT
'0abcd' matches STRING

'34567' matches INT
'__fred01' matches STRING
'45678' matches INT

'123.456' matches FP
'12345a' matches STRING
'123.' matches INT

'.456' matches FP
'ab00cd' matches STRING
'00ab00' matches STRING
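Note that this session prints '123.' matches INT: float('123.') is 123.0, and is_integer() only looks at the value, not at how the token was written. If a trailing decimal point should count as floating point, one variation (a sketch, not part of the answer) is to try int() first and fall back to float():

```python
def what_is_it(s):
    """Classify a token by trying int() first, then float()."""
    try:
        int(s)
        return 'matches INT'
    except ValueError:
        pass
    try:
        float(s)
        return 'matches FP'
    except ValueError:
        return 'matches STRING'


for token in ['12345', '123.', '.456', '123.456', '0abcd']:
    print("'{}'".format(token), what_is_it(token))
```

Because int('123.') raises ValueError, that token now falls through to the float check.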

Identifying integer, string and float literals
