Question

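I'm trying to use Python's regular expressions to pull out the numeric values (100.00 & 200.00), but when I run the code it doesn't produce anything. I'm using Python version 2.7.

1) My file is named "file100", and I need to pick the values out of it: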

# cat file100
Hi this doller 100.00
Hi this is doller 200.00

2) This is my Python code:

# cat count100.py
#!/usr/bin/python
import re
file = open('file100', 'r')
for digit in file.readlines():
        myre=re.match('\s\d*\.\d{2}', digit)
        if myre:
           print myre.group(1)

3) When I run this code it produces nothing: no output, no errors.

# python   count100.py

Answer 1

Use re.search instead:

import re
file = open('file.txt', 'r')
for digit in file.readlines():
    myre = re.search(r'\s\b(\d*\.\d{2})\b', digit)
    if myre:
        print myre.group(1)

Result:

100.00
200.00

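From the documentation:

Scan through string looking for the first location where the regular expression pattern produces a match

If you decide to use groups, you also need parentheses:

(...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].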
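
As a small illustration of the point about parentheses (the sample line comes from the question; the variable names are only illustrative), group(0) is always the whole match, while group(1) holds just what the parentheses captured:

```python
import re

line = "Hi this doller 100.00"

# The parentheses create capture group 1 around the number only.
with_group = re.search(r'\s(\d*\.\d{2})', line)
whole = with_group.group(0)    # entire match, including the leading space
number = with_group.group(1)   # only the text inside the parentheses

print(repr(whole))
print(repr(number))
```

Here the whole match still contains the space consumed by \s, which is why capturing the number in its own group is convenient.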

re.match only applies when:

zero or more characters at the beginning of string match the regular expression pattern
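
A minimal sketch of the difference, assuming the second sample line from the question (the names at_start and anywhere are made up for this example):

```python
import re

line = "Hi this is doller 200.00"
pattern = r'\d+\.\d{2}'

at_start = re.match(pattern, line)    # only tries at index 0, where "H" is
anywhere = re.search(pattern, line)   # scans forward until something matches

print(at_start)
print(anywhere.group())
```

Because the line starts with "Hi", re.match has nothing to match at position 0 and returns None, while re.search finds the number further along.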

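The r prefix makes the regex a raw string:

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.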
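
This matters for the \b used in the pattern above. A short sketch, assuming the question's sample line (plain_pattern is built by concatenation only to isolate the effect of the non-raw '\b'):

```python
import re

plain = '\b'    # non-raw: one character, a backspace (0x08)
raw = r'\b'     # raw: two characters, backslash + "b", a word boundary to re

print(len(plain))
print(len(raw))

line = "Hi this doller 100.00"
raw_pattern = r'\b\d+\.\d{2}\b'
plain_pattern = '\b' + r'\d+\.\d{2}' + '\b'   # literal backspaces, not boundaries

found_raw = re.search(raw_pattern, line)
found_plain = re.search(plain_pattern, line)

print(found_raw.group())
print(found_plain)
```

Without the r prefix the pattern asks for literal backspace characters around the number, so it finds nothing in ordinary text.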

Answer 2

If they are always at the end of the line, just rsplit once and pull out the last element:

with open('file100', 'r') as f:
    for line in f:
        print(line.rsplit(None, 1)[1])

Output:

100.00
200.00

rsplit(None, 1) just means we split once on whitespace, starting from the end of the string, and then pull out the second element:

In [1]: s = "Hi this doller 100.00"

In [2]: s.rsplit(None,1)
Out[2]: ['Hi this doller', '100.00']

In [3]: s.rsplit(None,1)[1]
Out[3]: '100.00'

In [4]: s.rsplit(None,1)[0]
Out[4]: 'Hi this doller'
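
One caveat worth sketching: indexing with [1] assumes every line has at least two whitespace-separated fields, so a blank line would raise IndexError. The helper below (last_field is a hypothetical name, not part of the answer) takes the last element with [-1] and skips blank lines instead:

```python
def last_field(line):
    """Return the last whitespace-separated field, or None for a blank line."""
    parts = line.rsplit(None, 1)   # [] when the line is empty or only whitespace
    return parts[-1] if parts else None

lines = ["Hi this doller 100.00\n", "\n", "Hi this is doller 200.00\n"]
values = [field for field in map(last_field, lines) if field is not None]
print(values)
```

Note that this still returns whatever the last field happens to be, number or not; it only guards against empty lines.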

If you really do need a regex, use search:

import re

with open('file100', 'r') as f:
    for line in f:
        m = re.search(r"\b\d+\.\d{2}\b",line)
        if m:
            print(m.group())

Answer 3

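Your main problem is that you are using re.match, which requires the match to start at the beginning of the string, rather than re.search, which allows the match to start anywhere in the string. I'll break down my suggestions, though: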

import re

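There's no need to recompile on every loop (Python actually caches some regular expressions for you, but it's safe to keep a reference to one). I'm using the VERBOSE flag to break the regex down for you. Put r before the string so that, when Python reads it, the backslashes don't escape the characters that follow them: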

regex = re.compile(r'''
  \s      # one whitespace character, though I think this is perhaps unnecessary
  \d*     # 0 or more digits
  \.      # a dot
  \d{2}   # 2 digits
  ''', re.VERBOSE)

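Use a context manager, and open the file in universal newlines 'rU' mode so that you can read it line by line regardless of which platform the file was created on.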

with open('file100', 'rU') as file:

Don't use readlines, which loads the whole file into memory at once. Instead, use the file object as an iterator:

    for line in file:
        myre = regex.search(line) 
        if myre:
            print(myre.group(0)) # group(0) is the whole match; there are
                                 # no capture groups in your regex

My code prints:

100.00
200.00
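
The fragments above, put together as one runnable sketch. To stay self-contained it reads from an in-memory list of lines rather than the file 'file100' (an assumption made only for this example), and find_amounts is a hypothetical helper name:

```python
import re

# Same pattern as above, compiled once outside the loop.
regex = re.compile(r'''
  \s      # one whitespace character
  \d*     # 0 or more digits
  \.      # a dot
  \d{2}   # 2 digits
  ''', re.VERBOSE)

def find_amounts(lines):
    """Yield the matched amount from each line that contains one."""
    for line in lines:
        m = regex.search(line)
        if m:
            # group(0) is the whole match; strip() drops the space matched by \s
            yield m.group(0).strip()

sample = ["Hi this doller 100.00\n", "Hi this is doller 200.00\n", "no amount here\n"]
amounts = list(find_amounts(sample))
print(amounts)
```

Since a file object iterates line by line, an open file could be passed to find_amounts in place of the sample list.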

Answer 4

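There are a couple of problems here:

.match only looks for a match at the beginning of the string; see search() vs. match().
You aren't using a capture group, so there is no reason for myre.group(1) to contain anything.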
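
To make the second point concrete: in the question's script the problem never surfaces, because re.match returns None and the if body is skipped. Once the pattern does match, asking for group 1 of a pattern with no parentheses raises IndexError. A minimal sketch using the question's sample line (the error variable exists only for this demonstration):

```python
import re

line = "Hi this doller 100.00"
m = re.search(r'\s\d*\.\d{2}', line)   # matches, but defines no capture group

try:
    m.group(1)          # there is no group 1 in this pattern
    error = None
except IndexError as exc:
    error = exc         # raised because the group does not exist

print(repr(m.group(0)))
print(type(error).__name__)
```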

Here is an updated example:

import re

file = """
Hi this doller 100.00
Hi this is doller 200.00
"""

for digit in file.splitlines():
    myre = re.search(r'\s\d*\.\d{2}', digit)
    if myre:
        print(myre.group(0))

Python RE to search for digits and decimals
