Question

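I want to load the contents of a .txt file as a string and extract a specific piece of information. The information has a lot of text before and after it, and looks like this: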

ValueName:     1234

But it may also look like:

ValueName:     123456

That is, the value is always a run of digits, but its length varies.

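I want to find 'ValueName' in the string and then return the characters that start 6 characters after it. My idea was to check whether the 10 characters that follow those 6 characters are digits and, if so, return them in order. Is this possible? Thanks.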

Answer 1

You can use a regular expression to extract the value after ValueName:

>>> import re
>>> line = 'some dummy text ValueName:     123456 some dummy text'
>>> m = re.findall(r'ValueName:\s+([0-9]+)',line)
>>> m
['123456']

This finds multiple matches if they exist.

>>> import re
>>> line = 'blah blah ValueName: 1234 blah blah ValueName: 5678'
>>> m = re.findall(r'ValueName:\s+([0-9]+)',line)
>>> m
['1234', '5678']
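Since the question starts from a .txt file, the same pattern can be applied to the whole file read as one string. This is a minimal sketch: the helper name `extract_values` and the UTF-8 encoding are assumptions, not part of the original answer:

```python
import re

# Same pattern as in the answer above, compiled once.
VALUE_PATTERN = re.compile(r'ValueName:\s+([0-9]+)')

def extract_values(path):
    """Return every integer that follows 'ValueName:' in the file at `path`."""
    with open(path, encoding='utf-8') as f:  # assumes the file is UTF-8 text
        text = f.read()
    # findall returns the captured digit strings; convert each to int.
    return [int(digits) for digits in VALUE_PATTERN.findall(text)]
```

The function returns an empty list when the label never appears, so the caller can check for that case without catching an exception.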

Answer 2

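A regular expression makes this simpler, as Brian's answer (and others) show.

But don't use a regex if you aren't willing to understand what it does. If you want to put off the learning curve for now,* this isn't hard to do with simple string processing: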

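def numeric_value_names(path):
    with open(path) as f:
        for line in f:
            bits = line.partition('ValueName:')
            # bits[1] is non-empty only if the separator was found;
            # bits[0] is empty only if the line starts with it
            if bits[1] and not bits[0]:
                # strip() drops the padding however many spaces there are
                rest = bits[2].strip()
                if rest.isdigit():
                    yield rest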

Using str.partition this way may be a bit obtuse for a novice, so you may want to make the condition more obvious:

def numeric_value_names(path):
    with open(path) as f:
        for line in f:
            if line.startswith('ValueName:'):
                bits = line.partition('ValueName:')
                # strip() drops the padding however many spaces there are
                rest = bits[2].strip()
                if rest.isdigit():
                    yield rest

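* You will definitely want to learn simple regular expressions at some point; the only question is whether you have something more pressing to do right now...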
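The functions above only accept lines that begin with ValueName:. If the label can sit in the middle of a longer string, as the question describes, the same no-regex idea can be sketched with str.find. The helper name `value_after` and its `key` parameter are made up for this illustration and are not part of the original answer:

```python
def value_after(text, key='ValueName:'):
    """Return the run of ASCII digits that follows `key`, or None."""
    start = text.find(key)
    if start == -1:
        return None  # the label does not occur at all
    # Skip the label itself, then any whitespace padding after it.
    rest = text[start + len(key):].lstrip()
    digits = []
    for ch in rest:
        if ch not in '0123456789':
            break  # the first non-digit ends the value
        digits.append(ch)
    return ''.join(digits) or None
```

Called on 'some text ValueName:     1234 more text', this returns the string '1234'. It only looks at the first occurrence of the label.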

Answer 3

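import re

regex = re.compile(r'ValueName:\s*([0-9]+)')
result = None  # stays None if no line matches
# `file` is the path to your text file
with open(file, "r") as f:
    for line in f:
        match = regex.search(line)
        if match:
            # keep the first value found and stop reading
            result = int(match.group(1))
            break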

Answer 4

Use a regular expression:

import re
for line in text:
    # \s* allows any amount of padding after the colon
    match = re.search(r'^ValueName:\s*(\d+)', line)
    if match:
        print(match.group(1))

If you need them sorted, put them in a list:

lst.append(re.search(r'^ValueName:\s*(\d+)', line).group(1))

Finally, just sort the list:

sorted(lst)

Here is a complete example that extracts what you need:

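import re

text = ['ValueName: 33413','ValueName: 443234531','ValueName: 5243222','ValueName: 33']
lst = []

for line in text:
    # every line in this sample matches, so .group(1) is safe here
    lst.append(re.search(r'^ValueName:\s*(\d+)', line).group(1))

# convert to int so the sort is numeric rather than alphabetical
lst = [int(x) for x in lst]
for x in sorted(lst):
    print(x)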

Answer 5

You can do this:

with open("file") as f:
    for line in f:
        if "1234" in line:
            print(line, end="")

Source: http://ubuntuforums.org/showthread.php?t=820319

Answer 6

With a regular expression you can do something like this:

import re

# re.MULTILINE makes ^ and $ match at the start and end of every line
regex = re.compile(r"^(.*[0-9]{4,}.*)$", re.MULTILINE)
for line in regex.findall(your_text_here):
    print(line)

The regular expression

 ^(.*[0-9]{4,}.*)$

matches every line that contains a run of at least 4 digits somewhere in it.
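To see the line-by-line behaviour on a multi-line string, the pattern has to be compiled with re.MULTILINE; otherwise ^ and $ only anchor at the start and end of the whole string. A small sketch, where the sample text and the variable names are invented for the illustration:

```python
import re

# Invented sample: two of the four lines contain a run of 4+ digits.
sample = 'no digits here\nValueName:     1234\nid 12 only\ncode 987654 end'

line_pattern = re.compile(r'^(.*[0-9]{4,}.*)$', re.MULTILINE)
matches = line_pattern.findall(sample)

# Same pattern without the flag, for comparison.
without_flag = re.findall(r'^(.*[0-9]{4,}.*)$', sample)

print(matches)       # ['ValueName:     1234', 'code 987654 end']
print(without_flag)  # []
```

Note that this returns whole lines, not just the number, so the value still has to be pulled out of each line afterwards.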

Answer 7

You can do this:

import re

re.findall(r'ValueName:\d\d\d',s)

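where 's' is the name of your string variable and the number of \d's is the number of digits you are looking for. In your case it would be \d\d\d\d\d\... Not exactly pretty, but it works.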
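Because the number of digits varies, a quantifier avoids having to repeat \d by hand. A minimal sketch, assuming optional whitespace after the colon as in the question; the sample string and variable names are invented here:

```python
import re

s = 'some text ValueName:     1234 more text ValueName:123456'

# \s* allows optional padding after the colon; \d+ means one or more digits.
any_length = re.findall(r'ValueName:\s*(\d+)', s)

# {4,6} restricts the match to between four and six digits.
bounded = re.findall(r'ValueName:\s*(\d{4,6})', s)

print(any_length)  # ['1234', '123456']
print(bounded)     # ['1234', '123456']
```

With the bounded form, a longer number would be cut off after its sixth digit, so \d+ is usually the safer choice when the length is unknown.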

Python: extracting values of varying length from a text file

7 answers: