Question

I have a file that looks like this:

RANDOMTEXTSAMPLE*
$SAMPLERANDOMTEXT
RANDOMSAMPLE*TEXT

I'm trying to extract and list all instances of "SAMPLE" that have a * at the end.

I tried something like this:

import re

with open('file1.txt') as myfile:
    content = myfile.read()

text = re.search(r'[0-9A-Z]{7}\*', content)
with open("file2.txt", "w") as myfile2:
    myfile2.write(text)

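But I only get the first result it finds.

Any help on how to get all instances of SAMPLE that end with * into a list, without adding the * to the list, would be appreciated.

Thanks

Edit: minor corrections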

Answer 1

You can try this:

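import re

samples = []

with open('file1.txt') as myfile:
    for line in myfile:
        # Keep every line that contains six uppercase letters or digits followed by *
        if re.search(r'[0-9A-Z]{6}\*', line):
            samples.append(line)

# print('SAMPLES: ', samples)

with open("file2.txt", "w") as myfile2:
    # Each stored line keeps its original line ending, so write it unchanged
    for s in samples:
        myfile2.write(s)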
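
The loop above stores each matching line in full, asterisk included. If only the matched text is wanted, one possible variation is to put the six characters in a capturing group and call re.findall, which returns every match instead of just the first. This is a minimal sketch, assuming the file content is the three sample lines from the question; the inline content string stands in for reading file1.txt.

```python
import re

# Sample data copied from the question; in practice this would be read from file1.txt.
content = "RANDOMTEXTSAMPLE*\n$SAMPLERANDOMTEXT\nRANDOMSAMPLE*TEXT\n"

# The parentheses capture the six characters. The trailing \* must be present
# for a match, but it sits outside the group and is left out of the result.
pattern = re.compile(r'([0-9A-Z]{6})\*')

# With exactly one group, findall returns a list of the captured strings.
samples = pattern.findall(content)
print(samples)  # ['SAMPLE', 'SAMPLE']
```

The second line of the sample data has no asterisk, so it contributes nothing to the list.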

Answer 2

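From your question, it isn't clear whether you want to match the dollar sign or the asterisk at the end. Either way, you can use a backreference (capturing group) to solve this. If you don't know what they are, you can read more about backreferences here.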

import re

samples = []
pattern = re.compile(r'([a-zA-Z]+)\*')

with open("file1.txt", "r") as myfile:
    for line in myfile:
        for matched_object in pattern.finditer(line):
            # group(1) is the text captured before the asterisk
            samples.append(matched_object.group(1))

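This will give you the list of samples. You can see a regex demo here.

Note: since it's unclear exactly what you want to match, you may need to adjust the backreference (the capturing group) in my regex to fit your specific input. Either way, this code snippet should give you a good overall idea of how to solve the problem.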
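
To illustrate how much the choice of pattern matters, here is a rough comparison of three candidates run against the sample lines from the question. The variable names and the two alternative patterns are assumptions made for illustration; which one fits depends on what the real input looks like.

```python
import re

# Sample data copied from the question.
content = "RANDOMTEXTSAMPLE*\n$SAMPLERANDOMTEXT\nRANDOMSAMPLE*TEXT\n"

# Every run of letters directly before an asterisk (the pattern used above).
# [a-zA-Z]+ is greedy, so it captures the whole run, not just "SAMPLE".
letters_before_star = re.findall(r'([a-zA-Z]+)\*', content)

# Only the literal word SAMPLE, and only when an asterisk follows it.
# The lookahead (?=\*) checks for the asterisk without including it in the match.
literal_before_star = re.findall(r'SAMPLE(?=\*)', content)

# The run of uppercase letters directly after a dollar sign.
letters_after_dollar = re.findall(r'\$([A-Z]+)', content)

print(letters_before_star)   # ['RANDOMTEXTSAMPLE', 'RANDOMSAMPLE']
print(literal_before_star)   # ['SAMPLE', 'SAMPLE']
print(letters_after_dollar)  # ['SAMPLERANDOMTEXT']
```

If the samples always have a fixed length, a bounded quantifier such as {6} in place of + would be another way to keep the capture from swallowing the preceding text.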

Python. Extract strings from a file

2 answers: