Python regex - reading part of a text file

Time: 2018-02-07 11:10:31

Tags: python regex

I have a .txt file that looks like this:

Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   Training Accuracy 14.844    Epoch [1]   Iteration [10/51] 

What is wrong with the following code? It returns empty lists.

accuracy = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Training Accuracy\': \d+.\d+", file)))
print(accuracy)
loss = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Training Loss\': \d.\d+", file)))
print(loss)
epoch = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Epoch\': \d", file)))
print(epoch) 

Thanks!

2 answers:

Answer 0: (score: 0)

x.split('\t')[-1] only returns the last chunk of the split string, whereas the substrings you need sit in different chunks.
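As a side check, and not part of the original answer: the patterns in the question look for literal quote characters and a colon (`'Training Accuracy': `), which the sample line never contains, so `re.findall` has nothing to match before `split` is even applied. The sketch below assumes the file content is already in a string, here called `text` (a name chosen for illustration), and compares the question's pattern with one that follows the format shown in the sample.

```python
import re

# Sample line copied from the question; `text` is an illustrative name.
text = ("Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   "
        "Training Accuracy 14.844    Epoch [1]   Iteration [10/51]")

# Pattern from the question: requires quotes and a colon around the label.
quoted = re.findall(r"\'Training Accuracy\': \d+.\d+", text)

# Pattern matching the label as it appears in the log; the capture group
# makes findall return only the number.
plain = re.findall(r"Training Accuracy (\d+\.\d+)", text)

print(quoted)  # []
print(plain)   # ['14.844']
```

With a capture group in the pattern, `findall` returns the number directly, so no `split` call is needed afterwards.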

Use the following re.search() solution:

import re

s = 'Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   Training Accuracy 14.844    Epoch [1]   Iteration [10/51]'
pat = re.compile(r'(Training Loss \d+\.\d+).+(Training Accuracy \d+\.\d+).+(Epoch \[\d+\])')
loss, accuracy, epoch = pat.search(s).groups()

print(loss, accuracy, epoch, sep='\n')

Output (in order):

Training Loss 1.6095
Training Accuracy 14.844
Epoch [1]
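Since `search()` stops at the first match, a log with many iterations may need a different approach. The following is only a sketch: it assumes each record follows the layout of the sample (Epoch, Iteration, Training Loss, Training Accuracy, in that order), and the second record in `log` is made-up illustrative data, not taken from the question. The names `log`, `record`, and `rows` are hypothetical.

```python
import re

# Two records in the question's layout; the second one is invented
# purely for illustration.
log = (
    "Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   Training Accuracy 14.844\n"
    "Epoch [1]   Iteration [10/51]   Training Loss 1.5012 (1.5630)   Training Accuracy 25.000\n"
)

# Named groups label each captured field; \s+ tolerates tabs or runs of spaces.
record = re.compile(
    r"Epoch \[(?P<epoch>\d+)\]\s+"
    r"Iteration \[(?P<iteration>\d+)/\d+\]\s+"
    r"Training Loss (?P<loss>\d+\.\d+) \([\d.]+\)\s+"
    r"Training Accuracy (?P<accuracy>\d+\.\d+)"
)

# finditer yields one match per record; convert the captured strings to numbers.
rows = [
    {
        "epoch": int(m.group("epoch")),
        "iteration": int(m.group("iteration")),
        "loss": float(m.group("loss")),
        "accuracy": float(m.group("accuracy")),
    }
    for m in record.finditer(log)
]

for row in rows:
    print(row)
```

Each entry in `rows` keeps the fields of one iteration together, which avoids pairing a loss with the epoch of a neighbouring record.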

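Answer 1: (score: 0)

Assuming you need to extract both the key (name) and the value of each entity, I am posting this code, which automatically detects the names and maps them to their numbers.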

import re

# Data extracted from the file
extracted_data = """Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   Training Accuracy 14.844    Epoch [1]   Iteration [10/51]"""
# Split the text into chunks on tabs, newlines, or runs of two or more spaces
splited_data = re.split(r'([ ]{2,}|\t|\n)', extracted_data)
re_word = r'[a-z A-Z]*'  # matches the leading word part
re_dig = r'[\d.]*'  # matches the digit part
# Build a dict of key/value pairs, skipping whitespace-only chunks
data = {
    re.findall(re_word, text)[0].strip(): {
        'full_text': text,
        # list() is needed because filter() returns an iterator in Python 3
        'digit': list(filter(lambda a: a.strip(), re.findall(re_dig, text))),
    }
    for text in splited_data if text.strip()
}
print('Training Accuracy :', data['Training Accuracy']['digit'])
print('Training Loss:', data['Training Loss']['digit'])
print('Epoch:', data['Epoch']['digit'])

print(data.keys())  # this will give you the names extracted.

Output:

Training Accuracy : ['14.844']
Training Loss: ['1.6095', '1.6095']
Epoch: ['1']
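One thing to keep in mind with a dict keyed by name: labels that occur more than once in the line (Epoch and Iteration in the sample) overwrite each other, so only the last occurrence survives. A possible variant, sketched here under the assumption that fields are separated by tabs, newlines, or at least two spaces, collects every occurrence in a list instead. The names `line` and `values_by_name` are illustrative.

```python
import re
from collections import defaultdict

# Sample line copied from the question.
line = ("Epoch [1]   Iteration [0/51]    Training Loss 1.6095 (1.6095)   "
        "Training Accuracy 14.844    Epoch [1]   Iteration [10/51]")

values_by_name = defaultdict(list)

# No capture group in the split pattern, so separators are discarded.
for chunk in re.split(r" {2,}|\t|\n", line):
    chunk = chunk.strip()
    label = re.match(r"[A-Za-z ]+", chunk)
    if label is None:
        continue  # skip empty chunks or chunks that do not start with a name
    # Integers or decimals inside the chunk, kept as strings.
    numbers = re.findall(r"\d+(?:\.\d+)?", chunk)
    values_by_name[label.group().strip()].append(numbers)

for name, occurrences in values_by_name.items():
    print(name, occurrences)
```

Here `values_by_name['Iteration']` holds one list of numbers per occurrence, so both `[0/51]` and `[10/51]` are retained.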