I have a .txt file that looks like this:
Epoch [1] Iteration [0/51] Training Loss 1.6095 (1.6095) Training Accuracy 14.844 Epoch [1] Iteration [10/51]
What is wrong with the following code? It returns empty lists.
accuracy = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Training Accuracy\': \d+.\d+", file)))
print(accuracy)
loss = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Training Loss\': \d.\d+", file)))
print(loss)
epoch = list(map(lambda x: x.split('\t')[-1], re.findall(r"\'Epoch\': \d", file)))
print(epoch)
Thanks!
Answer 0 (score: 0)
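Your patterns look for text such as 'Training Accuracy': (with quotes and a colon), which never appears in the file, so re.findall() finds no matches and every list comes back empty. In addition, x.split('\t')[-1] only gives the last chunk of the split string, while the substrings you want are in different chunks.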
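Running the question's accuracy pattern and a quote-free variant against the sample line makes the difference visible. In this sketch `line` stands in for the file contents, and the second pattern is a hypothetical fix, not code from the question; its capture group makes `re.findall()` return only the number:

```python
import re

# Stand-in for the file contents: the sample line from the question
line = 'Epoch [1] Iteration [0/51] Training Loss 1.6095 (1.6095) Training Accuracy 14.844 Epoch [1] Iteration [10/51]'

# The question's pattern expects "'Training Accuracy': " (quotes and colon), which is not in the text
old = re.findall(r"\'Training Accuracy\': \d+.\d+", line)

# Hypothetical fix: match the literal label and capture only the number
new = re.findall(r"Training Accuracy (\d+\.\d+)", line)

print(old)  # []
print(new)  # ['14.844']
```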
Use the following re.search() solution:
import re
s = 'Epoch [1] Iteration [0/51] Training Loss 1.6095 (1.6095) Training Accuracy 14.844 Epoch [1] Iteration [10/51]'
pat = re.compile(r'(Training Loss \d+\.\d+).+(Training Accuracy \d+\.\d+).+(Epoch \[\d+\])')
loss, accuracy, epoch = pat.search(s).groups()
print(loss, accuracy, epoch, sep='\n')
Output (one value per line):
Training Loss 1.6095
Training Accuracy 14.844
Epoch [1]
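`re.search()` stops at the first match, so on a log with many iterations it returns only the first record. If the file holds one record per line, a `re.findall()` sketch with one capture group per value collects every occurrence in order. The log text below is made up for illustration (not taken from the question), and the patterns assume the field layout shown in the sample line:

```python
import re

# Made-up sample log; with a real file you would read it, e.g.
# with open(path) as f: text = f.read()   (path is hypothetical)
text = """Epoch [1] Iteration [0/51] Training Loss 1.6095 (1.6095) Training Accuracy 14.844
Epoch [1] Iteration [10/51] Training Loss 1.5012 (1.5520) Training Accuracy 30.469
Epoch [2] Iteration [0/51] Training Loss 1.2100 (1.2100) Training Accuracy 52.344"""

# With exactly one capture group, findall returns a list of the captured strings
loss = [float(x) for x in re.findall(r"Training Loss (\d+\.\d+)", text)]
accuracy = [float(x) for x in re.findall(r"Training Accuracy (\d+\.\d+)", text)]
epoch = [int(x) for x in re.findall(r"Epoch \[(\d+)\]", text)]

print(loss)      # [1.6095, 1.5012, 1.21]
print(accuracy)  # [14.844, 30.469, 52.344]
print(epoch)     # [1, 1, 2]
```

The three lists only line up index by index if every record contains all three fields; a trailing partial record (like the extra "Epoch [1] Iteration [10/51]" at the end of the question's sample) would add an epoch without a matching loss or accuracy.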
Answer 1 (score: 0)
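Assuming you need to extract the key (name) and value of each entity, here is code that automatically detects the names and maps them to their numbers: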
import re
# Sample data from the file; assumes the fields are tab-separated (the question's own code splits on '\t')
extracted_data = "Epoch [1]\tIteration [0/51]\tTraining Loss 1.6095 (1.6095)\tTraining Accuracy 14.844\tEpoch [1]\tIteration [10/51]"
splited_data = re.split(r'([ ]{2,}|\t|\n)', extracted_data)  # split the text into chunks on tabs, newlines, or 2+ spaces
re_word = r'[a-z A-Z]*'  # extracts the word (name) part
re_dig = r'[\d.]*'  # extracts the digit part
# Build a dict: name -> full chunk text and the numbers it contains (a repeated name overwrites the earlier entry)
data = {re.findall(re_word, text)[0].strip(): {'full_text': text, 'digit': [d for d in re.findall(re_dig, text) if d.strip()]}
        for text in splited_data if text.strip()}
print('Training Accuracy :', data['Training Accuracy']['digit'])
print('Training Loss:', data['Training Loss']['digit'])
print('Epoch:', data['Epoch']['digit'])
print(data.keys())  # the names that were extracted
Output:
Training Accuracy : ['14.844']
Training Loss: ['1.6095', '1.6095']
Epoch: ['1']
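The output above shows that Training Loss carries two numbers; the one in parentheses is presumably a running average, though the log itself does not say so. A sketch using named groups with `re.finditer()` can keep both values apart and turn each complete record into a dict of typed values. The group names, the dict keys, and the helper `parse_log` are hypothetical, and the pattern assumes the exact field order shown in the question (fields separated by spaces or tabs):

```python
import re

# Assumed record layout:
# Epoch [E] Iteration [I/N] Training Loss L (AVG) Training Accuracy A
RECORD = re.compile(
    r"Epoch \[(?P<epoch>\d+)\]\s+"
    r"Iteration \[(?P<iter>\d+)/(?P<total>\d+)\]\s+"
    r"Training Loss (?P<loss>\d+\.\d+) \((?P<loss_avg>\d+\.\d+)\)\s+"
    r"Training Accuracy (?P<acc>\d+\.\d+)"
)


def parse_log(text):
    """Return one dict per complete record; partial records are skipped."""
    records = []
    for m in RECORD.finditer(text):
        records.append({
            'epoch': int(m['epoch']),
            'iteration': int(m['iter']),
            'total': int(m['total']),
            'loss': float(m['loss']),
            'loss_avg': float(m['loss_avg']),
            'accuracy': float(m['acc']),
        })
    return records


sample = 'Epoch [1] Iteration [0/51] Training Loss 1.6095 (1.6095) Training Accuracy 14.844 Epoch [1] Iteration [10/51]'
recs = parse_log(sample)
print(recs)
```

Because the whole record must match, the trailing "Epoch [1] Iteration [10/51]" in the sample (which has no loss or accuracy) is ignored instead of producing a half-filled entry.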