I want to use regular expressions in Python 3 to extract the date and specific items from a text.
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
In the example above, I want to get all the lines that follow a "success" line. Here is the expected output:
[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]
Here is what I tried:
>>> newLine = re.sub(r'\t|\n|\r|\s{2,}',' ', text)
>>> newLine
' 190219 7:05:30 line1 fail line1 this is the 1st fail line2 fail line2 this is the 2nd fail line3 success line3 this is the 1st success process line3 this process need 3sec 200219 9:10:10 line1 fail line1 this is the 1st fail line2 success line2 this is the 1st success process line2 this process need 4sec line3 success line3 this is the 2st success process line3 this process need 2sec '
I don't know the right way to get this result. I have tried:
(\b\d{6}\b \d{1,}:\d{2}:\d{2})...
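How can I solve this?
Answer 0 (score: 1)
Here is a solution that uses a regular expression to extract the date and time, and plain Python for everything else.
Preparing the input: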
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
# Strip the multiline string, split into lines, then strip each line
lines = [line.strip() for line in text.strip().splitlines()]
result = parse(lines)
Solution:
import re

def parse(lines):
    result = []
    buffer = []
    success = False
    for line in lines:
        date = re.match(r"(\d{6})\s(\d{1,}:\d{2}:\d{2})", line)
        if date:
            # Store previous match and reset buffer
            if buffer:
                result.append(tuple(buffer))
                buffer.clear()
            # Split the date and time and add to buffer
            buffer.extend(date.groups())
        # Check for status change
        if line.endswith("success") or line.endswith("fail"):
            success = line.endswith("success")
        # Add current line to buffer if it's part of the succeeded process
        elif success:
            buffer.append(line)
    # Store last match (skipped when the input contained no records)
    if buffer:
        result.append(tuple(buffer))
    return result
Output:
result = [('190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'), ('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec')]
Answer 1 (score: 1)
A similar solution can be written with groupby from itertools.
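As a rough illustration of the groupby idea, here is a minimal sketch. It is not the answerer's original code. The names `label_lines` and `parse_with_groupby` are hypothetical, and the sketch assumes that a status marker is a line of the form "<name> success" or "<name> fail", optionally preceded by the date and time.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumed formats: a six-digit date followed by h:mm:ss, and marker lines
# that consist of a single word followed by "success" or "fail".
DATE_RE = re.compile(r"(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s*")
STATUS_RE = re.compile(r"\w+\s+(success|fail)")

def label_lines(lines):
    """Yield (header, line) for every line that follows a success marker."""
    header = None
    in_success = False
    for line in lines:
        if not line:
            continue
        date = DATE_RE.match(line)
        if date:
            header = date.groups()      # ('190219', '7:05:30')
            line = line[date.end():]    # drop the date/time prefix
        status = STATUS_RE.fullmatch(line)
        if status:
            # Marker lines only switch the state; they are not collected.
            in_success = status.group(1) == "success"
        elif in_success and header is not None:
            yield header, line

def parse_with_groupby(text):
    lines = (line.strip() for line in text.strip().splitlines())
    # Consecutive pairs that share a header belong to the same record.
    return [
        header + tuple(line for _, line in group)
        for header, group in groupby(label_lines(lines), key=itemgetter(0))
    ]
```

Called on the text from the question, parse_with_groupby(text) should give the two tuples from the expected output. Two caveats of this sketch: a record without any collected line is dropped entirely, and two adjacent records with an identical date and time would be merged into one group.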
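Answer 2 (score: 1)
If you prefer a more functional style, the code below should work. It uses a functional library for Python called toolz, which you can install with pip install toolz. The code uses no regular expressions at all, only partitions and filters. Note that line_starts_empty relies on the continuation lines in the file being indented with at least one leading space. Set input_file to the path of a file containing the text, then try it.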
from toolz import partitionby, partition
from itertools import dropwhile

input_file = r'input_file.txt'

def line_starts_empty(line):
    return line.startswith(' ')

def clean(line):
    return line.strip()

def contains_no_success(line):
    return 'success' not in line.lower()

def parse(args):
    head_line, tail_lines = args
    # The first two fields of the header line are the date and the time
    result_head = head_line[0].split()[:2]
    # Skip the indented lines until the first one that mentions "success"
    result_tail = list(map(clean, dropwhile(contains_no_success, tail_lines)))
    return result_head + result_tail

with open(input_file) as lines:
    # Pair each header line with the run of indented lines that follows it
    for item in map(parse, partition(2, partitionby(line_starts_empty, lines))):
        print(item)
Answer 3 (score: 0)
Here is my solution using regular expressions:
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
import re

# find desired lines
count = 0
data = []
for item in text.splitlines():
    # find date
    match_date = re.search(r'\d+\s\d+:\d\d:\d\d', item)
    # get date
    if match_date is not None:
        count = 1
        # store date and time as two separate entries
        data.extend(match_date.group().split(' '))
    # find line with success
    match = re.search(r'\w+\d\ssuccess', item)
    # handle collecting next lines
    if match is not None:
        count = 2
    if count > 2:
        data.append(item.strip())
    if count == 2:
        count += 1
# split list data
# get positions of the all-digit entries (the dates)
indexes = [i for i, value in enumerate(data) if value.isdigit()]
indexes = indexes + [len(data)]
# create list of lists, one per date
result = []
for i in range(len(indexes) - 1):
    result.append(data[indexes[i]:indexes[i + 1]])
Result:
[['190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'], ['200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec']]
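A more compact variant of the same regex idea, sketched under a few assumptions: re.split with capturing groups cuts the text at every date/time prefix, so each record arrives as a date, a time, and a body. The name `parse_with_split` is hypothetical, the date pattern is a slightly stricter form of the one in the question, and marker lines are assumed to look like "<name> success" or "<name> fail".

```python
import re

# Date/time prefix at the start of a line; the two groups are kept by re.split.
RECORD_RE = re.compile(r"^[ \t]*(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+", re.MULTILINE)
# Assumed marker format: one word followed by the status.
STATUS_RE = re.compile(r"\w+\s+(success|fail)")

def parse_with_split(text):
    # With two capturing groups, re.split returns
    # [before, date, time, body, date, time, body, ...].
    parts = RECORD_RE.split(text.strip())
    result = []
    for i in range(1, len(parts), 3):
        date, clock, body = parts[i:i + 3]
        kept = []
        in_success = False
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue
            status = STATUS_RE.fullmatch(line)
            if status:
                # Marker lines switch the state and are not collected.
                in_success = status.group(1) == "success"
            elif in_success:
                kept.append(line)
        result.append((date, clock, *kept))
    return result
```

Unlike the list-slicing step above, this keeps a record even when nothing follows a success marker; such a record comes back as just (date, time).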