Checking for a missing line in Python

Time: 2018-01-11 07:15:00

Tags: python regex text

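I have a text file containing multiple lines. Between two lines (the ++ start line and the -- exiting line) I want to check for a specific line (calling xyz ...). If the calling xyz ... line exists, that line should be returned; if it does not exist, a NULL value should be returned. I want to store the results in a list.

Sample file: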

++ start line 
22 15:36:53 
dog, cat, monkey, rat
calling xxxxx
animal already added
-- exiting line

The block above should add calling xxxxx to the list.

++ start line 
12 12:56:34 
cat, camel, cow, dog    
animal already added
-- exiting line

In the block above the calling xyz line is missing, so NULL should be added to the list.

Expected output:

calling xxxxx
NULL

3 Answers:

Answer 0: (score: 0)

You can use this regular expression to check for the case you describe:

^\+\+(?=(?:(?!\-\-).)*\s+(calling[^\n]+)).*?\s+--

Observe how the regex works here

If it matches, the calling line is available as group 1.
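As a small illustration of the group 1 behaviour, the sketch below applies the single-block expression above to one block that has a calling line and one that does not. The helper name find_calling and the two sample strings are made up for this example, and the DOTALL and MULTILINE flags are assumed to be set as in the sample source further down.

```python
import re

# The single-block expression from this answer. DOTALL lets "." cross
# newlines; MULTILINE lets "^" match at the start of every line.
CALLING_IN_BLOCK = re.compile(
    r"^\+\+(?=(?:(?!\-\-).)*\s+(calling[^\n]+)).*?\s+--",
    re.DOTALL | re.MULTILINE,
)


def find_calling(block):
    """Return the calling line of one block, or None if there is none.

    find_calling is a hypothetical helper name used for illustration only.
    """
    m = CALLING_IN_BLOCK.search(block)
    return m.group(1) if m else None


with_call = (
    "++ start line \n"
    "22 15:36:53 \n"
    "dog, cat, monkey, rat\n"
    "calling xxxxx\n"
    "animal already added\n"
    "-- exiting line\n"
)
without_call = (
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog    \n"
    "animal already added\n"
    "-- exiting line\n"
)

print(find_calling(with_call))
print(find_calling(without_call))
```

With this expression alone, a block that lacks a calling line produces no match at all, which is presumably why the sample source adds a second alternative that matches such blocks and leaves group 1 empty.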

Sample source (run here):

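import re

# First alternative: a block that contains a "calling ..." line (captured as group 1).
# Second alternative: a block without one, so it still matches and group 1 is None.
regex = r"(?:^\+\+(?=(?:(?!\-\-).)*\s+(calling[^\n]+)).*?\s+--)|(?:^\+\+(?=(?:(?!\-\-).)*\s+(?!calling[^\n]+)).*?\s+--)"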

test_str = ("++ start line \n"
    "22 15:36:53 \n"
    "dog, cat, monkey, rat\n"
    "calling xxxxx\n"
    "animal already added\n"
    "-- exiting line\n\n\n"
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog    \n"
    "animal already added\n"
    "-- exiting line\n\n"
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog  \n"
    "calling pqr  \n"
    "animal already added\n"
    "-- exiting line\n\n")

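# DOTALL lets "." cross newlines; MULTILINE lets "^" match at each line start.
matches = re.finditer(regex, test_str, re.DOTALL | re.MULTILINE)

for match in matches:
    # group(1) is the calling line, or None when the block has none
    print(match.group(1))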

Output:

calling xxxxx
None
calling pqr  

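Answer 1: (score: 0)

You may want to use several patterns: one to delimit the blocks and another to search for calling... inside each block.

  1. Expression for the block (see a demo here):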

    ^\+\+ (?P<block>[\s\S]+?) ^--.+

  2. Expression for calling...:

    ^calling.+

  3. As a Python snippet:

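    import re

    # Lazily capture everything between a line starting with "++"
    # and the next line starting with "--".
    rx_block = re.compile(r'''
        ^\+\+
        (?P<block>[\s\S]+?)
        ^--.+''', re.MULTILINE | re.VERBOSE)
    
    # A line that starts with "calling".
    rx_calling = re.compile(r'''
        ^calling.+
        ''', re.MULTILINE | re.VERBOSE)
    
    # your_string_here is a placeholder for the full text of the file.
    # One entry per block: the calling line, or None if the block has none.
    results = [match.group(0) if match else None
               for block in rx_block.finditer(your_string_here)
               for match in [rx_calling.search(block.group('block'))]]
    print(results)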
    

    which yields

    ['calling xxxxx', None]
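For a version that runs without a placeholder, here is a minimal sketch of the same two-pattern idea applied to the sample text from the question. The function name calling_lines, its missing parameter, and the sample variable are assumptions made for this illustration; the patterns are the two expressions listed above, written without VERBOSE mode.

```python
import re

# Pattern 1: everything between a line starting with "++" and the next
# line starting with "--" (lazy, so each block is matched separately).
RX_BLOCK = re.compile(r"^\+\+(?P<block>[\s\S]+?)^--.+", re.MULTILINE)
# Pattern 2: a line that starts with "calling".
RX_CALLING = re.compile(r"^calling.+", re.MULTILINE)


def calling_lines(text, missing="NULL"):
    """Return one entry per block: its calling line, or `missing`.

    Hypothetical helper; `missing` defaults to the string "NULL"
    because that is what the question asks for.
    """
    results = []
    for block in RX_BLOCK.finditer(text):
        hit = RX_CALLING.search(block.group("block"))
        # rstrip() drops trailing spaces that ".+" would otherwise keep
        results.append(hit.group(0).rstrip() if hit else missing)
    return results


sample = (
    "++ start line \n"
    "22 15:36:53 \n"
    "dog, cat, monkey, rat\n"
    "calling xxxxx\n"
    "animal already added\n"
    "-- exiting line\n"
    "\n"
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog    \n"
    "animal already added\n"
    "-- exiting line\n"
)

print(calling_lines(sample))
```

Passing missing=None instead would reproduce the None entries shown in the answer's own output.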
    

Answer 2: (score: 0)

You can use the split function to get the sub-parts and check them:

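outlist = []
with open("calling.txt", "r") as ff:
    lines = ff.read()  # the whole file as a single string
    # Each record is the text that follows a "++ start line " marker.
    records = lines.split("++ start line ")
    # Drop the empty chunk before the first marker.
    records = list(filter(lambda x: len(x)>0, records))
    for rec in records:
        found = False
        rows = rec.split("\n")
        for row in rows:
            # Use only the first "calling" row; store the word after "calling".
            if not found and row.startswith("calling"):
                outlist.append(row.split(" ")[1])
                found = True
        if not found:
            outlist.append("NULL")

print(outlist)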

Output:

['xxxxx', 'NULL', 'pqr']
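The question asks for the whole calling line rather than only the word after it. A minimal variation of the split approach that keeps the full line and works on a string instead of a file might look like the sketch below. The function name calling_per_record and its parameters are hypothetical, and the text is assumed to use the same "++ start line" marker as the sample file.

```python
def calling_per_record(text, start_marker="++ start line", missing="NULL"):
    """Return the first calling line of each record, or `missing`.

    Hypothetical helper for illustration; records are the chunks of
    text that follow each occurrence of start_marker.
    """
    out = []
    # [1:] skips whatever precedes the first marker (usually an empty string)
    for record in text.split(start_marker)[1:]:
        first_call = next(
            (row.strip() for row in record.splitlines()
             if row.startswith("calling")),
            missing,
        )
        out.append(first_call)
    return out


sample = (
    "++ start line \n"
    "22 15:36:53 \n"
    "dog, cat, monkey, rat\n"
    "calling xxxxx\n"
    "animal already added\n"
    "-- exiting line\n\n"
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog    \n"
    "animal already added\n"
    "-- exiting line\n\n"
    "++ start line \n"
    "12 12:56:34 \n"
    "cat, camel, cow, dog  \n"
    "calling pqr  \n"
    "animal already added\n"
    "-- exiting line\n\n"
)

print(calling_per_record(sample))
```

Using next() with a default replaces the found flag: it stops at the first matching row and falls back to the placeholder when no row matches.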