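Trim part of a string and print it using a regular expression

Time: 2017-11-29 21:21:06

Tags: python regex string

I am trying to trim part of each string in a list of strings and print the results. The data looks like this: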

Books are on the table\nPick them up
Pens are in the bag\nBring them
Cats are roaming around
Dogs are sitting
Pencils, erasers, ruler cannot be found\nSearch them
Laptops, headphones are lost\nSearch for them

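(These are just a few of the 100 lines of data in the file.)

For lines 1, 2, 5, and 6, I need to keep only the part of the string before the \n and print it. I also need to print lines 3 and 4. Expected output: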

Books are on the table
Pens are in the bag
Cats are roaming around
Dogs are sitting
Pencils erasers ruler cannot be found
Laptops headphones are lost

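What I have tried so far:

First, I replaced each comma with a space: a = name.replace(',', ' ')

Then I used a regular expression to trim the substring. My regex is: b = r'.*-\s([\w\s]+)\\n'. With it I cannot print lines 3 and 4, which do not contain \n.

The output I get now is: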

Books are on the table
Pens are in the bag
Pencils erasers ruler cannot be found
Laptops headphones are lost

What should I add to the expression so that lines 3 and 4 are printed as well?

TIA

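2 answers:

Answer 0 (score: 1)

You can match and remove the part of each line that starts with the combination of a backslash and n, as well as all punctuation (non-word, non-whitespace) characters, using re.sub: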

a = re.sub(r'\\n.*|[^\w\s]+', '', a)

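See the documentation for re.sub.

Details

  • \\n.* - a literal \n (a backslash followed by n), then the rest of the line
  • | - or
  • [^\w\s]+ - one or more characters other than word characters and whitespace

If you need to make sure that \n is followed by an uppercase letter, you can add [A-Z] after n in the pattern.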
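
To make the pattern above concrete, here is a minimal sketch that applies it to the sample lines from the question. The names SAMPLE_LINES, PATTERN, PATTERN_UPPER, and clean_line are made up for this illustration, and the sketch assumes the data contains a literal backslash followed by n rather than a real newline character.

```python
import re

# Sample lines from the question. The r"..." prefix keeps \n as two
# literal characters (backslash + n), which is what the pattern expects.
SAMPLE_LINES = [
    r"Books are on the table\nPick them up",
    r"Pens are in the bag\nBring them",
    "Cats are roaming around",
    "Dogs are sitting",
    r"Pencils, erasers, ruler cannot be found\nSearch them",
    r"Laptops, headphones are lost\nSearch for them",
]

# Either a literal \n plus the rest of the line, or a run of punctuation.
PATTERN = re.compile(r"\\n.*|[^\w\s]+")

# Variant: cut at \n only when an uppercase ASCII letter follows it.
# Note that a stray backslash is still removed as punctuation.
PATTERN_UPPER = re.compile(r"\\n[A-Z].*|[^\w\s]+")


def clean_line(line, pattern=PATTERN):
    """Return the line with every match of the pattern removed."""
    return pattern.sub("", line)


for line in SAMPLE_LINES:
    print(clean_line(line))
```

Lines without \n pass through unchanged because neither alternative matches anything in them, which is why lines 3 and 4 are printed as well.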

Answer 1 (score: 0)

I know a lot of people like to twist their minds into knots with regular expressions, but why not just do this:

with open('geek_lines.txt') as lines:
    for line in lines:
        print (line.rstrip().split(r'\n')[0])

Easy to write, easy to read, and it seems to produce the correct result.

Books are on the table
Pens are in the bag
Cats are roaming around
Dogs are sitting
Pencils, erasers, ruler cannot be found
Laptops, headphones are lost
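
The output above still contains the commas, while the expected output in the question does not. A minimal sketch of one way to close that gap without a regular expression follows; the function name trim_line and the in-memory sample list are assumptions made for this illustration, not part of the original answer.

```python
def trim_line(line):
    """Keep the text before a literal backslash-n and drop commas."""
    # r"\n" is a two-character string (backslash + n), not a newline.
    head = line.rstrip().split(r"\n")[0]
    return head.replace(",", "")


# A few lines from the question, standing in for the file contents.
sample = [
    r"Books are on the table\nPick them up",
    "Cats are roaming around",
    r"Pencils, erasers, ruler cannot be found\nSearch them",
]

for line in sample:
    print(trim_line(line))
```

Removing the commas outright, rather than replacing them with spaces as in the question, avoids leaving double spaces between the words.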