How to apply string methods to regex matches in Python

Posted: 2016-06-29 09:49:44

Tags: python regex markdown

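I have a Markdown file that is somewhat broken: links and images that were too long have line breaks inside them. I want to remove those line breaks.

Example:

From: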

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

To:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

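As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the Python syntax for applying a string method, like string.trim(), to what I captured with the regex?

For now, I am stuck with this: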

import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
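Why the commented idea cannot work, as a small sketch: a string method called on the replacement template runs once, on the literal template text, before re.sub has seen any match. Python strings have no trim(); the closest method is strip(). The example below uses upper() instead (an illustrative choice) because its effect is easy to see, and the sample text "ab cd" is made up.

```python
import re

text = "ab cd"

# The method runs on the template itself, before any match exists:
# r"\1".upper() is still just a backslash followed by "1".
template = r"\1".upper()
print(template == r"\1")

# So re.sub simply copies each capture back unchanged.
print(re.sub(r"(\w+)", template, text))

# Passing a function instead lets a string method run on every match.
print(re.sub(r"(\w+)", lambda m: m.group(1).upper(), text))
```

The last call prints AB CD, because the lambda receives each match object and its return value is what gets substituted.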

Edit: I updated the example to make my problem more explicit.

Thanks for your answers.

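3 answers:

Answer 0 (score: 0)

strip works like trim: it removes the given characters from the start and end of a string. Since you need to trim newlines, use strip('\n'):

fin.readline().strip('\n')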
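A minimal sketch of what strip('\n') does and does not remove, using the broken link from the question as sample data: only newlines at the two ends of the string go away, so the line break inside the link text survives. Replacing interior newlines needs str.replace (or a regex); turning them into a space here is an assumption that fits this particular link.

```python
link = "[installation process for Ubuntu\nTrusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)\n"

# strip('\n') only removes newlines at the start and end of the string.
stripped = link.strip("\n")

# The newline between "Ubuntu" and "Trusty" is still there.
print("\n" in stripped)

# str.replace handles newlines anywhere in the string.
print(stripped.replace("\n", " "))
```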

Answer 1 (score: 0)

This also works:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>> 

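Built-in string methods usually perform well and are easier to read than a regular expression. Here, strip removes the leading and trailing whitespace, split returns a list of the pieces between the newlines, and join puts them back together into a single string.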
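A quick check, using the same sample string as the session above: joining the split pieces with an empty string deletes every interior newline, which is the same thing str.replace('\n', '') does in one call.

```python
s = """
   ![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
"""

joined = "".join(s.strip().split("\n"))
replaced = s.strip().replace("\n", "")

# Both approaches remove every newline left after strip(), so the results match.
print(joined == replaced)
```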

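Answer 2 (score: 0)

Well, I finally found what I was looking for. re.sub also accepts a function as the replacement: it calls the function with each match object and substitutes whatever the function returns. With the snippet below, I can capture strings with a regex and then apply any processing to each of them.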

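import re

def remove_newlines(match):
    # match.group() is the whole matched link or image; drop every newline in it
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
# re.sub calls remove_newlines once per match and inserts its return value
post['content'] = links_pattern.sub(remove_newlines, post['content'])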
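One detail worth noting: joining with "" turns "Ubuntu\nTrusty" into "UbuntuTrusty", while the desired output in the question has "Ubuntu Trusty". The sketch below keeps the same pattern and callback approach but assumes a simple heuristic: a newline right after "-" was a wrap inside a URL or file name and is dropped, and any other newline in the link text becomes a space. The names unwrap, fix_link and content are hypothetical, and content is a trimmed copy of the question's sample.

```python
import re

# Pattern from the answer above: matches [text](target), including the [...] part of images.
LINKS_PATTERN = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')


def unwrap(fragment):
    # Assumed heuristic: "-\n" was a hard wrap inside a URL, so keep the hyphen only;
    # any other newline was a wrap between words, so turn it into a space.
    fragment = re.sub(r"-\n", "-", fragment)
    return fragment.replace("\n", " ")


def fix_link(match):
    text, target = match.group(1), match.group(2)
    # Targets (URLs and file paths) should not contain newlines, so remove them all.
    return "[{}]({})".format(unwrap(text), target.replace("\n", ""))


content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

fixed = LINKS_PATTERN.sub(fix_link, content)
print(fixed)
```

The "!" of the image is outside the match, so it stays in place; newlines outside links and images are untouched.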

Thanks for your answers, and sorry if my question was not clear enough.