Question

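I have a Markdown file that is a bit broken: the links and images are too long and contain line breaks. I would like to remove the line breaks from them.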

Example:

From:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

To:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

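As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the syntax in Python for applying a string method such as string.trim() to what I captured with the regular expression?

For the moment, I am stuck with this: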

import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])

Edit: I updated the example to state my problem more explicitly.

Thanks for your answers.

Answer 1

strip works like trim. Since you need to trim newlines, use strip('\n'):

fin.readline().strip('\n')
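One caveat worth checking before relying on this: strip only removes characters from the two ends of a string, so a newline in the middle of a link survives. A minimal sketch, using the link from the question as sample text (the variable names are made up for illustration):

```python
# Sample text taken from the question: a link broken across two lines.
text = "[installation process for Ubuntu\nTrusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)\n"

# strip('\n') removes newline characters only at the start and the end.
stripped = text.strip('\n')

# The newline inside the link label is still present afterwards.
has_inner_newline = '\n' in stripped
```

So strip on its own is probably not enough for the case in the question, where the line break sits inside the link.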

Answer 2

This works too:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>>

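The built-in string methods will often do the job and are easier to read than working out a regular expression. In this case, strip removes leading and trailing whitespace, split returns a list of the pieces between newlines, and join puts them back together into a single string.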
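Note that the separator passed to join decides what replaces each newline. As a rough illustration (the two sample strings come from the question, the variable names are hypothetical), a URL or path probably wants an empty separator, while ordinary link text probably wants a space:

```python
# A link label broken between two words, and a path broken mid-name.
label = "installation process for Ubuntu\nTrusty"
path = "data/images/network-\ndistributed-e941dd3e345d022ceae909beccccbacd.png"

# Joining with a space keeps the two words of the label apart.
joined_label = " ".join(label.split('\n'))

# Joining with an empty string glues the two halves of the path together.
joined_path = "".join(path.split('\n'))
```

With an empty separator the label would come out as "UbuntuTrusty", which is not what the expected output in the question shows.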

Answer 3

Well, I finally found what I was looking for. With the snippet below, I can capture strings with a regex and then apply processing to each one.

import re

def remove_newlines(match):
    # re.sub calls this once per match; the return value replaces the match
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])
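For anyone who wants to try this without the surrounding post dictionary, here is a self-contained variant as a sketch, not the exact code above. It uses the same pattern, but treats the two captured groups separately: joining everything with an empty string would turn "Ubuntu\nTrusty" into "UbuntuTrusty", whereas the expected output in the question has a space there. The name fix_link and the rule "a line ending in a hyphen was split mid-word" are my own assumptions.

```python
import re

# Same pattern as above: group 1 is the label, group 2 is the target.
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

def fix_link(match):
    """Rebuild one link or image on a single line (hypothetical helper)."""
    label, target = match.group(1), match.group(2)
    # Assumption: a label line ending in "-" was split mid-word, so it is
    # joined without a space; any other newline in the label becomes a space.
    label = label.replace('-\n', '-').replace('\n', ' ')
    # Newlines inside the target (URL or path) are simply dropped.
    target = target.replace('\n', '')
    return '[{}]({})'.format(label, target)

# Sample input copied from the question.
content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

# Passing a function as the replacement lets each match be processed in Python.
fixed = links_pattern.sub(fix_link, content)
```

The key point is the same as in the snippet above: sub accepts a function as its replacement argument, calls it with each match object, and substitutes whatever string it returns.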

Thanks for your answers, and sorry if my question was not clear enough.

How to apply a string method to a regular expression match in Python
