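I have a Markdown file that is a bit broken: links and images that are too long have line breaks inside them. I would like to remove those line breaks.
Example:
From: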
See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
To:
See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
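As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2
But now, what is the syntax in Python for applying a string method such as string.trim() to what I captured with the regular expression?
For now, I am stuck with this: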
fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
Edit: I updated the example to make my problem more explicit.
Thanks for your answers.
Answer 0 (score: 0)
strip works like trim does in other languages. Since you need to trim newlines, use strip('\n'):
fin.readline().strip('\n')
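One caveat: strip only removes characters from the start and end of a string, so by itself it will not touch a newline in the middle of a link. A minimal sketch of the difference, where the sample string and the example.org URL are made up for illustration:

```python
# Hypothetical sample: a link broken across two lines, plus a trailing newline
broken = "[installation process for Ubuntu\nTrusty](https://example.org/page)\n"

# strip('\n') only touches the ends of the string; the inner newline survives
stripped = broken.strip("\n")

# replace() acts on every occurrence, including the one inside the link text
joined = stripped.replace("\n", " ")

print(repr(stripped))
print(repr(joined))
```

So strip is fine for cleaning up the ends of a line, but something like replace (or split and join) is needed for the breaks inside the link.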
Answer 1 (score: 0)
This also works:
>>> s = """
... ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """
>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>>
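Built-in string functions will often do the job and are easier to read than working out a regular expression. In this case, strip removes leading and trailing whitespace, split then returns a list of the pieces between the newlines, and join puts them back together into a single string.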
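To make the three steps visible, here is the same chain split into intermediate values. This is only a sketch; the shortened sample string is invented for readability and is not the real image path from the question:

```python
# Hypothetical, shortened sample with the same shape as the broken image
s = "\n![alt-\ntext.png](data/images/network-\ndistributed.png)\n"

trimmed = s.strip()          # drop leading/trailing whitespace, keep inner newlines
parts = trimmed.split("\n")  # one list item per physical line
new_s = "".join(parts)       # glue the lines back together with nothing in between

print(parts)
print(new_s)
```

Note that joining with an empty string is right for URLs and paths, but it would also glue two words of link text together if the break fell between them.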
Answer 2 (score: 0)
import re
def remove_newlines(match):
    # Called by sub() for each match; returns the matched text without newlines
    return "".join(match.group().strip().split('\n'))
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])
Thanks for your answers, and sorry if my question was not clear enough.
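For anyone landing here later: passing a function to sub, as in the code above, is the general way to run string methods on what a regex captured. The sketch below is a possible variation, not the code from the answer: it assumes that a break in the link text should become a space unless the line ends with a hyphen (which is what the expected output in the question suggests), and that breaks in the target should simply be removed. The name join_lines and the content variable are made up for this example.

```python
import re

# Same pattern as in the answer above: group 1 is the text, group 2 the target
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

def join_lines(match):
    """Hypothetical helper: rebuild one link or image on a single line."""
    text, target = match.group(1), match.group(2)
    # Assumption: a line ending in "-" continues directly, other breaks become a space
    text = text.replace('-\n', '-').replace('\n', ' ')
    # Assumption: the target (URL or path) never contains meaningful whitespace
    target = target.replace('\n', '')
    return '[{}]({})'.format(text, target)

# Part of the sample from the question, standing in for post['content']
content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

fixed = links_pattern.sub(join_lines, content)
print(fixed)
```

The leading "!" of an image sits outside the match, so it is left untouched and the same callback works for both links and images.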