Question

I have some text and I want to remove the URLs from it, but I'm running into a problem.

document = re.sub('[^a-z]|http:\/\/\w+.\w+\/\w*', ' ', document)

With document = 'rt @prettycolleges: university of phoenix http://t.co/d5wxsy332r good', I got:

>> 'rt  prettycolleges  university of phoenix http     t co  d wxsy   r good'

But I want this result: rt prettycolleges university of phoenix good

What should I do?

Answer 1

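You can use a regular expression like this:

'\s*http://.*?\s' (it matches any leading whitespace, then a URL starting with http://, up to and including the next whitespace character)

Since re.sub replaces whatever the pattern matches, the code should be: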

import re
document = 'rt @prettycolleges: university of phoenix http://t.co/d5wxsy332r good'

print(re.sub(r'\s*http://.*?\s', ' ', document))  # note the r prefix (raw string)
>> rt @prettycolleges: university of phoenix good
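Note that a pattern ending in \s only matches when the URL is followed by whitespace, so a URL at the very end of the string would be left in place. If you also want the letters-only result from the question, one possible approach is to strip the URLs first and only then replace the remaining non-letter characters. This is just a rough sketch: the function name clean_tweet is made up for illustration, and treating a URL as "http:// or https:// followed by any run of non-whitespace characters" is an assumption that may be too loose or too strict for your data.

```python
import re

def clean_tweet(text):
    # Hypothetical helper, not part of the original answer.
    # Step 1: drop anything that looks like a URL (assumed shape:
    # http:// or https:// followed by non-whitespace characters).
    without_urls = re.sub(r'https?://\S+', ' ', text)
    # Step 2: replace every character that is not a lowercase letter
    # with a space (the [^a-z] step from the question).
    letters_only = re.sub(r'[^a-z]', ' ', without_urls)
    # Step 3: collapse runs of whitespace and trim both ends.
    return ' '.join(letters_only.split())

document = 'rt @prettycolleges: university of phoenix http://t.co/d5wxsy332r good'
print(clean_tweet(document))
# prints: rt prettycolleges university of phoenix good
```

Doing the URL removal before the [^a-z] step matters: once the punctuation and digits are gone, the URL has already been broken into pieces and can no longer be recognized.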

How to remove some URLs from a string in Python
