Question

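I'm trying to remove URLs from a long file. My code works fine except for this one case (below). I believe the problem is that the URL string contains a "?". How can I handle this inside my loop body? How can I force re.sub() to ignore the "?" in the url variable?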

import re
blah = 'City of San Jose. Playa to Paseo, http://www.sanjoseca.gov/index.aspx?nid=5876'
url='http://www.sanjoseca.gov/index.aspx?nid=5876'
re.sub(url,'',blah)

OUT>>'City of San Jose. Playa to Paseo, http://www.sanjoseca.gov/index.aspx?nid=5876'

Desired OUT>>> 'City of San Jose. Playa to Paseo, '

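Edit: Manually fixing every URL in the whole file by inserting escape characters is not what I want to do. I'm looping over more than 1000 lines containing URLs here.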

Answer 1

You need to escape every special character in the regex so that it matches literally. That includes the periods as well:

import re
blah = 'City of San Jose. Playa to Paseo, http://www.sanjoseca.gov/index.aspx?nid=5876'
# Raw string, so the backslashes reach the regex engine unchanged
url = r'http://www\.sanjoseca\.gov/index\.aspx\?nid=5876'
print(re.sub(url,'',blah))
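To see why the unescaped pattern fails, here is a small check. It is a sketch, and the second test string is made up purely for illustration: in a regex, "x?" means "an optional x" and "." means "any character", so the raw URL text describes a different set of strings than the URL itself.

```python
import re

url = 'http://www.sanjoseca.gov/index.aspx?nid=5876'

# Used unescaped, "x?" means "an optional x", so the pattern does NOT
# match the literal URL, which has a real "?" after the "x".
print(re.search(url, url))  # None

# It does match a (made-up) string where the "?" is simply absent.
print(re.search(url, 'http://www.sanjoseca.gov/index.aspnid=5876') is not None)  # True

# After re.escape, "." and "?" are literal, so the URL matches itself.
print(re.search(re.escape(url), url) is not None)  # True
```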

Alternatively, you can use re.escape to do the escaping for you:

import re
blah = 'City of San Jose. Playa to Paseo, http://www.sanjoseca.gov/index.aspx?nid=5876'
url = re.escape('http://www.sanjoseca.gov/index.aspx?nid=5876')
print(re.sub(url,'',blah))
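Since the question mentions looping over 1000+ lines, here is a minimal sketch of how the re.escape approach might be applied in bulk. The urls and lines lists are hypothetical placeholders; in practice they would come from your file. Each URL is escaped, the results are joined into one alternation, and the pattern is compiled once outside the loop.

```python
import re

# Hypothetical inputs: in practice these would be read from your file.
urls = [
    'http://www.sanjoseca.gov/index.aspx?nid=5876',
    'http://example.com/page?id=1',
]
lines = [
    'City of San Jose. Playa to Paseo, http://www.sanjoseca.gov/index.aspx?nid=5876',
    'Another line http://example.com/page?id=1 with text',
]

# Escape each URL so "?", "." etc. are matched literally, then join them
# into a single alternation. Longer URLs go first so that a URL which is a
# prefix of another cannot win the alternation and leave a fragment behind.
pattern = re.compile('|'.join(re.escape(u) for u in sorted(urls, key=len, reverse=True)))

# Compiled once, then reused for every line.
cleaned = [pattern.sub('', line) for line in lines]
for line in cleaned:
    print(repr(line))
```

If each URL is a fixed string, plain str.replace(url, '') avoids regex syntax entirely; the compiled pattern is mainly useful for removing many different URLs in a single pass per line.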

How do I get re.sub() to ignore the question mark in the pattern?
