Question

I am trying to extract the numbers from the response of http://app.lotto.pl/wyniki/?type=dl with the code below:

import requests
import re

url = 'http://app.lotto.pl/wyniki/?type=dl'
p = re.compile(r'[^\d{4}\-\d{2}\-\d{2}]\d+')

response = requests.get(url)
data = re.findall(p, response.text)
print(data)

But instead of ['7', '46', '8', '43', '9', '47'] I get ['\n7', '\n46', '\n8', '\n43', '\n9', '\n47']. How do I get rid of the "\n"?

Answer 1

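Your regex is not appropriate because [^\d{4}\-\d{2}\-\d{2}]\d+ matches any single character other than a digit, {, } or - (the 4 and 2 are already covered by \d), followed by 1 or more digits. In other words, you turned a sequence into a character set. That negated character class can match a newline. It can also match any letter, and a lot more. strip won't help in those other cases; you need to fix the regex.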
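To make this concrete, here is a small offline sketch that runs the question's pattern against a hypothetical snippet of text. The sample string is invented for illustration; the real page's format is an assumption.

```python
import re

# Hypothetical sample: a date followed by numbers on separate lines.
# The real response from app.lotto.pl may look different.
sample = "2016-09-13\n7\n46\n8"

# The question's pattern: [...] is a character class, so it matches ONE
# character that is not a digit, '{', '}' or '-', followed by digits.
bad = re.compile(r'[^\d{4}\-\d{2}\-\d{2}]\d+')

bad_matches = bad.findall(sample)
print(bad_matches)  # ['\n7', '\n46', '\n8']: the newline is consumed

# A letter in front of the digits is consumed just as easily.
letter_matches = bad.findall("abc12")
print(letter_matches)  # ['c12']
```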

Use

r'(?<!-)\b\d+\b(?!-)'

See the regex demo and the IDEONE demo.

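This pattern matches 1 or more digits (\d+) that are not preceded by a hyphen ((?<!-)) or a word character (\b), and not followed by a word character (\b) or a hyphen ((?!-)).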
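A quick offline check of the suggested pattern, again using a hypothetical sample string (the live page format is an assumption), shows that the date parts and digits glued to letters are skipped:

```python
import re

# Hypothetical sample text; the live page format is an assumption.
sample = "2016-09-13\n7\n46\n8"

# (?<!-)  : not preceded by a hyphen
# \b\d+\b : a whole run of digits with word boundaries on both sides
# (?!-)   : not followed by a hyphen
good = re.compile(r'(?<!-)\b\d+\b(?!-)')

numbers = good.findall(sample)
print(numbers)  # ['7', '46', '8']: the date parts are skipped

# Digits glued to letters have no word boundary, so they are not matched.
glued = good.findall("abc12")
print(glued)  # []
```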

Your code would then look like this:

import requests
import re

url = 'http://app.lotto.pl/wyniki/?type=dl'
p = re.compile(r'(?<!-)\b\d+\b(?!-)')

response = requests.get(url)
data = p.findall(response.text)
print(data)

Answer 2

You can use the strip() function to remove the \n:

data = [x.strip() for x in re.findall(p, response.text)]

I am assuming the \n may appear at the beginning as well as at the end.
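A minimal sketch of this approach on a hypothetical sample string (not the real page). Note that strip() only removes whitespace, so, as Answer 1 points out, a letter captured by the original pattern survives:

```python
import re

# The question's pattern and a hypothetical sample string.
p = re.compile(r'[^\d{4}\-\d{2}\-\d{2}]\d+')
sample = "2016-09-13\n7\n46\n8"

# str.strip() with no argument removes leading AND trailing whitespace,
# which includes '\n', '\r', spaces and tabs.
data = [x.strip() for x in re.findall(p, sample)]
print(data)  # ['7', '46', '8']

both_ends = "\n7\r\n".strip()
print(both_ends)  # 7

# strip() cannot help when the pattern captured a non-whitespace character.
letters = [x.strip() for x in re.findall(p, "abc12")]
print(letters)  # ['c12']
```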

Answer 3

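Since your numbers are strings, you can easily use the string method lstrip(). It removes leading whitespace, including newline and carriage-return characters, from the left side of a string (that's why it is l strip).
You could try something like

print([item.lstrip() for item in data])

to remove the newlines.

Or you can overwrite data with your own stripped version and then simply keep using it:

data = [item.lstrip() for item in data]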
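A short sketch using the list from the question, plus a comparison showing that lstrip() leaves trailing whitespace in place while strip() removes both sides:

```python
# The list of matches with leading newlines, as reported in the question.
data = ['\n7', '\n46', '\n8', '\n43', '\n9', '\n47']

# lstrip() only removes whitespace from the left (start) of each string.
data = [item.lstrip() for item in data]
print(data)  # ['7', '46', '8', '43', '9', '47']

# Trailing whitespace is left alone by lstrip(); strip() removes both sides.
left_only = "\n7\n".lstrip()
both = "\n7\n".strip()
print(repr(left_only))  # '7\n'
print(repr(both))       # '7'
```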

How to exclude newline characters from requests.get().text
