Question

So, here is the code:

#!/usr/bin/python
from sys import exit
import urllib.request

answer = urllib.request.urlopen("http://monip.org").read()

def debug(txt):
    print(txt)
    exit(0)

def parse_answer(answer):
    ''' Simple function to parse request's HTML result
        to find the ip in it. Raise RuntimeError if no 
        ip in result and ip else.
    '''
    import re
    pattern = "^\w+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\w+$"
    regexp = re.compile(pattern)
    if regexp.match(regexp, answer):
        m = regexp.search(regexp, answer)
        ip = m.group(0)
        return ip
    else:
        raise RuntimeError

try:
    ip = parse_answer(answer)
except RuntimeError:
    print("Error, check your network configuration.")
    print("Aborting..")
    exit(1)

print("IP:", ip)

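I wrote that. This code is meant to give you your public IP address. If it can't find one, it raises a RuntimeError.

Here is the error:

Traceback (most recent call last):
  File "./ippub", line 27, in <module>
    ip = parse_answer(answer)
  File "./ippub", line 19, in parse_answer
    if regexp.match(regexp, answer):
TypeError: 'bytes' object cannot be interpreted as an integer

This means the "answer" variable is bytes, but I want to match an IP address against it, and I can't because of Python's type system :-)

Any ideas? Thanks a lot!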

Answer 1

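You have two different problems.

1. You need to convert answer to a string, even if answer contains some funny characters that can't be decoded as UTF-8.
2. You are calling the regular expression API incorrectly.

Here is a corrected version, which uses chr to solve problem 1 and the correct call syntax to fix problem 2.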

#!/usr/bin/python
from sys import exit
import urllib.request
import re


def debug(txt):
    print(txt)
    exit(0)

def parse_answer(answer):
    ''' Simple function to parse request's HTML result
        to find the ip in it. Raise RuntimeError if no 
        ip in result and ip else.
    '''
    answer = "".join([chr(x) for x in answer])
    pattern = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    regexp = re.compile(pattern)
    m = regexp.search(answer)
    if m:
        ip = m.group(0)
        return ip
    else:
        raise RuntimeError

answer = urllib.request.urlopen("http://monip.org").read()

try:
    ip = parse_answer(answer)
except RuntimeError:
    print("Error, check your network configuration.")
    print("Aborting..")
    exit(1)

print("IP:", ip)
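
The chr join in parse_answer maps each byte to the code point with the same number, which should be the same thing as decoding the bytes as Latin-1 (ISO-8859-1). Here is a minimal sketch of that equivalence. The `sample` value is a hypothetical stand-in for the real response, not something fetched from the site:

```python
# Hypothetical sample standing in for urllib.request.urlopen(...).read()
sample = b"IP : 50.184.3.115 Pas de proxy d\xe9tect\xe9"

# Byte-by-byte conversion, as in the corrected parse_answer
joined = "".join(chr(x) for x in sample)

# Latin-1 maps every byte 0-255 to the code point with the same value
decoded = sample.decode("latin-1")
same = joined == decoded

# The same bytes are not valid UTF-8, which is why a plain UTF-8 decode fails
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(same, utf8_ok)
```

Either conversion gives a str that the compiled pattern can search, so which one to use is mostly a matter of taste.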

Answer 2

If you try:

print answer

you'll fail, because it is encoded in ISO-8859-1.

You should convert it to UTF-8 before sending it to parse_answer():

answer = answer.encode('utf8')
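
Under Python 3, which the `urllib.request` import in the question suggests, `read()` returns bytes, and bytes objects have `decode` rather than `encode`. Converting ISO-8859-1 bytes to UTF-8 there would be a two-step transcode, sketched below with a hypothetical sample value `raw`. For matching with a str pattern, the intermediate `text` is probably all that is needed:

```python
# Hypothetical ISO-8859-1 bytes, similar to the accented text on the page
raw = b"d\xe9tect\xe9"

text = raw.decode("iso-8859-1")  # bytes -> str
utf8 = text.encode("utf-8")      # str -> UTF-8 encoded bytes

print(text, utf8)
```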

Once you get past that hurdle, you'll run into another error, which comes from the following two lines:

if regexp.match(regexp, answer):
    m = regexp.search(regexp, answer)

Because regexp is already a compiled pattern, you should not pass it as an argument in either of the two calls above! Change the code to:

if regexp.match(answer):
    m = regexp.search(answer)

and it should work!
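
Here is a small sketch of why the extra argument fails, and of how match differs from search on a compiled pattern. It assumes a hypothetical one-line sample `text` and an address pattern without the `^`/`$` anchors:

```python
import re

regexp = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
text = "<BR>IP : 50.184.3.115<br>"  # hypothetical sample, not a real response

# On a compiled pattern the signature is match(string, pos, endpos), so
# passing the pattern again pushes `text` into the integer `pos` slot.
try:
    regexp.match(regexp, text)
    raised = False
except TypeError:
    raised = True

# match only tries at the start of the string; search scans the whole string
at_start = regexp.match(text)
anywhere = regexp.search(text)

print(raised, at_start, anywhere.group(0))
```

Since match only looks at the beginning of the string, search is probably the call you want for an address embedded somewhere in a page.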

For Merlin:

import requests
answer = requests.get("http://monip.org")
print answer.text.encode('utf8')

Output

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>MonIP.org v1.0</title>
<META http-equiv="Content-type" content="text/html; charset=ISO-8859-1">
</head>
<P ALIGN="center"><FONT size=8><BR>IP : 50.184.3.115<br></font><font size=3><i>c-50-184-3-115.hsd1.ca.comcast.net</i><br></font><font size=1><br><br>Pas de proxy détecté - No Proxy detected</font></html>
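
With output like the above, the address can also be pulled out without decoding at all, by searching the raw bytes with a bytes pattern. A sketch under that assumption follows. The `page` value is a hypothetical fragment copied from the output, and `ip_re` is just an illustrative name:

```python
import re

# Hypothetical bytes value, copied from part of the output shown above
page = b'<P ALIGN="center"><FONT size=8><BR>IP : 50.184.3.115<br></font>'

# A bytes pattern (rb"...") can search bytes directly; a str pattern cannot
ip_re = re.compile(rb"\d{1,3}(?:\.\d{1,3}){3}")

m = ip_re.search(page)
# The matched digits and dots are plain ASCII, so decoding them is safe
ip = m.group(0).decode("ascii") if m else None

print("IP:", ip)
```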
