Defining a word in Python

Date: 2015-07-29 07:38:41

Tags: python html function python-3.x dictionary

This may look like a duplicate of: Python define a word?

However, it is not: I tried implementing that answer in my code, and although it worked for that thread's OP, it does not work for me.

Here is my function:

def define_word(user_define_input):
    srch = str(user_define_input[1])
    output_word=urllib.request.urlopen("http://dictionary.reference.com/browse/"+srch+"?s=t")
    output_word=output_word.read()
    items=re.findall('<meta name="description" content="'+".*$",output_word,re.MULTILINE)
    for output_word in items:
        y=output_word.replace('<meta name="description" content="','')
        z=y.replace(' See more."/>','')
        m=re.findall('at Dictionary.com, a free online dictionary with pronunciation, synonyms and translation. Look it up now! "/>',z)
        if m==[]:
            if z.startswith("Get your reference question answered by Ask.com"):
                print ("Word not found!")
            else:
                print (z)
    else:
        print ("Word not found!")

Note:

>>> print (user_define_input) #to show what is in the list
>>> define <word entered> #prints out the list, in this case, the program ignores user_define_input[0] and looks for [1] which is the targeted word

Also, this contains some HTML :/ Sorry, but that is what the other answer used.

When I try to use it, I get this error:

File "/Users/******/GitHub/Multitool/functions.py", line 104, in define_word
items=re.findall('<meta name="description" content="'+".*$",output_word,re.MULTILINE)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/re.py", line 210, in findall
return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

Note: line 104 of functions.py is:

items=re.findall('<meta name="description" content="'+".*$",output_word,re.MULTILINE)

Line 210 of re.py is the last line of this function:

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string) #line 210

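If anything is unclear, please let me know (I was not sure which tags to add for this :/). Thanks in advance :) Feel free to change anything or even rewrite the whole thing, but please keep these variable/list names: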

  • define_word (for the function name)
  • user_define_input

If you would like to look at the git repository, here is the link: https://github.com/DarkLeviathanz/Multitool.git

Adding:

output_word = output_word.decode()

or changing the read line to

output_word = output_word.read().decode('iso-8859-2')
produces the following output when I enter

define test:

Test definition, the means by which the presence, quality, or genuineness of anything is determined; a means of trial.<meta property="og:url" content="http://dictionary.reference.com/browse/test"/><link rel="shortcut icon" href="http://static.sfdict.com/dictcloud/favicon.ico"/><!--[if lt IE 9]><link rel="respond-proxy" id="respond-proxy" href="http://static.sfdict.com/app/respondProxy-d7e5f.html" /><![endif]--><!--[if lt IE 9]><link rel="respond-redirect" id="respond-redirect" href="http://dictionary.reference.com/img/respond.proxy.gif" /><![endif]--><link rel="search" type="application/opensearchdescription+xml" href="http://dictionary.reference.com/opensearch_desc.xml" title="Dictionary.com"/><link rel="publisher" href="https://plus.google.com/117428481782081853923"/><link rel="canonical" href="http://dictionary.reference.com/browse/test"/><link rel="stylesheet" href="http://dictionary.reference.com/drc/css/bootstrap.min-93899.css" type="text/css" media="all"/><link rel="stylesheet" href="http://dictionary.reference.com/drc/css/combinedSerp-8c61a.css" type="text/css" media="all"/><script type="text/javascript">var searchURL="http://dictionary.reference.com/browse/%40%40queryText%40%40?s=t";var CTSParams={"infix":"","clkpage":"dic","clksite":"dict","clkld":0};</script>
Word not found!
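
A likely cause of the extra markup in the output above is that `.*$` matches through to the end of the line, so everything after the description is captured too. Here is a minimal sketch, assuming the description sits in a double-quoted content attribute. The sample HTML string and the names html_text and description are made up for illustration and are not part of the original code:

```python
import re

# Sample markup standing in for the decoded page (an assumption, not a real response).
html_text = (
    '<meta name="description" content="Test definition, a means of trial. See more."/>'
    '<meta property="og:url" content="http://dictionary.reference.com/browse/test"/>'
)

# The capture group stops at the closing quote instead of running to end of line.
match = re.search(r'<meta name="description" content="([^"]*)"', html_text)
description = match.group(1) if match else None

print(description)
```

The trailing "Word not found!" is consistent with how for ... else works in Python: the else branch runs whenever the loop finishes without hitting a break.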

3 Answers:

Answer 0 (score: 1)

output_word = output_word.decode()

converts the bytes to a string.
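
As a small offline illustration of what decode() does (the byte values below are sample data, not a real response from the site):

```python
# Bytes of the kind urlopen(...).read() returns; sample data only.
raw = '<title>café</title>'.encode('utf-8')

# With no arguments, decode() assumes UTF-8.
text = raw.decode()
print(type(raw).__name__, type(text).__name__)

# errors='replace' substitutes U+FFFD for bytes that are not valid UTF-8
# instead of raising UnicodeDecodeError.
lenient = b'caf\xe9'.decode('utf-8', errors='replace')
print(lenient)
```

Whether UTF-8 is the right choice depends on what the server actually sends, so treat the default as an assumption.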

Update

Here is the last state of the script from the chat (still far from perfect...):

import requests
from lxml import html

def define_word(word):
    response = requests.get(
        "http://dictionary.reference.com/browse/{}?s=t".format(word))
    tree = html.fromstring(response.text)
    title = tree.xpath('//title/text()')
    print(title)
    defs = tree.xpath('//div[@class="def-content"]/text()')
    # print(defs)

    defs = ''.join(defs)
    defs = defs.split('\n')
    defs = [d for d in defs if d]
    for d in defs:
        print(d)

define_word('python')

Answer 1 (score: 1)

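urllib.request.urlopen().read() returns a byte string. The exception is telling you that a Python string cannot be used as a regex pattern when the text being searched is a byte string.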

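A byte string is (usually) an encoded unicode string, and in this case it looks like UTF-8 encoded data. So you need to decode the byte string into a Python string so that your string regex pattern can be applied to it: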

output_word = urllib.request.urlopen("http://dictionary.reference.com/browse/"+srch+"?s=t")
output_word = output_word.read().decode('utf8')

That should fix the problem for you.
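
To see the mismatch without hitting the network, here is a minimal sketch. The bytes in page are sample data standing in for the result of read(), and the variable names are only for illustration:

```python
import re

# Sample bytes standing in for output_word.read(); not a real response.
page = b'<meta name="description" content="Test definition, a means of trial. See more."/>'
pattern = '<meta name="description" content=".*$'

# A str pattern applied to bytes raises the TypeError from the question.
try:
    re.findall(pattern, page, re.MULTILINE)
    raised = False
except TypeError:
    raised = True

# Option 1: decode the bytes and keep the str pattern.
matches_str = re.findall(pattern, page.decode('utf-8'), re.MULTILINE)

# Option 2: keep the bytes and use a bytes pattern; the matches are then bytes too.
matches_bytes = re.findall(pattern.encode('utf-8'), page, re.MULTILINE)

print(raised, len(matches_str), type(matches_bytes[0]).__name__)
```

Decoding first is usually more convenient here, because the later replace() and startswith() calls in the question also use str arguments.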

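You need to know which encoding to use. You can find out by looking at the Content-Type response header, which for this URL is Content-Type: text/html; charset=UTF-8. Alternatively, since this is HTML content, you can look for a <meta http-equiv="Content-type" ... tag.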
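
One way to read the charset from the header programmatically is sketched below. The helper name charset_from_headers and the UTF-8 fallback are assumptions for illustration. With urllib, the object passed in would be the response's headers attribute, which is an email.message.Message subclass; the example builds a Message by hand so it runs offline:

```python
from email.message import Message


def charset_from_headers(headers, default='utf-8'):
    """Return the charset named in the Content-Type header, or `default`.

    `headers` is expected to be an email.message.Message (or subclass).
    """
    return headers.get_content_charset() or default


# Offline stand-in for response.headers, using the header value quoted above.
headers = Message()
headers['Content-Type'] = 'text/html; charset=UTF-8'

charset = charset_from_headers(headers)
print(charset)  # get_content_charset() lower-cases the value
```

The result could then be passed to decode(), for example output_word.read().decode(charset).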

Finally, you could use the requests library, which handles the decoding for you:

import requests
r = requests.get("http://dictionary.reference.com/browse/"+srch+"?s=t")
output_word = r.text

Answer 2 (score: 0)

After some changes, this is the code I settled on, although it still has some flaws.

import requests
from lxml import html

def define_word(user_define_input):
    try:
        response = requests.get("http://dictionary.reference.com/browse/{}?s=t".format(user_define_input[1]))
    except IndexError:
        print("You have not entered a word!")
        return
    tree = html.fromstring(response.text)
    title = tree.xpath('//title/text()')
    print(title)
    print("\n")
    defs = tree.xpath('//div[@class="def-content"]/text()')
    defs = ''.join(defs)
    defs = defs.replace("() ", "")
    defs = defs.split('\n')
    defs = [d for d in defs if d]
    for d in defs:
        print(d)

This splits the user input into a list containing two items:

def split_line_test(user_input):
    global user_define_input
    user_define_input = user_input.split()
    if (user_define_input[0] == "define"): #define is user_define_input[0] while user_define_input[1] is the word that will be searched up
        return True
    if (user_define_input[0] == "weather"): #you can ignore this, it is for my other function
        return True
    return False
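
The function above raises IndexError on empty input and relies on a global. A possible variant that avoids both is sketched here; parse_command is a hypothetical name and it returns a tuple rather than filling user_define_input, so it is not a drop-in replacement:

```python
def parse_command(user_input):
    """Split input into (command, argument); either part may be None."""
    # maxsplit=1 keeps everything after the command together as one argument.
    parts = user_input.split(maxsplit=1)
    command = parts[0] if parts else None
    argument = parts[1] if len(parts) > 1 else None
    return command, argument


command, word = parse_command("define test")
if command == "define" and word is not None:
    print("would look up:", word)
```

Returning None for a missing word lets the caller print "You have not entered a word!" without catching IndexError.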

Many thanks to everyone who helped me fix my code :)