Question

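Good afternoon everyone,

Today I was asked to write the following function:

def compareurl(url1,url2,enc,n)

This function compares two URLs and returns a list containing:

[word,occ_in_url1,occ_in_url2]

where:

word ---> a word of length n

occ_in_url1 ---> number of times the word occurs in url1

occ_in_url2 ---> number of times the word occurs in url2

So I started writing the function, and this is what I have so far: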

def compare_url(url1,url2,enc,n):
    from urllib.request import urlopen
    with urlopen('url1') as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode('enc')
    with urlopen('url2') as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode('enc')
    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()
    import string
    all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
    all_lower2nopunctuation = "".join(l for l in all_lower2 if l not in string.punctuation)
    for word1 in all_lower1nopunctuation:
        if len(word1) == k:
            all_lower1nopunctuation.count(word1)
    for word2 in all_lower2nopunctuation:
        if len(word2) == k:
            all_lower2opunctuation.count(word2)
    return(word1,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
    return(word2,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))

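But this code does not work the way I intended; in fact, it does not work at all.

I would also like to:

sort the returned list in descending order (starting with the word that occurs most often)
if two words occur the same number of times, return them in alphabetical order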

Answer 1

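Your code contains some typos (watch out for those in the future), but there are also some Python problems (or things that can be improved).

First, your imports should come at the top of the file: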

from urllib.request import urlopen
import string

You should call urlopen with a string, which is what you are doing, but that string is the literal 'url1' rather than 'http://...'. Do not put variables inside quotes:

with urlopen(url1) as f1: #remove quotes
    readpage1 = f1.read()
    decodepage1 = readpage1.decode(enc) #remove quotes
with urlopen(url2) as f2: #remove quotes
    readpage2 = f2.read()
    decodepage2 = readpage2.decode(enc) #remove quotes
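A minimal sketch of why the quotes matter. It needs no network access: the byte string below is a made-up stand-in for a downloaded page, and the value of enc is assumed for illustration. With quotes, Python sees the literal text 'url1' or 'enc'; without them, it uses the value the caller passed in.

```python
from urllib.request import urlopen

enc = "utf-8"                      # hypothetical value passed by the caller
raw = "h\u00e9llo".encode(enc)     # stand-in for the bytes returned by f1.read()

# 'url1' in quotes is just the text "url1", which is not a valid URL.
# urlopen rejects it before making any network request.
try:
    urlopen("url1")
    url_result = "opened"
except ValueError as err:
    url_result = f"ValueError: {err}"

# 'enc' in quotes asks for a codec literally named "enc", which does not exist.
try:
    raw.decode("enc")
    quoted_result = "decoded"
except LookupError as err:
    quoted_result = f"LookupError: {err}"

# Without quotes, the variable's value ("utf-8") is used.
unquoted_result = raw.decode(enc)

print(url_result)
print(quoted_result)
print(unquoted_result)
```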

You need to improve the all_lower1nopunctuation initialization. You are turning stackoverflow.com into stackoverflowcom, when it should actually be stackoverflow com.

#all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
#the if statement should be after 'l' and before 'for'
#you should include 'else' to replace the punctuation with a space
all_lower1nopunctuation = ''.join(l if l not in string.punctuation else ' ' for l in all_lower1)
all_lower2nopunctuation = ''.join(l if l not in string.punctuation else ' ' for l in all_lower2)
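To see the difference on a small input, here is a sketch using a made-up sentence. It also shows str.translate as one possible alternative that gives the same result as the join with else ' '; that alternative is a suggestion, not part of the original code.

```python
import string

text = "visit stackoverflow.com, please!"

# Dropping punctuation glues neighbouring words together.
dropped = "".join(ch for ch in text if ch not in string.punctuation)

# Replacing punctuation with a space keeps the words apart.
spaced = "".join(ch if ch not in string.punctuation else " " for ch in text)

# Same result with a translation table that maps each punctuation mark to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
translated = text.translate(table)

print(dropped)
print(spaced.split())
print(translated == spaced)
```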

Merge both for loops into one. Also add the words you find to a set (a collection of unique elements).

all_lower1nopunctuation.count(word1) returns the number of times word1 appears in all_lower1nopunctuation. It does not increment a counter.

for word1 in all_lower1nopunctuation does not do what you want, because all_lower1nopunctuation is a string (not a list), so the loop visits one character at a time. Use .split(' ') to turn it into a list.

.replace('\n', '') removes all line breaks; otherwise they would be counted as words too.

#for word1 in all_lower1nopunctuation:
#    if len(word1) == k: #also, this should be == n, not == k
#        all_lower1nopunctuation.count(word1)
#for word2 in all_lower2nopunctuation:
#    if len(word2) == k:
#        all_lower2opunctuation.count(word2)
word_set = set([])
for word in all_lower1nopunctuation.replace('\n', '').split(' '):
    if len(word) == n and word in all_lower2nopunctuation:
        word_set.add(word) #set uses .add() instead of .append()
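The following sketch, on a made-up two-line string, shows what each way of walking the text actually yields. One caveat worth checking in your own data: removing '\n' outright joins the last word of one line to the first word of the next, whereas split() with no argument splits on any run of whitespace, newlines included.

```python
text = "to be or\nnot to be"

# Iterating a str yields single characters, not words.
chars = [piece for piece in text]

# split(' ') splits on single spaces only, so the newline stays inside a token.
words_space = text.split(" ")

# Removing '\n' first avoids that, but merges "or" and "not" into "ornot".
words_stripped = text.replace("\n", "").split(" ")

# split() with no argument treats any whitespace run as a separator.
words_any = text.split()

print(chars[:5])
print(words_space)
print(words_stripped)
print(words_any)
```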

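Now that you have the set of words that appear on both URLs, you need to store how many times each word occurs in each URL. The following code makes sure you get a list of tuples, as you asked:

count_list = []
for final_word in word_set:
    count_list.append((final_word,
                       all_lower1nopunctuation.count(final_word),
                       all_lower2nopunctuation.count(final_word)))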
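One thing to be aware of: str.count counts substrings, so a short word is also counted when it appears inside a longer one. If whole-word counts are what you need, a possible approach (an assumption about the intent, not something the code above does) is to split the text first and count the tokens with collections.Counter. The sentence below is made up for illustration.

```python
from collections import Counter

text = "the other day the weather was there"

# str.count also matches "the" inside "other", "weather" and "there".
substring_hits = text.count("the")

# Counting tokens after splitting only matches whole words.
token_counts = Counter(text.split())
whole_word_hits = token_counts["the"]

print(substring_hits, whole_word_hits)
```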

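return means the function is finished, and the interpreter continues from wherever it was before the function was called, so anything after a return is irrelevant.

As said by RemcoGerlich.

Your code will always execute only the first return, so you need to merge both returns into one.

#return(word1,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
#return(word2,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
return(count_list) # which contains a list of tuples with all words and their counts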
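A tiny illustration of that rule, using two hypothetical functions: the second return in two_returns can never run, while one_return collects everything first and returns once.

```python
def two_returns():
    return "first"
    return "second"  # never reached: the function already ended above


def one_return():
    results = []
    results.append("first")
    results.append("second")
    return results  # a single return hands back everything collected


print(two_returns())
print(one_return())
```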

from urllib.request import urlopen
import string

def compare_url(url1,url2,enc,n):
    with urlopen(url1) as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode(enc)
    with urlopen(url2) as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode(enc)

    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()

    all_lower1nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower1)
    all_lower2nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower2)

    word_set = set([])
    for word in all_lower1nopunctuation.replace('\n', '').split(' '):
        if len(word) == n and word in all_lower2nopunctuation:
            word_set.add(word)

    count_list = []
    for final_word in word_set:
        count_list.append((final_word,
                           all_lower1nopunctuation.count(final_word),
                           all_lower2nopunctuation.count(final_word)))

    return(count_list)

url1 = 'https://www.tutorialspoint.com/python/list_count.htm'
url2 = 'https://stackoverflow.com/a/128577/7067541'

for word_count in compare_url(url1,url2, 'utf-8', 5):
    print (word_count)
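The question also asked for the result to be sorted, most frequent words first and ties in alphabetical order, which compare_url does not do. Below is a rough sketch of one way to add that. It works on plain strings so it can be tried without downloading anything; compare_texts and count_words are hypothetical helper names, the sample texts are made up, and "most frequent" is assumed to mean the combined count over both pages. It counts whole words rather than substrings and, like compare_url, keeps only words found in both texts.

```python
import string
from collections import Counter


def count_words(text, n):
    """Count whole words of length n in text, ignoring case and punctuation."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return Counter(w for w in words if len(w) == n)


def compare_texts(text1, text2, n):
    """Return [word, count_in_text1, count_in_text2] for words found in both texts."""
    counts1 = count_words(text1, n)
    counts2 = count_words(text2, n)
    rows = [[w, counts1[w], counts2[w]] for w in counts1.keys() & counts2.keys()]
    # Assumption: rank by combined count, highest first; ties go alphabetically.
    rows.sort(key=lambda row: (-(row[1] + row[2]), row[0]))
    return rows


page1 = "Red fox, red hen. Big cat; big dog, big pig!"
page2 = "The big red barn: a cat, a dog and a fox."
print(compare_texts(page1, page2, 3))
```

To use this with real pages, the decoded text obtained from urlopen (decodepage1 and decodepage2 in compare_url) would be passed in place of page1 and page2.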

TL;DR: the compare_url listing above, from the imports down to the final print loop, is the complete corrected function.
