Question

I'm trying to build a web crawler.

I use wget to download the site and then extract all the words.

I want to build dictionaries like these:

{'Activity':'index2.html','and':'index2.html','within':'index2.html',...}
{'Rutgers':'index.html','Central':'index.html','Service':'index.html',...}

But my output is:

{'Activity':'i','and':'n','within':'d',...} 
{'Rutgers':'i','Central':'n','Service':'d',...}

It splits my filename into individual characters.

import string
import os
from bs4 import BeautifulSoup as bs
from os import listdir
from os.path import isfile, join

mypath = "/Users/Tsu-AngChou/MasterProject/Practice/try_test/"
files = listdir(mypath)
# Translation table that deletes every punctuation character
translator = str.maketrans("", "", string.punctuation)
storage = []
for f in files:
    fullpath = join(mypath, f)
    if f == '.DS_Store':
        # Use the full path so this works from any working directory
        os.remove(fullpath)
    elif isfile(fullpath):
        print(f)
        with open(fullpath, 'r', encoding='utf-8') as response:
            html_cont = response.read()
        soup = bs(html_cont, 'html.parser')
        regular_string = soup.get_text()

        # Strip punctuation, split into words, truncate each word to 14 characters
        new_string = regular_string.translate(translator).split()
        new_list = [item[:14] for item in new_string]
        # Problem line: zip pairs each word with a single character of f
        a = dict(zip(new_list, f))
        print(a)

Answer 1

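You need to pair each word with f as a single element. zip steps through the elements of each sequence, and a string is a sequence of characters, so every word gets matched with one letter of the filename. Try something like this: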

sent = "Activity and within".split()
f = "index.html"
a = dict((word, f) for word in sent)
print(a)

Output:

{'Activity': 'index.html', 'and': 'index.html', 'within': 'index.html'}
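
To see why the original code produced single letters, it may help to run zip directly on the sample words. This is only a minimal sketch: the names `words`, `split_pairs` and `whole_pairs` are made up for illustration, and `itertools.repeat` is just one possible way to keep using zip, not something the answer above prescribes.

```python
from itertools import repeat

words = ["Activity", "and", "within"]
f = "index.html"

# zip pairs items by position; a str is iterated one character at a time,
# so each word is matched with a single letter of the filename.
split_pairs = dict(zip(words, f))
print(split_pairs)  # {'Activity': 'i', 'and': 'n', 'within': 'd'}

# repeat(f) yields the whole filename over and over, so zip keeps it intact.
whole_pairs = dict(zip(words, repeat(f)))
print(whole_pairs)  # {'Activity': 'index.html', 'and': 'index.html', 'within': 'index.html'}
```

zip stops at the shortest input, so with the bare string the result can never hold more entries than the filename has characters.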

Answer 2

You can use dict.fromkeys:

a = dict.fromkeys(new_list, f)

This uses the items of new_list as the keys and gives every key the same value, f.
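
As a quick check of that behaviour, here is a small sketch with a hard-coded word list standing in for the question's `new_list` (the sample words, including the deliberate duplicate, are assumptions for illustration only):

```python
# Sample words standing in for the question's new_list; "Central" appears twice.
new_list = ["Rutgers", "Central", "Service", "Central"]
f = "index.html"

# Every distinct word becomes a key mapped to the full filename.
a = dict.fromkeys(new_list, f)
print(a)  # {'Rutgers': 'index.html', 'Central': 'index.html', 'Service': 'index.html'}
```

Repeated words collapse into a single key, so the dictionary ends up with one entry per distinct word.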

Is it possible to assign a filename to different keys in Python?
