I recently tried the following Python code, taken from this question, which uses BeautifulSoup and apparently worked for the asker.
import urllib2
import bs4
import string
from bs4 import BeautifulSoup

badwords = set([
    'cup','cups',
    'clove','cloves',
    'tsp','teaspoon','teaspoons',
    'tbsp','tablespoon','tablespoons',
    'minced'
])

def cleanIngred(s):
    # remove leading and trailing whitespace
    s = s.strip()
    # remove numbers and punctuation in the string
    s = s.strip(string.digits + string.punctuation)
    # remove unwanted words
    return ' '.join(word for word in s.split() if not word in badwords)

def main():
    url = "http://allrecipes.com/Recipe/Slow-Cooker-Pork-Chops-II/Detail.aspx"
    data = urllib2.urlopen(url).read()
    bs = BeautifulSoup.BeautifulSoup(data)

    ingreds = bs.find('div', {'class': 'ingredients'})
    ingreds = [cleanIngred(s.getText()) for s in ingreds.findAll('li')]

    fname = 'PorkRecipe.txt'
    with open(fname, 'w') as outf:
        outf.write('\n'.join(ingreds))

if __name__=="__main__":
    main()
For some reason, though, I can't get it to work in my case. I get this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-4-55411b0c5016> in <module>()
     41
     42 if __name__=="__main__":
---> 43     main()

<ipython-input-4-55411b0c5016> in main()
     31     url = "http://allrecipes.com/Recipe/Slow-Cooker-Pork-Chops-II/Detail.aspx"
     32     data = urllib2.urlopen(url).read()
---> 33     bs = BeautifulSoup.BeautifulSoup(data)
     34
     35     ingreds = bs.find('div', {'class': 'ingredients'})

AttributeError: type object 'BeautifulSoup' has no attribute 'BeautifulSoup'
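I suspect this is because I am using bs4 rather than BeautifulSoup. I tried replacing the line bs = BeautifulSoup.BeautifulSoup(data) with bs = bs4.BeautifulSoup(data). I no longer get an error, but there is no output either. Are there too many possible causes to guess at?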
Answer 0 (score: 1)
The original code used BeautifulSoup version 3:
import BeautifulSoup
You switched to BeautifulSoup version 4, but you also changed the style of the import:
from bs4 import BeautifulSoup
Remove that line; you already have the correct import earlier in your file:
import bs4
then use:
bs = bs4.BeautifulSoup(data)
or keep the from bs4 import BeautifulSoup line and change the call to:
bs = BeautifulSoup(data)
(and remove the import bs4 line).
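To see why the error message says "type object ... has no attribute", here is a minimal sketch of the same mistake using only the standard library. It uses collections and OrderedDict as stand-ins for bs4 and BeautifulSoup (an analogy I picked for illustration, not part of the original code): a from-import binds the name to the class, so the class cannot be used as if it were the module.

```python
from collections import OrderedDict  # binds the name to the class itself
import collections                   # binds the name to the module

# Module-qualified access works, because `collections` is the module.
d = collections.OrderedDict(a=1)

# The bare class name works too, because it was imported directly.
e = OrderedDict(a=1)

# Treating the imported class as if it were the module fails, in the same
# way that BeautifulSoup.BeautifulSoup(data) fails after
# `from bs4 import BeautifulSoup`.
try:
    OrderedDict.OrderedDict(a=1)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Either import style is fine on its own; the call just has to match the name that the import actually bound.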
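You may also want to review the Porting code to BS4 section of the BeautifulSoup documentation, so you can make any other changes needed to upgrade the code you found and get the most out of BeautifulSoup version 4.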
Otherwise the script works just fine and produces a new file, PorkRecipe.txt; it does not produce any output on stdout.
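If you want to see the result in the console as well, one option is to read the file back and print it. The sketch below is only an illustration: the ingredient list is a hypothetical stand-in for the scraped data, and it writes into a temporary directory (my assumption, so that nothing is left behind) instead of the current directory.

```python
import os
import tempfile

# Hypothetical stand-in for the list that the scraper builds.
ingreds = ["olive oil", "chicken broth", "paprika"]

# Write into a temporary directory so the sketch leaves no stray files.
out_dir = tempfile.mkdtemp()
fname = os.path.join(out_dir, "PorkRecipe.txt")

with open(fname, "w") as outf:
    outf.write("\n".join(ingreds))  # goes to the file, not to stdout

# Read the file back and echo it, so there is visible output as well.
with open(fname) as inf:
    contents = inf.read()
print(contents)
```

In the original script, adding a print of '\n'.join(ingreds) next to the outf.write call would have the same effect.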
Contents of the file after fixing the bs4.BeautifulSoup reference:
READY IN 4+ hrs
Slow Cooker Pork Chops II
Amazing Pork Tenderloin in the Slow Cooker
Jerre's Black Bean and Pork Slow Cooker Chili
Slow Cooker Pulled Pork
Slow Cooker Sauerkraut Pork Loin
Slow Cooker Texas Pulled Pork
Oven-Fried Pork Chops
Pork Chops for the Slow Cooker
Tangy Slow Cooker Pork Roast
Types of Cooking Oil
Garlic: Fresh Vs. Powdered
All about Paprika
Types of Salt
olive oil
chicken broth
garlic,
paprika
garlic powder
poultry seasoning
dried oregano
dried basil
thick cut boneless pork chops
salt and pepper to taste
PREP 10 mins
COOK 4 hrs
READY IN 4 hrs 10 mins
In a large bowl, whisk together the olive oil, chicken broth, garlic, paprika, garlic powder, poultry seasoning, oregano, and basil. Pour into the slow cooker. Cut small slits in each pork chop with the tip of a knife, and season lightly with salt and pepper. Place pork chops into the slow cooker, cover, and cook on High for 4 hours. Baste periodically with the sauce
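The trailing comma on the "garlic," entry in the output above follows from how str.strip works: it only removes characters from the two ends of the whole string, not from each word. The sketch below reproduces this with a shortened badwords set and a hypothetical input line; clean_ingred_per_word is one possible variant of my own, not something from the original code.

```python
import string

# Shortened version of the badwords set, enough for this illustration.
badwords = {"cup", "cups", "clove", "cloves", "minced"}

def clean_ingred(s):
    # Same logic as cleanIngred: strip only touches the ends of the string.
    s = s.strip()
    s = s.strip(string.digits + string.punctuation)
    return " ".join(w for w in s.split() if w not in badwords)

def clean_ingred_per_word(s):
    # Hypothetical variant: strip digits and punctuation from every word.
    words = (w.strip(string.digits + string.punctuation) for w in s.split())
    return " ".join(w for w in words if w and w not in badwords)

sample = "2 cloves garlic, minced"  # hypothetical ingredient line
result = clean_ingred(sample)
result_per_word = clean_ingred_per_word(sample)
print(repr(result), repr(result_per_word))
```

The first function leaves the comma attached to the word, while the per-word variant drops it.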