No output when running BeautifulSoup Python code

Asked: 2014-09-23 15:32:46

Tags: python beautifulsoup

I recently tried the following Python code, which uses BeautifulSoup and comes from this question, where it apparently worked for the asker.

import urllib2
import bs4
import string
from bs4 import BeautifulSoup

badwords = set([
    'cup','cups',
    'clove','cloves',
    'tsp','teaspoon','teaspoons',
    'tbsp','tablespoon','tablespoons',
    'minced'
])

def cleanIngred(s):

    s=s.strip()

    s=s.strip(string.digits + string.punctuation)

    return ' '.join(word for word in s.split() if not word in badwords)

def cleanIngred(s):
    # remove leading and trailing whitespace
    s = s.strip()
    # remove leading and trailing digits and punctuation
    s = s.strip(string.digits + string.punctuation)
    # remove unwanted words
    return ' '.join(word for word in s.split() if not word in badwords)

def main():
    url = "http://allrecipes.com/Recipe/Slow-Cooker-Pork-Chops-II/Detail.aspx"
    data = urllib2.urlopen(url).read()
    bs = BeautifulSoup.BeautifulSoup(data)

    ingreds = bs.find('div', {'class': 'ingredients'})
    ingreds = [cleanIngred(s.getText()) for s in ingreds.findAll('li')]

    fname = 'PorkRecipe.txt'
    with open(fname, 'w') as outf:
        outf.write('\n'.join(ingreds))

if __name__=="__main__":
    main()

For some reason, though, I can't get it to work in my case. I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-55411b0c5016> in <module>()
     41 
     42 if __name__=="__main__":
---> 43     main()

<ipython-input-4-55411b0c5016> in main()
     31     url = "http://allrecipes.com/Recipe/Slow-Cooker-Pork-Chops-II/Detail.aspx"
     32     data = urllib2.urlopen(url).read()
---> 33     bs = BeautifulSoup.BeautifulSoup(data)
     34 
     35     ingreds = bs.find('div', {'class': 'ingredients'})

AttributeError: type object 'BeautifulSoup' has no attribute 'BeautifulSoup'

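I suspect this is because I am using bs4 rather than BeautifulSoup. I tried replacing the line bs = BeautifulSoup.BeautifulSoup(data) with bs = bs4.BeautifulSoup(data). The error went away, but I get no output. Are there too many possible causes to guess at?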

1 answer:

Answer 0 (score: 1)

The original code used BeautifulSoup version 3:

import BeautifulSoup

You switched to BeautifulSoup version 4, but you also switched the style of import:

from bs4 import BeautifulSoup

Remove that line; you already have the correct import earlier in the file:

import bs4

Then use:

bs = bs4.BeautifulSoup(data)

Alternatively, keep the from bs4 import BeautifulSoup line and change the call to:

bs = BeautifulSoup(data)

(and remove the import bs4 line).
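The difference between the two import styles can be reproduced without BeautifulSoup installed. The sketch below is only an analogy, written for Python 3: string.Template from the standard library stands in for bs4.BeautifulSoup, and the variable names are made up for illustration.

```python
# Analogy only: string.Template plays the role of bs4.BeautifulSoup.
import string
from string import Template

# Style 1: "import module" binds the module; reach the class through it.
via_module = string.Template("Hello, $name")

# Style 2: "from module import Class" binds the class itself; call it directly.
via_class = Template("Hello, $name")

# Mixing the styles treats the class as if it were the module and fails
# with the same kind of AttributeError as in the question.
try:
    Template.Template("Hello, $name")
    mixed_error = None
except AttributeError as exc:
    mixed_error = str(exc)

print(via_module.substitute(name="bs4"))
print(via_class.substitute(name="bs4"))
print(mixed_error)
```

Either style works on its own; the error only appears when a name bound by one style is used as if it had been bound by the other.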

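You may also want to read the Porting code to BS4 section of the BeautifulSoup documentation, so you can make any other changes needed to upgrade the code you found and get the most out of BeautifulSoup version 4.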

The script works correctly and creates a new file, PorkRecipe.txt; it does not write anything to stdout.
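To see why nothing shows up in the console, here is a minimal Python 3 sketch of the same write pattern. The helper names write_lines and read_text, the sample list, and the temporary directory are assumptions for illustration; the original script writes PorkRecipe.txt into the current working directory instead.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write the lines to path, newline-separated. Prints nothing."""
    with open(path, "w") as outf:
        outf.write("\n".join(lines))


def read_text(path):
    """Return the full text of the file at path."""
    with open(path) as inf:
        return inf.read()


# Hypothetical sample data standing in for the scraped ingredient list.
sample = ["olive oil", "chicken broth", "paprika"]

# A temporary directory keeps the sketch from touching the working directory.
path = os.path.join(tempfile.mkdtemp(), "PorkRecipe.txt")

write_lines(path, sample)   # silent: the data goes to the file only
contents = read_text(path)
print(contents)             # an explicit print is what produces console output
```

If console output is wanted from the original script, printing the joined list alongside the outf.write call would be one way to get it.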

Contents of the file after fixing the bs4.BeautifulSoup reference:

READY IN 4+ hrs

Slow Cooker Pork Chops II

Amazing Pork Tenderloin in the Slow Cooker

Jerre's Black Bean and Pork Slow Cooker Chili

Slow Cooker Pulled Pork

Slow Cooker Sauerkraut Pork Loin

Slow Cooker Texas Pulled Pork

Oven-Fried Pork Chops

Pork Chops for the Slow Cooker

Tangy Slow Cooker Pork Roast

Types of Cooking Oil

Garlic: Fresh Vs. Powdered

All about Paprika

Types of Salt
olive oil
chicken broth
garlic,
paprika
garlic powder
poultry seasoning
dried oregano
dried basil
thick cut boneless pork chops
salt and pepper to taste
PREP 10 mins
COOK 4 hrs
READY IN 4 hrs 10 mins
In a large bowl, whisk together the olive oil, chicken broth, garlic, paprika, garlic powder, poultry seasoning, oregano, and basil. Pour into the slow cooker. Cut small slits in each pork chop with the tip of a knife, and season lightly with salt and pepper. Place pork chops into the slow cooker, cover, and cook on High for 4 hours. Baste periodically with the sauce
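The "garlic," entry above keeps its comma because str.strip only removes characters from the two ends of a string, not from the middle. The Python 3 sketch below reuses cleanIngred and badwords from the question; the two input strings are hypothetical, since the page's raw ingredient text is not shown here.

```python
import string

badwords = {
    'cup', 'cups',
    'clove', 'cloves',
    'tsp', 'teaspoon', 'teaspoons',
    'tbsp', 'tablespoon', 'tablespoons',
    'minced',
}


def cleanIngred(s):
    # remove leading and trailing whitespace
    s = s.strip()
    # remove digits and punctuation from both ends only
    s = s.strip(string.digits + string.punctuation)
    # drop unwanted words; "garlic," is not in badwords, so it survives intact
    return ' '.join(word for word in s.split() if word not in badwords)


# Hypothetical input lines, chosen to resemble a recipe's ingredient list.
samples = ["1/4 cup olive oil", "2 cloves garlic, minced"]
cleaned = [cleanIngred(s) for s in samples]
print(cleaned)
```

Stripping punctuation from each word, rather than from the whole line, would be one way to remove the stray comma, at the cost of changing the original function's behavior.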