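Calling and using an attribute stored in a variable (with BeautifulSoup 4)

Date: 2017-01-22 15:29:24

Tags: python python-3.x beautifulsoup

I want to take a Beautiful Soup attribute (e.g. class_, href, id) from a variable and use it in the following function:

Script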

from bs4 import BeautifulSoup
data='<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'

def removeAttrib(data, tag, **kwargs):
    soup = BeautifulSoup(data, "html.parser")
    for x in soup.findAll(tag, kwargs):
        del x[???]  # should be an equivalent of: del x["class"]
    return soup

kwargs = {"class": "story"}
soup = removeAttrib(data, "p", **kwargs)
print(soup)

Expected result:

<p>xxx </p> <p id="2">yyy</p> <p> zzz</p>

MYGz solved the first problem by passing a dictionary as the function argument (tag, argdict). Then I found **kwargs (which passes dictionary keys and values) in this question.

But I can't find a way to do del x["class"]. How do I pass the "class" key? I tried ckey=kwargs.keys() and then del x[ckey], but it didn't work.
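A likely reason `del x[ckey]` fails is that `kwargs.keys()` does not return the string "class"; it returns a `dict_keys` view containing all the keys. A minimal sketch, assuming a plain dict `attrs` stands in for a tag's `.attrs` and a hypothetical `filters` dict plays the role of `kwargs`, showing that looping over the dict yields each key as a plain string you can delete:

```python
# Hypothetical stand-ins (not BeautifulSoup objects):
# `attrs` plays the role of a tag's .attrs dict,
# `filters` plays the role of the kwargs dict passed to removeAttrib.
attrs = {"class": ["story"], "lang": "en"}
filters = {"class": "story"}

ckey = filters.keys()
# ckey is a view over all keys, not a single key string
print(type(ckey).__name__, list(ckey))  # dict_keys ['class']

# Iterating over a dict yields each key as a plain string
for key in filters:
    if key in attrs:
        del attrs[key]

print(attrs)  # {'lang': 'en'}
```

Inside the real function the same loop would run over each matched tag `x`, which is the shape Answer 1 uses.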

PS1: any idea why removeAttrib(data, "p", {"class": "story"}) doesn't work? PS2: this is a different topic from this question (it is not a duplicate).
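For PS1: a `**kwargs` parameter only collects *keyword* arguments, so a dict passed positionally is treated as an extra positional argument. A minimal sketch, assuming a signature of the form `removeAttrib(data, tag, **kwargs)`, with a hypothetical stub `remove_attrib` that just returns what it collected:

```python
def remove_attrib(data, tag, **kwargs):
    # Hypothetical stub: **kwargs gathers only keyword arguments into a dict
    return kwargs

filters = {"class": "story"}

# Unpacking with ** turns the dict into the keyword argument class="story"
print(remove_attrib("<p>x</p>", "p", **filters))  # {'class': 'story'}

# Passed without **, the dict becomes a third positional argument,
# but the signature only accepts two (data, tag), so Python raises TypeError
try:
    remove_attrib("<p>x</p>", "p", filters)
except TypeError as exc:
    print("TypeError:", exc)
```

So either call with `**filters`, or declare the parameter as an ordinary dict (e.g. `argdict`) and pass the dict directly, as both answers do.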

2 answers:

Answer 0 (score: 1)

You can pass a dictionary instead:

from bs4 import BeautifulSoup
data='<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'
soup = BeautifulSoup(data, "html.parser")

def removeAttrib(soup, tag, argdict):
    for x in soup.findAll(tag, argdict):
        # Note: decompose() removes the whole matching tag, not just its attribute
        x.decompose()

removeAttrib(soup, "p", {"class": "story"})
print(soup)

Answer 1 (score: 1)

Credit to MYGz and commandlineluser

from bs4 import BeautifulSoup
data='<p class="story">xxx </p> <p id="2">yyy</p> <p class="story"> zzz</p>'


def removeAttrib(data, tag, kwargs):
    soup = BeautifulSoup(data, "html.parser")
    for x in soup.findAll(tag, kwargs):
        for key in kwargs:
            # key is the attribute name, e.g. "class"
            x.attrs.pop(key, None)  # .attrs is the tag's underlying attribute dict
            # del x[key] would also work; pop(key, None) explicitly ignores a missing key

    print(soup)
    return soup

data=removeAttrib(data,"p",{"class":"story"})
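Answer 1 calls `x.attrs.pop(key, None)` on the tag's attribute dict. The sketch below uses a plain Python dict (an assumption: it stands in for `x.attrs`) to show what the `None` default buys: on an ordinary dict, `pop` with a default ignores a missing key, while `del` raises `KeyError`.

```python
attrs = {"lang": "en"}  # hypothetical attribute dict with no "class" key

# dict.pop with a default returns the default instead of raising
removed = attrs.pop("class", None)
print(removed)  # None

# del on a missing key raises KeyError for a plain dict
try:
    del attrs["class"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'class'
```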