Question

I'm having trouble printing only a specific value from scraped HTML.

This is the specific line of HTML that my program scrapes:

<input name="form_key" type="hidden" value="MmghsMIlPm5bd2Dw"/>

My code is as follows:

import requests, time
from bs4 import BeautifulSoup
from colorama import Fore, Back, Style, init


print(Fore.CYAN + "Lets begin!")
init(autoreset=True)

url = raw_input("Enter URL: ")

print(Fore.CYAN + "\nGetting form key")


r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")

data = soup.find_all("input", {'name': 'form_key', 'type':'hidden'})

for data in data:
    print(Fore.YELLOW + "Found Form Key:")
    print(data)

The program scrapes it fine, but it prints the whole line, and I want it to print only "MmghsMIlPm5bd2Dw" (without the quotes).

How can I achieve this?

I have tried things like

print soup.find(data).text

and

last_input_tag = soup.find("input", id="value")
print(last_input_tag)

but nothing seems to really work.

Answer 1

If printing data shows the whole input element, you should be able to print just the value by asking for that attribute:

print(data.get('value'))

See the documentation here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
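To make that concrete, here is a minimal sketch. It assumes the page contains the input shown in the question. The `html` string and the helper name `extract_form_key` are made up for illustration; in the real script the markup would come from `r.content`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the page fetched with requests.
html = '<form><input name="form_key" type="hidden" value="MmghsMIlPm5bd2Dw"/></form>'


def extract_form_key(markup):
    """Return the value of the first hidden form_key input, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find("input", {"name": "form_key", "type": "hidden"})
    if tag is None:
        return None  # no matching <input> in the markup
    return tag.get("value")  # None if the tag has no value attribute


form_key = extract_form_key(html)
print(form_key)
```

Checking for `None` before calling `.get()` avoids an AttributeError on pages where the input is missing.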

Answer 2

More generally... suppose there are multiple tags in the HTML:

from bs4 import BeautifulSoup

html = '''<title><p><input name="form_key" type="hidden" value="MmghsMIlPm5bd2Dw"/>
<input name="form_key" type="hidden" value="abcdefghijklmo"/>
<input name="form_key" type="hidden"/>
</p></title>'''

soup = BeautifulSoup(html, "html.parser")

We can search for all tags named input.

tags = soup.find_all('input')

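Then we can iterate over all the tags and retrieve the value attribute from those that have one. Because a tag can be treated like a dictionary, we can query its attributes with the .get() method as if they were keys. This method looks up the attribute named value:

If the attribute is found, the method returns the value associated with it.
If the attribute is not found, .get() returns the default value you supplied as the second argument.

Looping over the tags...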

for tag in tags:
    print(tag.get('value', 'value attribute not found'))

=== Output: ===
MmghsMIlPm5bd2Dw
abcdefghijklmo
value attribute not found
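As a variation on the loop above, the sketch below skips tags without a value attribute instead of printing a default, and shows why .get() is safer than square-bracket access. The sample markup is a hypothetical stand-in wrapped in a form element, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two inputs with a value, one without.
html = '''<form>
<input name="form_key" type="hidden" value="MmghsMIlPm5bd2Dw"/>
<input name="form_key" type="hidden" value="abcdefghijklmo"/>
<input name="form_key" type="hidden"/>
</form>'''

soup = BeautifulSoup(html, "html.parser")

# attrs={"value": True} matches only tags that carry a value attribute,
# whatever its content is.
tags_with_value = soup.find_all("input", attrs={"value": True})
values = [tag["value"] for tag in tags_with_value]
print(values)

# Square-bracket access raises KeyError when the attribute is absent,
# whereas .get() returns None (or the default you pass).
last_tag = soup.find_all("input")[-1]
try:
    last_tag["value"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```

Filtering in find_all keeps the loop body simple when tags without the attribute are of no interest.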
