Capturing JavaScript alert text with BeautifulSoup

Time: 2019-03-01 16:13:03

Tags: javascript python beautifulsoup

I am validating a form with this JavaScript:

<script type="text/javascript">
        function validateForm()
        {
            var a=document.forms["orderform"]["Name"].value;
            var b=document.forms["orderform"]["Street"].value;
            var c=document.forms["orderform"]["ZIP"].value;
            var d=document.forms["orderform"]["City"].value;
            var e=document.forms["orderform"]["PhoneNumber"].value;
            if (
                a==null || a=="" || 
                b==null || b=="" || 
                c==null || c=="" || 
                d==null || d=="" || 
                e==null || e==""
                )
            {alert("Please fill all the required fields.");
            return false;
            }
        }
      </script>

I am trying to capture the alert text with BeautifulSoup:

import re
from bs4 import BeautifulSoup

with open("index.html") as fp:
  soup = BeautifulSoup(fp, "lxml")

for script in soup.find_all(re.compile("(?<=alert\(\").+(?=\")")):
  print(script)

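This returns nothing. It is based on the example given under "A regular expression" in the BS documentation, which finds tag names starting with "b":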

import re
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# body
# b

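But I can't seem to find an equivalent of 'print(tag.name)' that would print the alert text. Or am I on completely the wrong track? Any help is much appreciated.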

Edit: I tried:

pattern = re.compile("(?<=alert\(\").+(?=\")")
for script in soup.find_all ('script'):
  print(script.pattern)

This prints "None".

2 answers:

Answer 0: (score: 2)

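Running the pattern over the whole parsed HTML won't work: when find_all() is given a regular expression, it matches it against tag names, not against the text inside the tags. You first need to extract the script content; then the alert text is easy to parse out.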

import re
from bs4 import BeautifulSoup

with open("index.html") as fp:
  soup = BeautifulSoup(fp, "lxml")

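# Take the first <script> tag; extract() also removes it from the tree
script = soup.find("script").extract()

# find all alert text: everything between alert(" and the next double quote
alert = re.findall(r'(?<=alert\(\").+?(?=\")', script.text)
print(alert)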

Output:

['Please fill all the required fields.']
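
If the page has several script tags, or more than one alert call, a slightly more general variant could look like the sketch below. The helper name `extract_alert_texts` and the inline sample HTML are assumptions made for illustration, and the regex only handles plain string literals passed directly to alert(). It uses the built-in "html.parser" so that lxml is not required.

```python
import re
from bs4 import BeautifulSoup

# Matches alert("...") or alert('...') where the argument is a plain string literal.
# Non-greedy, so two alerts on the same line are not merged into one match.
ALERT_RE = re.compile(r"""alert\(\s*(["'])(.*?)\1\s*\)""")

def extract_alert_texts(html):
    """Hypothetical helper: collect string literals passed to alert() in all <script> tags."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for script in soup.find_all("script"):
        # .string is None for empty tags such as <script src="..."></script>
        source = script.string or ""
        texts.extend(m.group(2) for m in ALERT_RE.finditer(source))
    return texts

# Made-up sample input, not the original index.html
sample = """
<script>function f() { alert("Please fill all the required fields."); return false; }</script>
<script>alert('Second message');</script>
"""
print(extract_alert_texts(sample))
# ['Please fill all the required fields.', 'Second message']
```

This will miss alerts whose argument is a variable or a concatenated expression; for those you would need a real JavaScript parser rather than a regex.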

Answer 1: (score: 0)

If I understand you correctly, this may be what you are looking for:

from bs4 import BeautifulSoup

html = """
 <script type="text/javascript">
    function validateForm()
    {
        var a=document.forms["orderform"]["Name"].value;
        var b=document.forms["orderform"]["Street"].value;
        var c=document.forms["orderform"]["ZIP"].value;
        var d=document.forms["orderform"]["City"].value;
        var e=document.forms["orderform"]["PhoneNumber"].value;
        if (a==null || a=="", b==null || b=="", a==null || c=="", c==null || d=="", d==null || e=="", a==null || e=="")
        {alert("Please fill all the required fields.");
        return false;
        }
    }
  </script>
   """

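soup = BeautifulSoup(html, "lxml")
# Split the document text on double quotes; for this exact script the alert
# message is the piece at index 33, so the index shifts if the script changes
alert = soup.text.split('"')
alert[33]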

Output:

'Please fill all the required fields.'
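
The index 33 depends on exactly how many double quotes come before the alert call, so it is fragile. A less brittle sketch, assuming a shortened version of the script and the built-in "html.parser", is to let BeautifulSoup pick the script tag whose content mentions alert and then pull the message out with a small regex. The variable names here are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Shortened, made-up version of the form validation script
html = """
<script type="text/javascript">
    function validateForm() {
        var a = document.forms["orderform"]["Name"].value;
        if (a == null || a == "") {
            alert("Please fill all the required fields.");
            return false;
        }
    }
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# string= filters on the tag's .string; a compiled regex is applied with search()
script = soup.find("script", string=re.compile(r"alert\("))

alert_text = None
if script is not None:
    # Capture the text between alert(" and the next double quote
    match = re.search(r'alert\("(.*?)"\)', script.string)
    if match:
        alert_text = match.group(1)

print(alert_text)
# Please fill all the required fields.
```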