re.compile regex assistance (python, beautifulsoup)

Time: 2016-12-12 06:46:28

Tags: python regex beautifulsoup


Using code from a different thread:

import re
import requests
from bs4 import BeautifulSoup

data = """
<script type="text/javascript">
    window._propertyData = 
    { *** a lot of random code and other data ***
    "property": {"street": "21st Street", "apartment": "2101", "available": false}
    *** more data ***
    }
</script>
"""

soup = BeautifulSoup(data, "xml")
pattern = re.compile(r'\"street\":\s*\"(.*?)\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))

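This gives me the desired result:

  21st Street

However, I have tried different variations of the regex to get the whole object and could not produce this output:

  {"street": "21st Street", "apartment": "2101", "available": false}

I tried the following: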

pattern = re.compile(r'\"property\":\s*\"(.*?)\{\"', re.MULTILINE | re.DOTALL)

It did not produce the desired result.
Any help is appreciated. Thanks!

4 answers:

Answer 0 (score: 1)

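As per the comment above, correct your typo and use this:

property\W+({.*?})

RegexDemo

property: looks for the exact string

\W+: matches any non-word characters

({.*?}): capture group 1

  • {} matches the braces
  • .* matches any characters inside them
  • ? makes the match lazy, so it takes as few characters as possible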
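To illustrate the role of the ? in the brace group, here is a minimal sketch that compares a lazy and a greedy version of a pattern built from the pieces described above. The sample text is modeled on the question's script body, and the braces are escaped for explicitness; both choices are assumptions of this sketch rather than part of the answer.

```python
import re

# Sample text modeled on the script body in the question (assumed layout).
text = (
    'window._propertyData =\n'
    '{ *** a lot of random code and other data ***\n'
    '"property": {"street": "21st Street", "apartment": "2101", "available": false}\n'
    '*** more data ***\n'
    '}'
)

# Lazy: .*? stops at the first closing brace after "property".
lazy = re.search(r'property\W+(\{.*?\})', text, re.DOTALL).group(1)

# Greedy: .* runs on to the last closing brace in the text.
greedy = re.search(r'property\W+(\{.*\})', text, re.DOTALL).group(1)

print(lazy)
print(greedy == lazy)
```

The lazy form returns only the property object, while the greedy form also swallows the trailing data and the outer closing brace.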

Answer 1 (score: 0)

You can try this:

\"property\":\s*(\{.*?\})

Capture group 1 contains the desired data.

Explanation

Sample code:

import re

regex = r"\"property\":\s*(\{.*?\})"

test_str = ("window._propertyData = \n"
    "    { *** a lot of random code and other data ***\n"
    "    \"property\": {\"street\": \"21st Street\", \"apartment\": \"2101\", \"available\": false}\n"
    "    *** more data ***\n"
    "    }")

matches = re.finditer(regex, test_str, re.MULTILINE | re.DOTALL)

for matchNum, match in enumerate(matches):
   print(match.group(1))

Run it here

Answer 2 (score: 0)

Try this; it may be long, but it works well:

\"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\})

https://regex101.com/r/7KzzRV/3
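A minimal sketch of how this pattern might be used from Python, assuming the property line looks exactly like the one in the question. The second input, with a period inside the value, is a hypothetical example added to show a limitation of the [\w\s]+ part.

```python
import re

# The pattern from this answer, unchanged.
pattern = re.compile(
    r'\"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\})'
)

line = '"property": {"street": "21st Street", "apartment": "2101", "available": false}'
match = pattern.search(line)

with_braces = match.group(1)     # the whole {...} object
without_braces = match.group(2)  # only the key/value pairs

# Hypothetical value containing a period, which [\w\s]+ cannot match.
no_match = pattern.search('"property": {"street": "21st St."}')

print(with_braces)
print(without_braces)
print(no_match)
```

Because values are limited to word characters and whitespace, this pattern is stricter than the shorter lazy-brace patterns: it finds nothing when a value contains punctuation.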

Answer 3 (score: 0)

Output:

str_form: {"street": "21st Street", "apartment": "2101", "available": false}
dict_form:  {'available': False, 'street': '21st Street', 'apartment': '2101'}
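The code behind the str_form and dict_form lines is not shown here. One way to get both forms, offered as a sketch under assumptions and not as the answerer's actual code, is to capture the object with a regex and decode it with json.loads. The sample text and the variable names are assumptions modeled on the question.

```python
import json
import re

# Sample text modeled on the script body in the question (assumed layout).
text = (
    'window._propertyData =\n'
    '{ *** a lot of random code and other data ***\n'
    '"property": {"street": "21st Street", "apartment": "2101", "available": false}\n'
    '*** more data ***\n'
    '}'
)

match = re.search(r'"property":\s*(\{.*?\})', text, re.DOTALL)
str_form = match.group(1)         # the JSON text as a str
dict_form = json.loads(str_form)  # decoded; JSON false becomes Python False

print('str_form:', str_form)
print('dict_form:', dict_form)
```

The lazy brace match is enough here only because the property object contains no nested braces. Key order in the printed dict depends on the Python version.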
