How to extract data from JSON using BeautifulSoup in Django

Time: 2016-02-26 09:40:23

Tags: json beautifulsoup django-views

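Good day. I am having trouble extracting values from JSON. My BeautifulSoup code works perfectly well in the shell, but not in Django. What I am trying to do is extract data from the JSON I receive, so far without success. The class from my views is shown below.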

The JSON contains an array "Observations" from which I am trying to get the city name and the high and low temperatures.

But when I try this:

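import json
import re

import requests
from bs4 import BeautifulSoup
from django.views import generic

class FetchWeather(generic.TemplateView):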
    template_name = 'forecastApp/pages/weather.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        url = 'http://weather.news24.com/sa/cape-town'
        city = 'cape town'
        url_request = requests.get(url)
        soup = BeautifulSoup(url_request.content, 'html.parser')
        city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")
        print(soup.head)
        city_as_on_website = city_list.find(text=re.compile(city, re.I)).parent
        cityId = city_as_on_website['value']
        json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

        headers = {
            'Content-Type': 'text/plain; charset=UTF-8',
            'Host': 'weather.news24.com',
            'Origin': 'http://weather.news24.com',
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
            'X-AjaxPro-Method': 'GetCurrentOne'}

        payload = {
            "cityId": cityId
        }
        request_post = requests.post(json_url, headers=headers, data=json.dumps(payload))
        print(request_post.content)
        context['Observations'] = request_post.content
        return context

I get an error. Here is the traceback:

cityDict = json.loads(str(html))

Any help would be appreciated.

1 answer:

Answer 0: (score: 1)

There are two problems with the JSON data inside request_post.content:

  • It contains JS Date object values, which are not valid JSON, for example:

    "Date":new Date(Date.UTC(2016,1,26,22,0,0,0))
    
  • There are unwanted characters at the end: ;/*
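
To see why these two points matter, here is a minimal sketch of what happens when such text is passed straight to json.loads. The string below is a hand-written fragment shaped like the response described above, not real output from the site.

```python
import json

# Hypothetical fragment (an assumption for illustration, not a real response).
raw = '{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))};/*'

try:
    json.loads(raw)
    parsed_ok = True
except json.JSONDecodeError as exc:
    # The parser stops at "new", because a JS constructor call is not a JSON value.
    parsed_ok = False
    error_message = str(exc)

print(parsed_ok)
```

The parser fails at the first Date literal, so both the literals and the trailing characters have to be dealt with before loading.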

Let's clean up the JSON data so that it can be loaded with json:

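import json
import re
from datetime import datetime

data = request_post.text

def convert_date(match):
    # Turn a JS "new Date(Date.UTC(y, m, d, H, M, S, ms))" literal into a quoted ISO 8601 string.
    year, month, day, hour, minute, second, millis = map(int, match.groups())
    # JavaScript months are zero-based (1 means February), so add 1 for Python's datetime.
    value = datetime(year, month + 1, day, hour, minute, second, millis * 1000)
    return '"' + value.strftime("%Y-%m-%dT%H:%M:%S") + '"'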

data = re.sub(r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)",
              convert_date,
              data)

data = data.strip(";/*")
data = json.loads(data)

context['Observations'] = data
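
The cleanup can also be tried outside Django. The following is a minimal, self-contained sketch. The helper name clean_ajaxpro_json, the sample string, and the "City" field are assumptions made up for illustration, since the real response is not shown in the question. It also assumes, as standard JavaScript does, that the month in Date.UTC is zero-based.

```python
import json
import re
from datetime import datetime

# Matches the JS literal new Date(Date.UTC(y, m, d, H, M, S, ms)).
JS_DATE = re.compile(
    r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)"
)


def _js_date_to_iso(match):
    """Replace one JS Date literal with a quoted ISO 8601 string."""
    year, month, day, hour, minute, second, millis = map(int, match.groups())
    # JavaScript months are zero-based, Python's are one-based.
    value = datetime(year, month + 1, day, hour, minute, second, millis * 1000)
    return json.dumps(value.strftime("%Y-%m-%dT%H:%M:%S"))


def clean_ajaxpro_json(raw):
    """Hypothetical helper: make the response text loadable, then parse it."""
    text = JS_DATE.sub(_js_date_to_iso, raw)
    # Drop surrounding whitespace, then any trailing ';', '/' and '*' characters.
    return json.loads(text.strip().strip(";/*"))


# Hand-written sample, NOT a real response from the site; "City" is an assumed field.
sample = (
    '{"Observations":[{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0)),'
    '"City":"Cape Town"}]};/*'
)
result = clean_ajaxpro_json(sample)
print(result["Observations"][0]["Date"])  # prints 2016-02-26T22:00:00
```

In the view, the parsed result would take the place of the raw bytes stored in context['Observations'], so the template receives a dictionary rather than a byte string.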