Part of value inside of 'style' attrs to become variable in python bs4

Time: 2017-08-30 20:58:13

Tags: python web-scraping bs4 attrs.xml

Let's assume we have code:

cached_property

How can I use bs4 in Python to build a list of the 'left' values found in the 'style' attrs and find the highest one, while ignoring spans whose class_ includes "Kung"?

Desired result would be:

from django.utils.functional import cached_property

class RouteList(ListView):
    model = DailyRoute
    template_name = 'route_list.html'

    @cached_property
    def stage_exist(self):
        return DailyRoute.objects.filter( stage = '1').exists()

    def get_queryset(self):
        if self.stage_exist:
            query_set = DailyRoute.objects.filter(owner=employer, stage = '1').order_by('route')
        else:
            query_set = DailyRoute.objects.none()
        return query_set

    def get_template_names(self):
        return ['template 1.html'] if self.stage_exist else ['template 2.html']

I gather I should start with something like:

    if(!isConnectedToInternet()){
          Snackbar snackbar = Snackbar
             .make(view, "No internet connection!", Snackbar.LENGTH_LONG)
             .setAction("RETRY", new View.OnClickListener() {
                 @Override
                 public void onClick(View view) {
                      isConnectedToInternet();
                      Intent intent = new Intent(getBaseContext(),NextActivity.class);
                      startActivity(intent);                           
                      Toast.makeText(getApplicationContext(),"OK",Toast.LENGTH_LONG).show();

               }
             });
       snackbar.show(); 
        }
    }

    private boolean isConnectedToInternet() {
       ConnectivityManager cm =
                        (ConnectivityManager)getApplicationContext().getSystemService(Context.CONNECTIVITY_SERVICE);
                NetworkInfo netInfo = cm.getActiveNetworkInfo();
                return netInfo != null && netInfo.isConnectedOrConnecting();
            }

1 answer:

Answer 0 (score: 0)

>>> import bs4
>>> HTML = open('temp.htm').read()
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')

First, select all elements whose class contains foo (whether or not it contains anything else).

>>> elements = soup.select('.foo')

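In each case, element['class'] yields a list of the items in the element's class attribute, which for this HTML means either foo alone or foo and Kung. A test on the length of element['class'] is therefore just a test for foo being present on its own.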
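To make the list behaviour of element['class'] concrete, here is a minimal sketch. The markup in SAMPLE is hypothetical, invented for illustration (it is not the HTML from the question), and 'html.parser' is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration; not the question's HTML.
SAMPLE = (
    '<span class="foo" style="left:70px">a</span>'
    '<span class="foo Kung" style="left:120px">b</span>'
)

soup = BeautifulSoup(SAMPLE, "html.parser")

# bs4 treats class as a multi-valued attribute, so each lookup returns a list.
class_lists = [el["class"] for el in soup.select(".foo")]

# Keep only the elements whose class list has exactly one entry (foo alone).
only_foo = [el.get_text() for el in soup.select(".foo") if len(el["class"]) == 1]

print(class_lists)
print(only_foo)
```

The length test only works because foo and Kung are the sole classes in play. If other class names could appear, checking `"Kung" not in el["class"]` would probably be the safer filter.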

element['style'] gets the content of the element's style attribute. Use a regular expression to pull out the part we want and append it to a list called lefts.

>>> import re
>>> lefts = []
>>> for element in elements:
...     if len(element['class']) == 1:  # class is exactly ['foo']
...         match = re.search(r'left:([0-9]+)', element['style'])
...         lefts.append(int(match.group(1)))
...
>>> lefts
[70, 50, 99]
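The regular-expression step can also be tried on its own, without any HTML. The sketch below is a slightly more defensive variant, not the answer's exact pattern: it assumes style strings may contain spaces around the colon or properties such as margin-left, neither of which appears in the original. The helper name left_value is hypothetical.

```python
import re

# Assumption beyond the original answer: allow spaces around ':' and
# refuse matches inside longer property names such as "margin-left".
LEFT_RE = re.compile(r'(?<![-\w])left\s*:\s*([0-9]+)')


def left_value(style):
    """Return the integer after 'left:' in a style string, or None if absent."""
    match = LEFT_RE.search(style)
    return int(match.group(1)) if match else None


# The list from the session above, then the single highest value.
lefts = [70, 50, 99]
highest = max(lefts)
print(left_value("position:absolute;left:70px"), highest)
```

Returning None rather than raising keeps the loop from failing with AttributeError on an element whose style has no numeric left value, which the plain `re.search(...).group(1)` form would do.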

Edit

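Find the tr elements, then look for the .foo elements inside each one. As before, consider only elements whose class is foo alone, not foo Kung. Collect the left values from the style attributes of those elements, then find their maximum.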

>>> import re
>>> import bs4
>>> HTML = open('temp.htm').read()
>>> soup = bs4.BeautifulSoup(HTML, 'lxml')
>>> trs = soup.find_all('tr')
>>> tr_max = []
>>> for tr in trs:
...     elements = tr.select('.foo')
...     lefts = []
...     for element in elements:
...         if len(element['class']) == 1:  # skip 'foo Kung'
...             match = re.search(r'left:([0-9]+)', element['style'])
...             lefts.append(int(match.group(1)))
...     # None marks a row with no qualifying element
...     tr_max.append(max(lefts) if lefts else None)
...
>>> tr_max
[70, None, 99]
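The per-row procedure can be wrapped in a function and run against inline markup. This is only a sketch under stated assumptions: the table in SAMPLE is hypothetical (the question's HTML is not reproduced here), max_left_per_row is an illustrative name, and 'html.parser' stands in for lxml so nothing extra needs installing.

```python
import re
from bs4 import BeautifulSoup

LEFT_RE = re.compile(r'left\s*:\s*([0-9]+)')


def max_left_per_row(html):
    """For each <tr>, return the largest 'left' among .foo elements without Kung."""
    soup = BeautifulSoup(html, "html.parser")
    maxima = []
    for tr in soup.find_all("tr"):
        lefts = []
        for el in tr.select(".foo"):
            if "Kung" in el.get("class", []):
                continue  # excluded by the question's requirement
            match = LEFT_RE.search(el.get("style", ""))
            if match:
                lefts.append(int(match.group(1)))
        # default=None covers rows with no qualifying element
        maxima.append(max(lefts, default=None))
    return maxima


# Hypothetical markup, made up for illustration.
SAMPLE = """
<table>
  <tr><td><span class="foo" style="left:70px">a</span>
          <span class="foo Kung" style="left:300px">b</span></td></tr>
  <tr><td><span class="foo Kung" style="left:10px">c</span></td></tr>
  <tr><td><span class="foo" style="left:50px">d</span>
          <span class="foo" style="left:99px">e</span></td></tr>
</table>
"""

row_maxima = max_left_per_row(SAMPLE)
overall = max(v for v in row_maxima if v is not None)
print(row_maxima, overall)
```

Filtering on `"Kung" in el["class"]` instead of the list length is a deliberate swap: it gives the same rows here but does not depend on foo being the only other class.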