Python Monitor网站的变化

时间:2017-08-05 12:35:22

标签: python html web screen-scraping

我想登录一个网站,获取数据,将其保存到文件中,一段时间后获取新数据并将其与旧(已保存)数据进行比较,如果发生变化则打印。我怎么做?登录正在运行,但比较不是。为什么呢?

提前谢谢!

我的代码:

public static void RegisterRoutes(RouteCollection routes){
routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

routes.MapRoute(
name: "ProjectName1",
url: "Projectname1",
defaults: new { controller = "Project1", action = "Projectname1" }
);

routes.MapRoute(
name: "ProjectName2",
url: "Projectname2",
defaults: new { controller = "Project2", action = "Projectname2" }
);

routes.MapRoute(
name: "ProjectName3",
url: "Projectname3",
defaults: new { controller = "Project3", action = "Projectname3" }
);



routes.MapRoute(
name: "Default",
      url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "view1", id = 
UrlParameter.Optional }
        );


}

2 个答案:

答案 0 :(得分:1)

我发现您的代码非常棘手。这是一个未经测试的替代方案,应该接近你想要实现的目标。

def fetch_html():
    # fetch logic
    return html  # string

def write_html(html):  # string
    # write logic

def read_html():
    with open('page.html','r') as f:
        return f.read()

def monitor():
    write_html(fetch_html())
    while True:
        time.sleep(5)
        new_html = fetch_html()
        if new_html == read_html():
            print('Nothing has changed')
        else:
            print('Something has changed')
            write_html(new_html)

monitor()

答案 1 :(得分:1)

问题是,当您致电sch.notification_rule时,-------------------------------------------------------- | id | schedule_id | notification_id | max_no_of_days | -------------------------------------------------------- | 1 | 1 | N01 | 14 | -------------------------------------------------------- 尚未更新。您应该string2返回login()并将其分配给每个循环login()