Python scheduling with the command line

Question

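I want to automate a script. In past projects I have used the Python scheduler, but for this project I'm not sure how to handle it.

The problem is that the code takes its login details from outside the code: they are entered on the command line when the script is started.

i.e. python scriptname.py email@youremail.com password

How can I automate this using the Python scheduler? The code in 'scriptname.py' is: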

# LinkedBot.py
import argparse, os, time
import urlparse, random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

def getPeopleLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'profile/view?id=' in url:
                links.append(url)
    return links

def getJobLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:       
            if '/jobs' in url:
                links.append(url)
    return links

def getID(url):
    pUrl = urlparse.urlparse(url)
    return urlparse.parse_qs(pUrl.query)['id'][0]


def ViewBot(browser):
    visited = {}
    pList = []
    count = 0
    while True:
        #sleep to make sure everything loads, add random to make us look human.
        time.sleep(random.uniform(3.5,6.9))
        page = BeautifulSoup(browser.page_source)
        people = getPeopleLinks(page)
        if people:
            for person in people:
                ID = getID(person)
                if ID not in visited:
                    pList.append(person)
                    visited[ID] = 1
        if pList: #if there is people to look at look at them
            person = pList.pop()
            browser.get(person)
            count += 1
        else: #otherwise find people via the job pages
            jobs = getJobLinks(page)
            if jobs:
                job = random.choice(jobs)
                root = 'http://www.linkedin.com'
                roots = 'https://www.linkedin.com'
                if root not in job and roots not in job:
                    job = 'https://www.linkedin.com'+job
                browser.get(job)
            else:
                print "I'm Lost Exiting"
                break

        #Output (Make option for this)           
        print "[+] "+browser.title+" Visited! \n("\
            +str(count)+"/"+str(len(pList))+") Visited/Queue)"


def Main():
    parser = argparse.ArgumentParser()
    parser.add_argument("email", help="linkedin email")
    parser.add_argument("password", help="linkedin password")
    args = parser.parse_args()

    browser = webdriver.Firefox()

    browser.get("https://linkedin.com/uas/login")


    emailElement = browser.find_element_by_id("session_key-login")
    emailElement.send_keys(args.email)
    passElement = browser.find_element_by_id("session_password-login")
    passElement.send_keys(args.password)
    passElement.submit()

I'm running it on OS X.

Answer 1

I can see at least two different ways to trigger your script automatically. Since you mention that your script is started this way:

python scriptname.py email@youremail.com password

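it means you start it from a shell. Since you want to schedule it, crontab sounds like a perfect answer. (See, for example, https://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/)

If you really want to use the Python scheduler, you can use subprocess.

In the file that uses the Python scheduler: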

import subprocess

subprocess.call("python scriptname.py email@youremail.com password", shell=True)

See also: What is the best way to call a Python script from another Python script?
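As a variation on the call above, here is a minimal sketch that passes the command as a list instead of a shell string, so the email and password reach the script verbatim without any shell quoting. The helper names build_command and run_script are made up for illustration, and the sketch assumes scriptname.py can run under the same interpreter as the scheduling code (sys.executable); the script shown in the question is Python 2, so adjust the interpreter path if needed.

```python
import subprocess
import sys


def build_command(script, email, password):
    """Build the argument list equivalent to: python script email password."""
    # sys.executable is the interpreter that is running this scheduling code.
    return [sys.executable, script, email, password]


def run_script(script, email, password):
    """Run the script as a child process and return its exit code."""
    # With a list and the default shell=False, every argument is passed
    # as-is, so spaces or "$" in the password are not interpreted by a shell.
    completed = subprocess.run(build_command(script, email, password))
    return completed.returncode
```

Calling run_script("scriptname.py", "email@youremail.com", "password") from the scheduled job would then behave like the shell command in the question.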

Answer 2

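About the code itself

LinkedIn REST API

Have you tried using LinkedIn's REST API instead of retrieving heavy pages, filling in some forms and sending them back?

Your code is prone to break whenever LinkedIn changes some elements in its pages, whereas the API is a contract between LinkedIn and its users.

See https://developer.linkedin.com/docs/rest-api and https://developer.linkedin.com/docs/guide/v2/concepts/methods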

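Credentials

So that you don't have to pass your credentials on the command line (especially your password, which can be read in clear text through history), you should either

use a configuration file (with your API key) and read it with ConfigParser (or anything else, depending on the format of your config file: JSON, Python, etc.),
or set them as environment variables.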
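Both options can be combined in one small helper. The following is only a sketch in Python 3 (where the module is named configparser): the environment variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD, the [linkedin] section and the function name load_credentials are all assumptions chosen for illustration, not anything LinkedIn or the original script defines.

```python
import configparser
import os


def load_credentials(config_path, env=None):
    """Return (email, password) from environment variables or an INI file."""
    env = os.environ if env is None else env

    # Hypothetical variable names; use whatever fits your setup.
    email = env.get("LINKEDIN_EMAIL")
    password = env.get("LINKEDIN_PASSWORD")
    if email and password:
        return email, password

    # Fall back to an INI file with a hypothetical [linkedin] section.
    # interpolation=None keeps "%" characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise RuntimeError(
            "no credentials in the environment or in {}".format(config_path)
        )
    section = parser["linkedin"]
    return section["email"], section["password"]
```

The configuration file would then contain a [linkedin] section with email and password keys, and it should be readable only by your own user.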

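For scheduling

Using cron

Also, for the scheduling part, you can use cron.

Using Celery

If you are looking for a 100% Python solution, you could use the excellent Celery project. Check out its periodic tasks.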

Answer 3

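You can pass arguments to the Python scheduler. From the sched documentation:

scheduler.enter(delay, priority, action, argument=(), kwargs={})
Schedule an event for delay more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for enterabs().
Changed in version 3.3: argument parameter is optional.
New in version 3.3: kwargs parameter was added.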

>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(a='default'):
...     print("From print_time", time.time(), a)
...
>>> def print_some_times():
...     print(time.time())
...     s.enter(10, 1, print_time)
...     s.enter(5, 2, print_time, argument=('positional',))
...     s.enter(5, 1, print_time, kwargs={'a': 'keyword'})
...     s.run()
...     print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274 positional
From print_time 930343695.275 keyword
From print_time 930343700.273 default
930343700.276
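Applied to the question, the same argument parameter can carry the email and password. The sketch below is one possible arrangement, not the only one: run_bot is a placeholder for whatever actually starts the bot, and schedule_repeating is a hypothetical helper that re-arms the event itself, since sched has no built-in periodic events.

```python
import sched
import time


def run_bot(email, password):
    # Placeholder: start the real bot here (for example via subprocess).
    print("would start the bot for", email)


def schedule_repeating(scheduler, interval, action, argument=(), repeats=3):
    """Run action(*argument) every `interval` time units, `repeats` times."""
    def wrapper(remaining):
        action(*argument)
        if remaining > 1:
            # Re-arm the event for the next run.
            scheduler.enter(interval, 1, wrapper, argument=(remaining - 1,))

    scheduler.enter(interval, 1, wrapper, argument=(repeats,))


def main():
    s = sched.scheduler(time.time, time.sleep)
    # Run the bot once an hour, three times in total.
    schedule_repeating(s, 3600, run_bot,
                       argument=("email@youremail.com", "password"))
    s.run()  # blocks until every scheduled event has run
```

Note that s.run() blocks the process for the whole schedule, so this only works while the scheduling script itself keeps running.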
