Trouble running a terminal command from Python

Date: 2014-12-22 10:44:34

Tags: python

I have the following command:

/usr/bin/node_modules/phantomjs/bin/phantomjs RequestURL.js https://www.ubank.com.au/ > body.html

It runs fine in the terminal, but not when I execute it from Python as follows:

def get_generated_html(self, url, has_headers):
        """
        Method: Method to get the generated HTML content from Phantomas.

        Args: Takes the url as an argument for which to get the HTML content.
              hasHeaders defaulted to false for no headers.

        Returns: Nothing.
        """
        if not urlparse(url).scheme:
            url = 'http://'+url
        if has_headers == False:
            command = PAGE_SOURCE_CMD % url
            utils.execute_command(command).communicate()
        else:
            command = FEO_PAGE_SOURCE_CMD % url
            print command
            utils.execute_command(command).communicate()

The print statement prints out the exact command.

Here is the execute_command() method:

def execute_command(command):
    """Executes the command and returns the process."""
    process = None
    try:
        process = subprocess.Popen(command, shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
    except subprocess.CalledProcessError:
        print (
            'Process utility could not process %s'
            ' inside execute_command() method'
            % url)
    return process

I call get_generated_html() like this:

def start_parser(self, analysis_id, url, hasHeaders=False):
        """
        Method: Method to start the parser.

        Args: Analysis ID and URL as arguments.

        Returns: Nothing.
        """

        feed = None
        path = self.create_analysis_folder(analysis_id, hasHeaders) 
        self.get_generated_html(url, hasHeaders)
        for root, dirs, files in os.walk(path):
            for file in files:
                if file.endswith('.html'):
                    feed = BeautifulSoup(open(path + '/' +file).read())
                    if hasHeaders:
                        os.chdir('..')
                    print "deleting"
                    shutil.rmtree(os.getcwd())
            break
        return feed

The feed returned here is not the page source that the command produces when I run it from the command line.

1 answer:

Answer 0 (score: 0)

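I'm not sure what get_generated_html is supposed to do, but it doesn't return anything, and in one of the branches it only prints. The docstring confuses me: it says the function returns nothing, but it doesn't say anything about output either. The "defaulted to false" part is also not true of the current code, since has_headers has no default value.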

Also, communicate() returns a tuple (stdout_data, stderr_data), so you probably want to return just one element of it.

Try something like:

output = utils.execute_command(command).communicate()
return output[0]
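To see what output[0] actually holds, here is a small self-contained sketch (Python 3, POSIX shell; `run` is a hypothetical helper and `echo` stands in for the PhantomJS command). communicate() returns a (stdout_data, stderr_data) tuple, so element 0 is whatever the command wrote to the pipe. One caveat, assuming the real command template still ends in `> body.html` as in the question: the shell then sends stdout to that file instead of the pipe, so output[0] is empty, and the file is created in the Popen working directory (the Python process's current directory unless `cwd` is passed), which is not necessarily the folder start_parser walks.

```python
import os
import subprocess
import tempfile


def run(command, cwd=None):
    """Hypothetical helper: run command through the shell, return (stdout, stderr)."""
    process = subprocess.Popen(command, shell=True, cwd=cwd,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    return process.communicate()


# Output goes to the pipe, so element 0 of the tuple holds it.
output = run("echo hello")
print(type(output).__name__, output[0])  # tuple b'hello\n'

# Output redirected by the shell: the pipe stays empty and the file is
# created relative to cwd (here a temporary folder).
with tempfile.TemporaryDirectory() as folder:
    redirected = run("echo hello > body.html", cwd=folder)
    print(redirected[0])  # b''
    print(os.path.exists(os.path.join(folder, "body.html")))  # True
```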

If we're doing a code review, this looks more elegant to me:

if not has_headers:
    command = PAGE_SOURCE_CMD % url
else:
    command = FEO_PAGE_SOURCE_CMD % url
output = utils.execute_command(command).communicate()
return output[0]  # or print it?
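Putting the pieces together, a minimal self-contained sketch of the refactored function might look like this. It is written for Python 3 (the question's code is Python 2, where communicate() returns str instead of bytes and urlparse is imported from the urlparse module). The two command templates are hypothetical stand-ins because the question doesn't show the real ones; they use `echo` and deliberately have no `> body.html` redirect, so the page source reaches stdout. The execute_command here also drops the original try/except: Popen never raises CalledProcessError (that comes from check_call/check_output), and the original handler refers to an undefined `url`. Checking returncode is used instead, which is an addition beyond the answer above.

```python
import subprocess
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

# Hypothetical stand-ins for the real templates (not shown in the question).
# No "> body.html" redirect, so the output goes to stdout.
PAGE_SOURCE_CMD = "echo page-source-for %s"
FEO_PAGE_SOURCE_CMD = "echo page-source-with-headers-for %s"


def execute_command(command):
    """Start the command through the shell and return the Popen object."""
    return subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)


def get_generated_html(url, has_headers=False):
    """Return the stdout of the page-source command for url, as text."""
    if not urlparse(url).scheme:
        url = 'http://' + url
    if not has_headers:
        command = PAGE_SOURCE_CMD % url
    else:
        command = FEO_PAGE_SOURCE_CMD % url
    process = execute_command(command)
    stdout_data, stderr_data = process.communicate()
    if process.returncode != 0:
        # Surface the failure instead of silently returning nothing.
        raise RuntimeError('command failed (%d): %s'
                           % (process.returncode, stderr_data.decode()))
    return stdout_data.decode()
```

For example, get_generated_html('www.ubank.com.au') returns the command's stdout for http://www.ubank.com.au as a string, which a caller such as start_parser could hand to BeautifulSoup directly rather than looking for an .html file in the analysis folder.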