I have the following command:
/usr/bin/node_modules/phantomjs/bin/phantomjs RequestURL.js https://www.ubank.com.au/ > body.html
It runs fine in the terminal, but it does not work when executed from Python as shown below.
def get_generated_html(self, url, has_headers):
    """
    Method: Method to get the generated HTML content from Phantomas.
    Args: Takes the url as an argument for which to get the HTML content.
    hasHeaders defaulted to false for no headers.
    Returns: Nothing.
    """
    if not urlparse(url).scheme:
        url = 'http://' + url
    if has_headers == False:
        command = PAGE_SOURCE_CMD % url
        utils.execute_command(command).communicate()
    else:
        command = FEO_PAGE_SOURCE_CMD % url
        print(command)
        utils.execute_command(command).communicate()
The print statement prints out exactly the expected command.
Here is the execute_command() method:
import subprocess


def execute_command(command):
    """Executes the command and returns the process."""
    process = None
    try:
        process = subprocess.Popen(command, shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
    except OSError:
        # Popen never raises CalledProcessError; OSError is what it
        # raises when the program cannot be started at all.
        print(
            'Process utility could not process %s'
            ' inside execute_command() method'
            % command)  # `url` is not defined in this function
    return process
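One thing worth knowing when a command works in the terminal but "does nothing" from Python: `subprocess.Popen` does not raise `CalledProcessError` (that exception comes from `check_call`, `check_output`, or `run(check=True)`), and with `shell=True` a failing command does not raise anything at all. It simply exits with a non-zero status and writes its complaint to stderr, which `execute_command` pipes away and nothing ever reads. The sketch below, using a hypothetical helper `run_and_report` and shell one-liners as stand-ins for the PhantomJS command, shows where that information actually ends up:

```python
import subprocess


def run_and_report(command):
    """Run a shell command and return (returncode, stdout, stderr) as text."""
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    # communicate() waits for the process to exit and drains both pipes.
    out, err = process.communicate()
    return process.returncode, out, err


# Stand-in for a failing run: writes to stderr and exits with status 3.
code, out, err = run_and_report("echo boom >&2; exit 3")
# No exception was raised; the failure is only visible in these values.
print(code, repr(err))

# A command the shell cannot find also just yields a status (127).
missing_code, _, missing_err = run_and_report("no_such_command_xyz")
print(missing_code, repr(missing_err))
```

Printing `err` and the return code from the real PhantomJS call is usually the quickest way to see why it behaves differently from the terminal.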
I call get_generated_html as follows:
def start_parser(self, analysis_id, url, hasHeaders=False):
    """
    Method: Method to start the parser.
    Args: Analysis ID and URL as an argument.
    Returns: Nothing.
    """
    feed = None
    path = self.create_analysis_folder(analysis_id, hasHeaders)
    self.get_generated_html(url, hasHeaders)
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.html'):
                feed = BeautifulSoup(open(path + '/' + file).read())
                if hasHeaders:
                    os.chdir('..')
                    print("deleting")
                    shutil.rmtree(os.getcwd())
                break
    return feed
The feed returned here is not the page source that the same command produces when run from the command line.
Answer 0 (score: 0)
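I'm not sure what get_generated_html is supposed to do, but it doesn't return anything, and in one of the two branches it only prints the command. The docstring confuses me: it says the function returns nothing, but it also says nothing about any output. The "defaulted to false" part is also wrong for the current code, since has_headers has no default value.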
Also, communicate() returns a tuple (stdout_data, stderr_data), so you probably want to return just one element of it: the standard output.
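A small demonstration of that return value, using a Python one-liner as a harmless stand-in for the PhantomJS command (the command itself is an assumption for illustration):

```python
import subprocess
import sys

# Stand-in for the PhantomJS command: it prints one line of "HTML".
process = subprocess.Popen([sys.executable, "-c", "print('<html></html>')"],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
result = process.communicate()

# communicate() returns a 2-tuple: (stdout_data, stderr_data).
# Without text mode, both elements are bytes.
stdout_data, stderr_data = result
print(type(result).__name__, len(result))
print(stdout_data.decode().strip())
```

So `communicate()[0]` is the page source, provided the command actually writes it to stdout.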
Try something like:
output = utils.execute_command(command).communicate()
return output[0]
If we're doing a code review, this looks more elegant to me:
if not has_headers:
    command = PAGE_SOURCE_CMD % url
else:
    command = FEO_PAGE_SOURCE_CMD % url
output = utils.execute_command(command).communicate()
return output[0]  # or print it?
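A minimal sketch of the whole function along those lines, written as a plain function rather than a method. Assumptions: the PhantomJS path and `RequestURL.js` come from the question's command line; `base_command` and `output_dir` are hypothetical parameters added for illustration; and it swaps the shell string built from `PAGE_SOURCE_CMD` (whose contents aren't shown) for an argument list. Two details matter here. If the command string still ends in `> body.html`, the shell sends stdout to that file and `communicate()[0]` will be empty. And both `body.html` and `RequestURL.js` are relative paths, resolved against the Python process's current working directory, which is not necessarily the directory you ran the terminal command from or the folder `start_parser` walks.

```python
import os
import subprocess
import sys
import tempfile
from urllib.parse import urlparse

# Path and script name taken from the question's command line.
PHANTOMJS_CMD = ["/usr/bin/node_modules/phantomjs/bin/phantomjs", "RequestURL.js"]


def get_generated_html(url, base_command=PHANTOMJS_CMD, output_dir=None):
    """Return the page source printed by base_command for url.

    If output_dir is given (hypothetical option), also save it there
    as body.html.
    """
    if not urlparse(url).scheme:
        url = "http://" + url
    # An argument list (no shell) means no "> body.html" redirect can
    # swallow stdout, and the URL needs no shell quoting.
    process = subprocess.Popen(base_command + [url],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    stdout_data, stderr_data = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("command failed (%d): %s"
                           % (process.returncode, stderr_data))
    if output_dir is not None:
        # Write into an explicit folder instead of the current directory.
        with open(os.path.join(output_dir, "body.html"), "w") as f:
            f.write(stdout_data)
    return stdout_data


# Usage with a stand-in command that echoes the URL back as "HTML".
fake_cmd = [sys.executable, "-c",
            "import sys; print('<html>' + sys.argv[1] + '</html>')"]
folder = tempfile.mkdtemp()
html = get_generated_html("www.ubank.com.au", fake_cmd, folder)
print(html.strip())
```

Passing a list instead of using `shell=True` also avoids surprises with URLs that contain characters such as `&`, which the shell would otherwise interpret.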