Question

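I'm using Selenium WebDriver to get the page source, but the source I get back is full of `a0:`, which I have read is a non-breaking space. So I would like to know:

A. How should I read it? Should I clean the source after I get it, or is there anything I can do beforehand?

B. This is the first time I have run into something like this. Is there any reason for it to be in the HTML in the first place?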

Code example:

......<a0:div style="position: absolute; top: -1000px; height: 1px; width: 1px;">
<a0:object data="https://translate.googleapis.com/translate_static/js/element/hrs.swf" height="500"
id="fI0hpn482ja" name="fI0hpn482ja" type="application/x-shockwave-flash" width="400">
<a0:param name="allowScriptAccess" value="always"></a0:param></a0:object></a0:div>
<a0:iframe class="goog-te-menu-frame skiptranslate" frameborder="0" style="visibility:
visible; -moz-box-sizing: content-box; width: 731px; height: 274px; display: none;">
</a0:iframe></a0:body></a0:html></body></html>

Thanks :)

Answer 1

1. You can replace them with an empty string. A common approach might look like this:

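def get_clean_string(string, substring):
    # Keep removing until no occurrence is left: a single replace() can
    # leave a new match behind when a removal joins two fragments.
    while substring in string:
        string = string.replace(substring, '')
    return string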

Result:

In [24]: get_clean_string('replacemeHeresWhatINeed', 'replaceme')
Out[24]: 'HeresWhatINeed'
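A plain replace also deletes `a0:` wherever it appears in visible text or attribute values. A minimal sketch that strips the prefix only from opening and closing tag names, assuming the prefix is always written exactly as `a0:` directly after `<` or `</` (the helper name `strip_tag_prefix` is hypothetical):

```python
import re


def strip_tag_prefix(html, prefix="a0"):
    """Remove a prefix such as 'a0:' from tag names only, e.g. '<a0:div' or '</a0:div'."""
    # Group 1 captures "<" or "</" so it can be put back without the prefix.
    pattern = re.compile(r"(</?)" + re.escape(prefix) + r":")
    return pattern.sub(r"\1", html)


sample = '<a0:div style="top: -1000px;">a0: stays</a0:div>'
print(strip_tag_prefix(sample))  # <div style="top: -1000px;">a0: stays</div>
```

With Selenium this would be applied to the string returned by `driver.page_source`, where `driver` is your WebDriver instance.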

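2. Maybe you should specify the encoding of your source file. Python 2 uses ASCII as the default source encoding. In my project I kept running into Russian characters, so all my files declare UTF-8 on the first line: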

#-*- coding: utf-8 -*-
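A non-breaking space is the single character U+00A0 (`'\u00a0'` in Python), which is not the same thing as the literal three characters `a0:` in the markup. If real non-breaking spaces do show up in the text, a minimal sketch, assuming Python 3 `str` and that turning them into ordinary spaces is acceptable (the helper name `normalize_nbsp` is hypothetical):

```python
def normalize_nbsp(text):
    """Replace non-breaking spaces (U+00A0) with ordinary spaces."""
    return text.replace("\u00a0", " ")


print(len("\u00a0"))                     # 1: one character, not the text "a0:"
print(normalize_nbsp("price:\u00a0100"))  # price: 100
```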

Python: using Selenium, how do I clean characters like a0: from page_source?
