I am using this short script in Python 2.7 to call the VirusTotal API (the point of this API is to upload files to be scanned on the virustotal site):
def scanAFile(fileToScan):
    host = "www.virustotal.com"
    selector = "https://www.virustotal.com/vtapi/v2/file/scan"
    fields = [("apikey", myPublicKey)]
    file_to_send = open(fileToScan, "rb").read()
    files = [("file", fileToScan, file_to_send)]
    json = postfile.post_multipart(host, selector, fields, files)
    return simplejson.loads(json)
The problem I found is that every file I want to upload seems to need a different encoding; otherwise I get this error:
Traceback (most recent call last):
  File "/home/user/PythonDev/20150617_WW/agent_vt.py", line 139, in <module>
    scanQueue()
  File "/home/user/PythonDev/20150617_WW/agent_vt.py", line 76, in scanQueue
    jsonScan = scanAFile(fileToScan) #todo if file not found skip
  File "/home/user/PythonDev/20150617_WW/agent_vt.py", line 37, in scanAFile
    json = postfile.post_multipart(host, selector, fields, files)
  File "/home/user/PythonDev/20150617_WW/postfile.py", line 13, in post_multipart
    content_type, body = encode_multipart_formdata(fields, files)
  File "/home/user/PythonDev/20150617_WW/postfile.py", line 45, in encode_multipart_formdata
    body = CRLF.join(L)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
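The last line of the traceback can be reproduced in isolation. The sketch below assumes (the question does not confirm this) that at least one string in the list is a unicode object, for example the API key or the file name. In that case Python 2 implicitly decodes the raw file bytes as ASCII inside join, which is equivalent to the explicit decode shown here. The sample bytes are made up for illustration.

```python
# Hypothetical leading bytes of a binary file; 0xff is outside the ASCII range.
raw = b"\xff\xd8\xff\xe0"

try:
    # Python 2 performs this decode implicitly when a unicode object and a
    # byte string meet in the same join or concatenation.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If this assumption holds, the failure does not depend on the encoding of any particular file: any byte above 0x7f in the upload is enough to trigger it.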
The file postfile.py is provided by VirusTotal on their website, and this is the function where the encoding problem occurs:
def encode_multipart_formdata(fields, files):
    """
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + BOUNDARY + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
    return content_type, body
As a temporary workaround, I added this code at the beginning of postfile.py:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
But having to redo this every time the file is updated is annoying. Is there a way to fix this?
Answer 0 (score: 1)
Try using this library for encoding detection: http://github.com/chardet/chardet
pip install chardet
Then use it:
import sys
import chardet

def scanAFile(fileToScan):
    host = "www.virustotal.com"
    selector = "https://www.virustotal.com/vtapi/v2/file/scan"
    fields = [("apikey", myPublicKey)]
    # Detect the encoding from the file's bytes, not from its path string.
    raw = open(fileToScan, "rb").read()
    code = chardet.detect(raw)
    file_to_send = raw.decode(code['encoding'])
    files = [("file", fileToScan, file_to_send)]
    json = postfile.post_multipart(host, selector, fields, files)
    return simplejson.loads(json)
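An alternative that avoids guessing an encoding, since an uploaded file is arbitrary binary data: keep the file content as bytes and convert only the text parts (field names, values, file name) to bytes before joining. The following is a sketch of such a variant of encode_multipart_formdata, not the postfile.py that VirusTotal distributes. The helper to_bytes, the choice of UTF-8, and the mimetypes-based content type guess are assumptions made for illustration.

```python
import mimetypes


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding text if necessary (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def encode_multipart_formdata(fields, files):
    """Build a multipart/form-data body entirely from byte strings.

    fields: sequence of (name, value) pairs for regular form fields.
    files: sequence of (name, filename, value) triples; value is raw bytes.
    """
    boundary = b"----------ThIs_Is_tHe_bouNdaRY_$"
    crlf = b"\r\n"
    parts = []
    for key, value in fields:
        parts.append(b"--" + boundary)
        parts.append(b'Content-Disposition: form-data; name="' + to_bytes(key) + b'"')
        parts.append(b"")
        parts.append(to_bytes(value))
    for key, filename, value in files:
        # Assumed stand-in for get_content_type from the original module.
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts.append(b"--" + boundary)
        parts.append(
            b'Content-Disposition: form-data; name="' + to_bytes(key)
            + b'"; filename="' + to_bytes(filename) + b'"'
        )
        parts.append(b"Content-Type: " + to_bytes(ctype))
        parts.append(b"")
        parts.append(to_bytes(value))  # file bytes pass through unchanged
    parts.append(b"--" + boundary + b"--")
    parts.append(b"")
    body = crlf.join(parts)  # every element is bytes, so no implicit decoding
    content_type = "multipart/form-data; boundary=%s" % boundary.decode("ascii")
    return content_type, body


# Made-up key and file content, including the 0xff byte from the traceback.
content_type, body = encode_multipart_formdata(
    [(u"apikey", u"example-key")],
    [("file", "sample.bin", b"\xff\xfe\x00binary")],
)
```

Because every element of the list is already a byte string, the join never mixes text and bytes, and the file content reaches the request body unmodified.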
Answer 1 (score: -1)
Well, I usually use unicode_literals from the __future__ module.
Answer 2 (score: -1)
You can add this at the top of your Python file and then run it.
last_login