Excluding non-ASCII characters in Python

Date: 2016-03-10 11:10:27

Tags: python non-ascii-characters

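I have a script that uses a dictionary to decrypt an encrypted message. The problem is that the decryption process produces a lot of garbage (a.k.a. non-ASCII) characters. Here is my code: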


from Crypto.Cipher import AES
import base64
import os

BLOCK_SIZE = 32
PADDING = '{'

# Encrypted text to decrypt
encrypted = "WI4wBGwWWNcxEovAe3p+GrpK1GRRQcwckVXypYlvdHs="

DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)

adib = open('words.txt')
for line in adib.readlines():
    secret = line.rstrip('\n')
    if (secret[-1:] == "\n"):
        print "Error, new line character at the end of the string. This will not match!"
    elif (len(secret) >= 32):
        print "Error, string too long. Must be less than 32 characters."
    else:
        # create a cipher object using the secret
        cipher = AES.new(secret + (BLOCK_SIZE - len(secret) % BLOCK_SIZE) * PADDING)
        # decode the encoded string
        decoded = DecodeAES(cipher, encrypted)
        print decoded+"\n"

What I have come up with so far is to convert the string to ASCII and then exclude the non-ASCII characters, but it doesn't work.

3 answers:

Answer 0 (score: 1)

You can remove the non-ASCII characters like this. Edit: updated to decode first.

output = 'string with some non-ascii characters��@$���9�HK��F�23 some more char'
output = output.decode('utf-8').encode('ascii', 'ignore')
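Bytes that come out of decrypting with a wrong key are often not valid UTF-8, in which case a strict `decode('utf-8')` may raise `UnicodeDecodeError` before the `encode` step is reached. As a minimal Python 3 sketch (the `raw` value is a made-up stand-in for a decrypted byte string, not output from the question's script), decoding as ASCII with `errors="ignore"` drops the offending bytes directly:

```python
# Python 3 sketch; `raw` is a hypothetical stand-in for decrypted bytes.
raw = b"secret message\xff\xfe\x80{{{{"

# A strict UTF-8 decode fails here because 0xff is never valid in UTF-8.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# errors="ignore" silently drops every byte outside the ASCII range (0-127).
cleaned = raw.decode("ascii", errors="ignore")
print(cleaned)  # secret message{{{{
```

Note that this only strips the bytes; it does not tell you whether the candidate key was correct.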

Answer 1 (score: 0)

This version works:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

def evaluate_string_is_ascii(mystring):
    # Return True only if every character is strict ASCII (codes 1-127).
    for c in mystring:
        try:
            code = ord(c)
        except TypeError:
            # not a single character: treat as non-ASCII
            return False
        if not 0 < code <= 127:
            # extended ASCII (128-255) or anything beyond: reject the string
            return False
    return True


my_text_content="""azertwxcv
123456789
456dqsdq13
o@��nS��?t#�
lkjal�
kfldjkjl&é"""

for line in my_text_content.split('\n'):

    #check if line contain only ascii
    if evaluate_string_is_ascii(line)==True:

        #print the line
        print line
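The same line filter can be written more compactly in Python 3. This is only a sketch: `is_ascii` and the `candidates` list are illustrative names, not part of the answer above, and the sample lines are made up.

```python
def is_ascii(text):
    """Return True if every character has a code point in the range 1-127."""
    return all(0 < ord(ch) < 128 for ch in text)

# Hypothetical decryption candidates; two of them contain non-ASCII characters.
candidates = ["azertwxcv", "123456789", "kfldjkjl&\u00e9", "lkjal\ufffd"]

# Keep only the lines made up entirely of ASCII characters.
ascii_only = [line for line in candidates if is_ascii(line)]
print(ascii_only)  # ['azertwxcv', '123456789']
```

On Python 3.7 and later, the built-in `str.isascii()` performs a similar check, although it also accepts the NUL character, which the function above rejects.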

Answer 2 (score: 0)

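import six

# input_data is the value to clean, defined elsewhere
if six.PY2:
    if isinstance(input_data, str):
        # byte string: drop the bytes that are not ASCII
        input_data = input_data.decode('ascii', 'ignore').encode('ascii')
    else:
        # unicode string: drop the characters that cannot be encoded as ASCII
        input_data = input_data.encode('ascii', 'ignore')
else:
    # six.PY3: this only converts to str; non-ASCII characters are kept
    input_data = str(input_data)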

print(input_data)
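If only Python 3 needs to be supported, the type check can be done without `six`. The helper below is a sketch under that assumption; `to_ascii` is a hypothetical name, and it handles both `bytes` and `str` input.

```python
def to_ascii(data):
    """Return `data` as a str that contains only ASCII characters."""
    if isinstance(data, bytes):
        # bytes: drop every byte outside 0-127 while decoding
        return data.decode("ascii", errors="ignore")
    # str: drop characters that cannot be encoded as ASCII, then decode back
    return data.encode("ascii", errors="ignore").decode("ascii")

print(to_ascii(b"abc\xe9\xffdef"))    # abcdef
print(to_ascii("caf\u00e9 au lait"))  # caf au lait
```

Returning `str` in both cases keeps the caller from having to care which type went in.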