UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 39: ordinal not in range(128)

Time: 2014-11-15 19:46:45

Tags: python encoding praw

I'm new to Python, and I have been trying to fix this for two hours now.

Here is the code:

import praw
import json
import requests
import tweepy
import time

access_token = 'REDACTED'
access_token_secret = 'REDACTED'
consumer_key = 'REDACTED'
consumer_secret = 'REDACTED'

def strip_title(title):
    if len(title) < 94:
        return title
    else:
        return title[:93] + "..."

def tweet_creator(subreddit_info):
    post_dict = {}
    post_ids = []
    print "[bot] Getting posts from Reddit"
    for submission in subreddit_info.get_hot(limit=20):
        post_dict[strip_title(submission.title)] = submission.url
        post_ids.append(submission.id)
    print "[bot] Generating short link using goo.gl"
    mini_post_dict = {}
    for post in post_dict:
        post_title = post
        post_link = post_dict[post]         
        short_link = shorten(post_link)
        mini_post_dict[post_title] = short_link 
    return mini_post_dict, post_ids

def setup_connection_reddit(subreddit):
    print "[bot] setting up connection with Reddit"
    r = praw.Reddit('yasoob_python reddit twitter bot '
                'monitoring %s' %(subreddit)) 
    subreddit = r.get_subreddit(subreddit)
    return subreddit

def shorten(url):
    headers = {'content-type': 'application/json'}
    payload = {"longUrl": url}
    url = "https://www.googleapis.com/urlshortener/v1/url"
    r = requests.post(url, data=json.dumps(payload), headers=headers)
    link = json.loads(r.text)['id']
    return link

def duplicate_check(id):
    found = 0
    with open('posted_posts.txt', 'r') as file:
        for line in file:
            if id in line:
                found = 1
    return found

def add_id_to_file(id):
    with open('posted_posts.txt', 'a') as file:
        file.write(str(id) + "\n")

def main():
    subreddit = setup_connection_reddit('python')
    post_dict, post_ids = tweet_creator(subreddit)
    tweeter(post_dict, post_ids)

def tweeter(post_dict, post_ids):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    for post, post_id in zip(post_dict, post_ids):
        found = duplicate_check(post_id)
        if found == 0:
            print "[bot] Posting this link on twitter"
            print post+" "+post_dict[post]+" #python"
            api.update_status(post+" "+post_dict[post]+" #python")
            add_id_to_file(post_id)
            time.sleep(30)
        else:
            print "[bot] Already posted" 

if __name__ == '__main__':
    main()

Traceback:

root@li732-134:~# python twitter.py
[bot] setting up connection with Reddit
[bot] Getting posts from Reddit
[bot] Generating short link using goo.gl
[bot] Already posted
[bot] Already posted
[bot] Already posted
[bot] Posting this link on twitter
Traceback (most recent call last):
  File "twitter.py", line 82, in <module>
    main()
  File "twitter.py", line 64, in main
    tweeter(post_dict, post_ids)
  File "twitter.py", line 74, in tweeter
    print post+" "+post_dict[post]+" #python"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 39: ordinal not in range(128)

I really don't know what to do. Can someone point me in the right direction?

Edit: added the code and the traceback.

2 answers:

Answer 0: (score: 1)

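Even if you call decode(), the bytes you receive must already be well formed in the encoding you expect.

If \xea is encountered in a UTF-8 string, it must be followed by two more bytes, and not just any bytes: they must fall in the valid range for continuation bytes. Otherwise the data is not valid UTF-8.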
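
The rule above can be sketched in a few lines. The helper names `utf8_sequence_length` and `is_continuation_byte` are made up for illustration and are not part of any library. The sketch only inspects bit patterns; it does not reject overlong forms, surrogates, or lead bytes that UTF-8 forbids for other reasons.

```python
def utf8_sequence_length(lead_byte):
    """Return how many bytes a UTF-8 sequence starting with lead_byte should have.

    Returns 0 if the value cannot start a sequence (for example, because it
    is a continuation byte).
    """
    if lead_byte < 0x80:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if 0xC0 <= lead_byte <= 0xDF:   # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= lead_byte <= 0xEF:   # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= lead_byte <= 0xF7:   # 11110xxx: four-byte sequence
        return 4
    return 0


def is_continuation_byte(byte):
    """Continuation bytes have the bit pattern 10xxxxxx (0x80 to 0xBF)."""
    return 0x80 <= byte <= 0xBF


print(utf8_sequence_length(0xEA))   # 0xEA is 11101010, a three-byte lead
print(is_continuation_byte(0x80))   # inside the continuation range
print(is_continuation_byte(0x56))   # an ASCII byte, not a continuation
```

So a \xea byte announces a three-byte sequence, and the next two bytes must each pass the continuation check.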

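For example, here are two Unicode code points. The first, U+0056, takes only one byte. The next, U+A000, needs three bytes, and we can tell because its first byte is \xea: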

http://hexutf8.com/?q=0x560xea0x800x80

Simply remove the last continuation byte from the sequence above and it is no longer valid UTF-8:

http://hexutf8.com/?q=0x560xea0x80
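
The two byte sequences in the links above can also be checked locally. This is a minimal sketch using only built-in methods; the variable names are chosen for illustration.

```python
valid = b"\x56\xea\x80\x80"    # 'V' followed by the three-byte form of U+A000
truncated = b"\x56\xea\x80"    # same data with the last continuation byte removed

# The complete sequence decodes to two code points.
decoded = valid.decode("utf-8")
print("%d code points, the second is U+%04X" % (len(decoded), ord(decoded[1])))

# The truncated sequence is rejected by the decoder.
try:
    truncated.decode("utf-8")
    truncated_is_valid = True
except UnicodeDecodeError as exc:
    truncated_is_valid = False
    print("not valid UTF-8: %s" % exc.reason)
```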

I don't know which of the values you are posting fails, but I would double-check that you are really receiving valid UTF-8 data.

Answer 1: (score: 0)

The error occurs here:

print post+" "+post_dict[post]+" #python"

The problem appears to be that this line concatenates ASCII strings with Unicode strings, which causes trouble. Try concatenating only Unicode strings:

print post + u" " + post_dict[post] + u" #python"

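If you still have problems, look at the output of type(post) and type(post_dict[post]); both should be Unicode strings. If either of them is not, you need to convert it to a Unicode string using the correct encoding (most likely UTF-8). This can be done as follows: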

post.decode('UTF-8')

or:

post_dict[post].decode('UTF-8')

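The calls above convert a byte string to a Unicode string in Python 2. Once that is done, you can safely concatenate the Unicode strings. The key in Python 2 is never to mix regular strings with Unicode strings; doing so leads to problems.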
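
To make this advice concrete, here is a minimal sketch that normalizes both values to Unicode before joining them, then encodes once when the text leaves the program. The helper names `to_text` and `build_status`, the sample title, and the sample link are all hypothetical, and UTF-8 is an assumption about the incoming bytes. The encoding step is included because the traceback in the question reports an encode error, which is what an ASCII encode of u'\xea' produces.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it if it is a byte string."""
    if isinstance(value, bytes):   # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


def build_status(title, link):
    """Join title and link using Unicode strings only."""
    return to_text(title) + u" " + to_text(link) + u" #python"


# Hypothetical sample data: a UTF-8 byte-string title containing U+00EA.
status = build_status(b"La f\xc3\xaate", u"http://goo.gl/example")

# Encoding to ASCII raises the same error type as the traceback in the question...
try:
    status.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ...while encoding to UTF-8 succeeds and yields bytes that can be written out.
encoded = status.encode("utf-8")
print(repr(encoded))
```

In the question's Python 2 script, the equivalent would be to build the message once and print its UTF-8 encoded form rather than the Unicode string itself, assuming the terminal or log consumer expects UTF-8.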