Question

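I want to write a program that fetches tweets from Twitter and then runs sentiment analysis on them. I wrote the following code and get an error even after importing all the required libraries. I am relatively new to data science, so please help me. I cannot work out the reason for this error: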

# Imports the snippet relies on (python-twitter, tweepy and NLTK are assumed)
import tweepy
from nltk.corpus import stopwords
from tweepy import OAuthHandler
from twitter import Api


class TwitterClient(object):

    def __init__(self):
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'XXXXXXXXX'
        consumer_secret = 'XXXXXXXXX'
        access_token = 'XXXXXXXXX'
        access_token_secret = 'XXXXXXXXX'
        api = Api(consumer_key, consumer_secret, access_token, access_token_secret)

        def preprocess(tweet, ascii=True, ignore_rt_char=True, ignore_url=True,
                       ignore_mention=True, ignore_hashtag=True, letter_only=True,
                       remove_stopwords=True, min_tweet_len=3):
            sword = stopwords.words('english')

            if ascii:  # drop any tweet that contains a non-ASCII character
                for c in tweet:
                    if not (0 < ord(c) < 127):
                        return ''

            tokens = tweet.lower().split()  # lowercase, then split on whitespace
            res = []

            for token in tokens:
                if remove_stopwords and token in sword:  # ignore stopwords
                    continue
                if ignore_rt_char and token == 'rt':  # ignore the 'retweet' marker
                    continue
                if ignore_url and token.startswith('https:'):  # ignore URLs
                    continue
                if ignore_mention and token.startswith('@'):  # ignore mentions
                    continue
                if ignore_hashtag and token.startswith('#'):  # ignore hashtags
                    continue
                if letter_only:  # keep purely alphabetic tokens only
                    if not token.isalpha():
                        continue
                elif token.isdigit():  # otherwise unify digits
                    token = '<num>'

                res.append(token)

            # ignore tweets with fewer than min_tweet_len tokens
            if min_tweet_len and len(res) < min_tweet_len:
                return ''
            else:
                return ' '.join(res)

        # read the sample stream and preprocess every English tweet
        for line in api.GetStreamSample():
            if 'text' in line and line['lang'] == u'en':  # step 1
                text = line['text'].encode('utf-8').replace('\n', ' ')  # step 2
                p_t = preprocess(text)

        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except:
            print("Error: Authentication Failed")

Assume that all the required libraries have been imported. The error occurs on line 69:

for line in api.GetStreamSample():            
    if 'text' in line and line['lang'] == u'en': # step 1
        text = line['text'].encode('utf-8').replace('\n', ' ') # step 2
        p_t = preprocess(text)

I tried searching the internet for the cause of the error but could not find any solution.

The error is:

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 512 more expected)', IncompleteRead(0 bytes read, 512 more expected))

I am using Python 2.7 and the latest version of requests, 2.14.

Answer 1

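If you set stream to True when making a request, Requests cannot release the connection back to the pool unless you consume all the data or call Response.close. This can lead to inefficiency with connections. If you find yourself partially reading request bodies (or not reading them at all) while using stream=True, you should make the request within a with statement to ensure it is always closed: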

    with requests.get('http://httpbin.org/get', stream=True) as r:
        # Do things with the response here.
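In case the installed requests release does not accept a response object directly in a with statement, wrapping the response in `contextlib.closing` from the standard library gives the same guarantee that it is closed. The following is only a minimal sketch under that assumption: `first_chunk` and its `get` parameter are hypothetical names, and `get` stands in for `requests.get` so the function can be tried without a network connection.

```python
from contextlib import closing


def first_chunk(get, url, chunk_size=512):
    """Read at most one chunk of a streamed body, then close the response.

    `get` is any callable shaped like requests.get (a hypothetical
    parameter, so the sketch can be exercised with a stand-in).
    """
    # closing() calls response.close() on exit, even if reading raises.
    with closing(get(url, stream=True)) as response:
        for chunk in response.iter_content(chunk_size=chunk_size):
            return chunk  # leave early; the body is only partially read
    return b""
```

With requests installed, this could be called as `first_chunk(requests.get, 'http://httpbin.org/get')`.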

Answer 2

I ran into the same problem, but without streaming. As Stone mini said, just use a "with" clause so that each request is closed before a new one is made.

    with requests.request("POST", url_base, json=task, headers=headers) as report:
        print('report: ', report)

Answer 3

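This is actually a problem with applications based on django2.7 or earlier. By default, that Django version allows 2.5 MB of memory for the body of a data upload request.

I faced the same problem with a django2.7-based application. I just updated my Django application's settings.py file, and my URLs (endpoints) started working.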

DATA_UPLOAD_MAX_MEMORY_SIZE = None

I simply added the above variable to my application's settings.py file. You can also read more about it here.

I am pretty sure this will work for you.
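Setting the value to `None` switches the size check off entirely. As an alternative sketch, the limit can be raised to an explicit number of bytes instead. The 10 MiB figure below is an arbitrary assumption chosen for illustration, not a value taken from this answer.

```python
# settings.py (sketch): raise the request-body limit rather than disabling it.
# Django's default is 2621440 bytes (2.5 MB); None disables the check.
# The 10 MiB cap here is an assumed value chosen for illustration.
DATA_UPLOAD_MAX_MEMORY_SIZE = 10 * 1024 * 1024  # bytes
```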

Answer 4

For people like me who just want to get past the error and retry the connection, something like this may help:

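import time

import requests

r = ''
while r == '':
    try:
        r = requests.get(Url, headers=headers)
    except (requests.exceptions.ConnectionError, requests.exceptions.ChunkedEncodingError) as err:
        # log_ is the answerer's own logging helper; print() works as a stand-in.
        # str(err) is needed because an exception cannot be concatenated to a string.
        log_(str(err) + ' Put to sleep before retrying.')
        time.sleep(100)
        continue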

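This catches connection errors such as ConnectionError or ChunkedEncodingError, puts the script to sleep (for 100 seconds in this case), and then retries the connection. Note that if the broken-connection error persists, the script above will keep retrying forever, so you may want to add a counter that stops it after a set number of attempts.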
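One way to add such a counter is sketched below. It is not part of requests: `get_with_retries` and all of its parameters are hypothetical names, the default of 5 attempts is an assumption, and the function takes the request as a callable so it can be tried without a network connection.

```python
import time


def get_with_retries(fetch, retriable, max_attempts=5, delay=100, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    fetch     -- zero-argument callable, e.g. lambda: requests.get(url)
    retriable -- tuple of exception classes worth retrying
    All names here are hypothetical, not part of requests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retriable as err:
            if attempt == max_attempts:
                raise  # give up: re-raise the last connection error
            print('%s. Sleeping %s s before retry %d of %d.'
                  % (err, delay, attempt, max_attempts - 1))
            sleep(delay)
```

With requests this could be called as `get_with_retries(lambda: requests.get(Url, headers=headers), (requests.exceptions.ConnectionError, requests.exceptions.ChunkedEncodingError))`. After the last failed attempt the original exception propagates, so the caller still sees what went wrong.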
