I am trying to remove blank lines from a text file, but when I use this code:
import io
with open("outprint6.csv", "r") as f:
    for line in f:
        cleanedLine = line.strip()
        if cleanedLine: # is not empty
            print(cleanedLine)
            f = io.open('eliminado', 'a')
            f.write(unicode(cleanedLine, 'ascii'))
            f.write(u'\n')
            f.close()
I get this error:
'utf8' codec can't decode byte 0xfa in position 21: invalid start byte.
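How can I fix this? I found some answers here, but they did not work in this case. (I am really new to programming...)
The code does remove the blank lines, but I cannot write the processed text to a new CSV file. The text is in Spanish, and I can see that the error occurs when accented letters (í, ó, etc.) are written.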
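For reference, here is a minimal sketch of how the blank-line filtering could be done with explicit encodings. It rests on an assumption that is not confirmed above: that outprint6.csv is stored as Latin-1, which is plausible only because the byte 0xfa from the error message is "ú" in that encoding. The function name strip_blank_lines and its src_encoding parameter are hypothetical, made up for this illustration.

```python
import io


def strip_blank_lines(src_path, dst_path, src_encoding="latin-1"):
    """Copy src_path to dst_path, skipping empty or whitespace-only lines.

    src_encoding is an assumption (Latin-1); change it if the file was
    actually saved in another encoding. The output is written as UTF-8.
    Returns the number of lines that were kept.
    """
    kept = 0
    # io.open decodes on read and encodes on write, so every line handled
    # inside the loop is Unicode text and accented letters are preserved.
    with io.open(src_path, "r", encoding=src_encoding) as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            cleaned = line.strip()
            if cleaned:  # is not empty
                dst.write(cleaned + u"\n")
                kept += 1
    return kept
```

Calling strip_blank_lines("outprint6.csv", "eliminado") would then read the original file and write the cleaned lines to the new one. The output file is opened once rather than once per line, and a separate variable name is used for it so that it does not shadow the input file.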
I retrieved the Twitter data with the following code:
import tweepy
import json
import io
# Authentication details. To obtain these visit dev.twitter.com
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# This is the listener, responsible for receiving data
class StdOutListener(tweepy.StreamListener):
    def on_data(self, data):
        print '1'
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        if not decoded['text'].startswith('RT'):
            try:
                # Also, we convert UTF-8 to ASCII ignoring all bad characters sent by users
                tweet = '@%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; ""[%s]""; %s' % (decoded['user']['id'], decoded['user']['location'], decoded['user']['followers_count'], decoded['user']['created_at'], decoded['user']['utc_offset'], decoded['user']['time_zone'], decoded['coordinates'], decoded['place'], decoded['id'], decoded['created_at'], decoded['text'].encode('ascii', 'ignore'), decoded['retweet_count'])
                print tweet
                f = io.open('outprint6.csv', 'a')
                f.write(tweet)
                f.write(u'\n')
                f.close()
            except:
                pass
    def on_error(self, status):
        print status
# if __name__ == '__main__':
l = StdOutListener()
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
print "Showing all new tweets for"
#There are different kinds of streams: public stream, user stream, multi-user streams
# In this example follow #programming tag
# For more details refer to https://dev.twitter.com/docs/streaming-apis
stream = tweepy.Stream(auth, l)
stream.filter(locations=[-81.397882,-4.972829,-75.288231,0.762316])
The text in the 'text' field is encoded as 'ascii', but I run into problems when I use it to write the new CSV file...
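One thing that might be worth trying is to keep every field as Unicode and let the file object do the encoding, instead of encoding only the 'text' field to ASCII. The sketch below is an illustration under assumptions, not a confirmed fix: tweet_to_row and append_row are hypothetical helper names, only a subset of the fields is used, and UTF-8 is assumed as the desired output encoding.

```python
import io


def tweet_to_row(decoded):
    """Build one semicolon-separated line from a decoded tweet dict.

    All values are formatted into Unicode strings, so accented letters
    in the location or the text are kept rather than dropped.
    """
    user = decoded["user"]
    fields = [
        user["id"],
        user["location"],
        decoded["id"],
        decoded["created_at"],
        decoded["text"],
        decoded["retweet_count"],
    ]
    return u"; ".join(u"%s" % value for value in fields)


def append_row(path, row):
    """Append one Unicode line to path, encoded as UTF-8 (an assumption)."""
    with io.open(path, "a", encoding="utf-8") as out:
        out.write(row + u"\n")
```

Inside on_data this could be used as append_row('outprint6.csv', tweet_to_row(decoded)). A file written this way would then need to be read back with encoding="utf-8" as well.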