I am trying to remove common junk such as RT, every string that starts with @, and all URLs. This is how I approached it:
prefixes = ["http", "ftp", "@", "#", "RT"]
for prefix in prefixes:
    for word in final_tweet:
        if word.startswith(prefix):
            print("starts with prefix")
            word = ''
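Although this code sometimes removes the junk (and always detects it), it does not always remove it. So I would like to know what the problem is.
Here are some examples of the output: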
['RT', '@NadelParis:', 'Going2LOVEorKILL?Download', 'NOW!', 'https://t.co/xilNh66e34', '@CrookedIntriago', '@Seven13music', '@UMG', '\xe3\x82\x8f\xe3\x81\x9f\xe3\x81\x97\xe3\x81\xaf\xe3\x80\x81\xe3\x81\x82\xe3\x81\xaa\xe3\x81\x9f\xe3\x82\x92\xe6\x84\x9b\xe3\x81\x97\xe3\x81\xa6\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99!', 'RTPlz<3', 'https:/\xe2\x80\xa6']
starts with prefix
starts with prefix
starts with prefix
starts with prefix
starts with prefix
starts with prefix
starts with prefix
starts with prefix
['Going2LOVEorKILL?Download', 'NOW!', 'https://t.co/xilNh66e34', '@CrookedIntriago', '@Seven13music', '@UMG', '\xe3\x82\x8f\xe3\x81\x9f\xe3\x81\x97\xe3\x81\xaf\xe3\x80\x81\xe3\x81\x82\xe3\x81\xaa\xe3\x81\x9f\xe3\x82\x92\xe6\x84\x9b\xe3\x81\x97\xe3\x81\xa6\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99!', 'RTPlz<3', 'https://t.co/I40s8x3QAV']
['RT', '@dbrandSkins:', 'Dear', 'Apple,', 'T9', 'dialing', 'optional.', 'Get', 'shit', 'together.', 'Signed,\nEveryone']
starts with prefix
starts with prefix
['Dear', 'Apple,', 'T9', 'dialing', 'optional.', 'Get', 'shit', 'together.', 'Signed,\nEveryone']
['RT', '@WeLoveRobDyrdek:', 'This', 'dog', '', 'https://t.co/5N86jYipOI']
null found
starts with prefix
starts with prefix
starts with prefix
['This', 'dog', '', 'https://t.co/5N86jYipOI']
null found
starts with prefix
['RT', '@sayingsforgirls:', 'Do', 'touch', 'MY', 'iPhone.', "It's", 'usPhone,', 'wePhone,', 'ourPhone,']
starts with prefix
starts with prefix
['Do', 'touch', 'MY', 'iPhone.', "It's", 'usPhone,', 'wePhone,', 'ourPhone,']
['RT', '@BrianaaSymonee:', 'says', 'imma', 'dog,', 'takes', 'one', 'know', 'one...']
starts with prefix
starts with prefix
['says', 'imma', 'dog,', 'takes', 'one', 'know', 'one...']
Answer 0: (score: 1)
You can filter the list once for each prefix:
>>> for prefix in prefixes:
...     final_tweet = [w for w in final_tweet if not w.startswith(prefix)]
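As for why the loop in the question, as posted, leaves the list alone: `word = ''` only rebinds the loop variable to a new string, it never writes back into the list. The sketch below contrasts that with assigning through the index. The sample list and the names `unchanged` and `blanked` are made up for illustration and are not from the question.

```python
words = ['RT', '@dbrandSkins:', 'Dear', 'Apple,']

# Rebinding the loop variable, as in the question: the list is not modified.
for word in words:
    if word.startswith('RT'):
        word = ''
unchanged = list(words)

# Assigning through the index does modify the list in place.
blanked = list(words)
for i, word in enumerate(blanked):
    if word.startswith(('RT', '@')):
        blanked[i] = ''

print(unchanged)
print(blanked)
```

Note that blanking in place leaves empty strings behind, so building a new filtered list is usually the simpler route.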
Answer 1: (score: 0)
An answer given by someone on the #python IRC channel:
final_tweet = [word for word in final_tweet if not any(word.startswith(prefix) for prefix in prefixes)]
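A slightly shorter variant of the same idea, offered only as a sketch: `str.startswith` also accepts a tuple of prefixes, so the `any(...)` can be dropped. The helper name `strip_junk` and the sample tweet are hypothetical; the prefixes are the ones from the question.

```python
PREFIXES = ("http", "ftp", "@", "#", "RT")  # same prefixes as in the question

def strip_junk(words, prefixes=PREFIXES):
    """Return a new list without the words that start with any of the prefixes."""
    # startswith takes a tuple and returns True if any element matches.
    return [w for w in words if not w.startswith(tuple(prefixes))]

tweet = ['RT', '@dbrandSkins:', 'Dear', 'Apple,', 'T9', 'dialing']
cleaned = strip_junk(tweet)
print(cleaned)
```

Keep in mind that a plain prefix test also drops ordinary words that happen to begin with a prefix, for example 'RTPlz<3' from the sample output, so an exact comparison may be preferable for 'RT'.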