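I am doing sentiment analysis on cryptocurrency. My job is to clean the data in a CSV file. The data was collected from Twitter and saved in the CSV file. Before doing the sentiment analysis, I have to clean the data: for example, remove punctuation and URLs and convert the text to lowercase. The rows are tweets.
## I have imported the useful libraries, such as NLTK (natural language processing), pandas, numpy, etc.
This is the output of the 'Tweets' column.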
ctweet['Tweets'][0:6]
Out[5]:
0 RT @TheLTCnews: The @LTCFoundation has publish...
1 RT @WildchildSings: "https:/ " + /t.co/"FZrGw6xsZU ac..."
2 RT @HODL_Whale: 5 days until #LitePay launches...
3 LTC to USD price $211.92 "https:/" + /t.co/"CFjg1mIg..."
4 LTC to BTC price B0.020218 "https:/" +/t.co/"XPL8NI..."
5 LTC to GBP price £151.89 "https:/" +/t.co/"iOIbhgyd..."
6 Litecoin dropped into the bear zone as sugges...
Name: Tweets, dtype: object
# The output contains URLs. Because Stack Overflow won't allow me to post the URLs, I had to alter them by adding "quotes" and "//".
My next task is to clean the data. This is the preprocessing code.
# Preprocessing: del RT @blablabla:
ctweet['tweetos'] = ''
# add tweetos first part
for i in range(len(ctweet['Tweets'])):
    try:
        ctweet['tweetos'][i] = ctweet['Tweets'].str.split(' ')[i][0]
    except AttributeError:
        ctweet['tweetos'][i] = 'other'
# Preprocessing tweetos: select tweetos that contain 'RT @'
for i in range(len(ctweet['Tweets'])):
    if ctweet['tweetos'].str.contains('@')[i] == False:
        ctweet['tweetos'][i] = 'other'
# remove URLs, RTs, and twitter handles
for i in range(len(ctweet['Tweets'])):
    ctweet['Tweets'][i] = " ".join([word for word in ctweet['Tweets'][i].split()
                                    if 'http' not in word and '@' not in word and '<' not in word])
ctweet['Tweets'][0]
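The code above is meant to remove punctuation and URLs, convert the text to lowercase, and extract the usernames, for example. When I run the code, it raises an error.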
TypeError                                 Traceback (most recent call last)
<ipython-input-3-8254e078073a> in <module>()
      5 for i in range(len(ctweet['Tweets'])):
      6     try:
----> 7         ctweet['tweetos'][i] = ctweet['Tweets'].str.split(' ')[i][0]
      8     except AttributeError:
      9         ctweet['tweetos'][i] = 'other'
TypeError: 'float' object has no attribute '__getitem__'
What does this error mean? How can I solve this problem? I am using Jupyter Notebook 5.4.1.
AttributeError                            Traceback (most recent call last)
<ipython-input-7-bb6b24f62739> in <module>()
     16 # remove URLs, RTs, and twitter handles
     17 for i in range(len(ctweet['Tweets'])):
---> 18     ctweet['Tweets'][i] = " ".join([word for word in ctweet['Tweets'][i].split()
     19                                     if 'http' not in word and '@' not in word and '<' not in word])
     20
AttributeError: 'float' object has no attribute 'split'
Answer 0 (score: 0)
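It looks like ctweet is dictionary-like (it is indexed by column name), so you need to select the row first and then split that single value:
ctweet['tweetos'][i] = ctweet['Tweets'][i].split(' ')[0]
instead of:
ctweet['tweetos'][i] = ctweet['Tweets'].str.split(' ')[i][0]
Note that the selected value is a plain Python string, so it is split with .split rather than .str.split. If a row holds a float instead of a string, this line raises AttributeError, which your existing except clause already catches.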
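Both tracebacks complain about a 'float', which usually means some rows of the 'Tweets' column are empty: pandas stores a missing cell as NaN, and NaN is a float. Below is a minimal sketch of one way to handle that without explicit loops. It assumes Python 3 and that ctweet is a pandas DataFrame; the sample rows, the 'clean' column, and the clean_tweet helper are made up for illustration, and the handle extraction assumes retweets start with "RT @name".

```python
import string

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the CSV; the NaN mimics an empty cell.
ctweet = pd.DataFrame({
    "Tweets": [
        "RT @TheLTCnews: The @LTCFoundation has published an update",
        np.nan,
        "LTC to USD price $211.92 https://t.co/abc123",
    ]
})

# A missing value is a float, so it has no .split and cannot be indexed.
print(type(ctweet["Tweets"][1]))

# Replace missing tweets with an empty string so every row is a str.
# (ctweet.dropna(subset=["Tweets"]) would drop those rows instead.)
tweets = ctweet["Tweets"].fillna("")

# Assumption: the wanted handle is the one right after a leading "RT".
# Rows that do not match become NaN and are then labelled 'other'.
ctweet["tweetos"] = tweets.str.extract(r"^RT (@\w+)", expand=False).fillna("other")


def clean_tweet(text):
    """Drop the RT marker, URLs, handles and '<' tokens, then lowercase
    the text and strip ASCII punctuation (hypothetical helper)."""
    words = [
        word for word in text.split()
        if word != "RT"
        and "http" not in word
        and "@" not in word
        and "<" not in word
    ]
    lowered = " ".join(words).lower()
    # Note: this also removes '$' and '.' from prices such as $211.92.
    return lowered.translate(str.maketrans("", "", string.punctuation))


ctweet["clean"] = tweets.apply(clean_tweet)
print(ctweet[["tweetos", "clean"]])
```

Writing to a whole column at once also avoids the chained assignment ctweet['tweetos'][i] = ..., which pandas may warn about and which is not guaranteed to modify the original frame.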