I have a .CSV file with two columns, one for the tweet and one for the sentiment value, in this format (but with thousands of tweets):
I like stackoverflow,Positive
Thanks for your answers,Positive
I hate sugar,Negative
I do not like that movie,Negative
stackoverflow is a question and answer site,Neutral
Python is oop high-level programming language,Neutral
I want to get output like this:
negfeats = [('I do not like that movie','Negative'),('I hate sugar','Negative')]
posfeats = [('I like stackoverflow','Positive'),('Thanks for your answers','Positive')]
neufeats = [('stackoverflow is a question and answer site','Neutral'),('Python is oop high-level programming language','Neutral')]
I tried the code below, but some characters are missing from my tuples. Also, how can I keep x, y, and z as integers instead of floats?
import csv

neg = ['Negative']
pos = ['Positive']
neu = ['Neutral']
neg_counter = 0
pos_counter = 0
neu_counter = 0
negfeats = []
posfeats = []
neufeats = []
with open('ff_tweets.csv', newline='') as f:
    for k in f:
        if any(word in k for word in neg):
            negfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neg_counter += 1
        elif any(word in k for word in pos):
            posfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            pos_counter += 1
        else:
            neufeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neu_counter += 1
x = neg_counter * 3/4
y = pos_counter * 3/4
z = neu_counter * 3/4
print(negfeats)
print(posfeats)
print(neufeats)
print(x)
print(y)
print(z)
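Why the attempt above loses data can be seen in a minimal sketch. It assumes an in-memory `io.StringIO` as a stand-in for `ff_tweets.csv` (only the first three sample rows). The `for` loop reads one line, then a `csv.reader` built on the same file object consumes everything that is left, so the first row never reaches a tuple, the loop body runs only once, and the remaining rows all land in a single list regardless of their sentiment.

```python
import csv
import io

# Hypothetical in-memory stand-in for ff_tweets.csv (first three sample rows).
data = io.StringIO(
    "I like stackoverflow,Positive\n"
    "Thanks for your answers,Positive\n"
    "I hate sugar,Negative\n"
)

seen_lines = []
for line in data:
    # The for-loop has already consumed this line.
    seen_lines.append(line)
    # A csv.reader on the same file object starts at the NEXT line and
    # reads to the end of the file, so the loop has nothing left afterwards.
    rows = [tuple(rec) for rec in csv.reader(data)]

print(seen_lines)  # only the first line was seen by the loop
print(rows)        # the rest, with the first row missing
```

Because `rows` is reassigned rather than appended to, the asker's version also overwrites `negfeats`, `posfeats` or `neufeats` each time the loop body runs.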
Answer 0 (score: 0)
This should work:
import csv

neg = 'Negative'
pos = 'Positive'
neu = 'Neutral'
negfeats = []
posfeats = []
neufeats = []
# Python 3: open with newline='' as the csv docs recommend ('Ur' mode is no longer supported)
with open('ff_tweets.csv', newline='') as f:
    # csv.reader splits each line into [tweet, sentiment]
    for r in csv.reader(f):
        if r[1] == neg:
            negfeats.append((r[0], r[1]))
        if r[1] == pos:
            posfeats.append((r[0], r[1]))
        if r[1] == neu:
            neufeats.append((r[0], r[1]))
x = len(negfeats) * float(3)/4
y = len(posfeats) * float(3)/4
z = len(neufeats) * float(3)/4
print(negfeats)
print(posfeats)
print(neufeats)
print(x)
print(y)
print(z)
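The code in Answer 0 computes x, y and z with float division, so they come out as floats. A minimal sketch of the integer variant, assuming the intended value is three quarters of each count rounded down, and assuming every row has exactly two fields. It also groups the rows in a single pass with `collections.defaultdict` instead of three `if` checks; the `io.StringIO` data is a hypothetical stand-in for `ff_tweets.csv` built from the question's sample rows.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for ff_tweets.csv (sample rows from the question).
# With the real file: with open('ff_tweets.csv', newline='') as f: ...
data = io.StringIO(
    "I like stackoverflow,Positive\n"
    "Thanks for your answers,Positive\n"
    "I hate sugar,Negative\n"
    "I do not like that movie,Negative\n"
    "stackoverflow is a question and answer site,Neutral\n"
    "Python is oop high-level programming language,Neutral\n"
)

# Group (tweet, sentiment) tuples by sentiment label in one pass.
feats = defaultdict(list)
for tweet, sentiment in csv.reader(data):
    feats[sentiment].append((tweet, sentiment))

negfeats = feats['Negative']
posfeats = feats['Positive']
neufeats = feats['Neutral']

# Floor division (//) keeps the results as ints: 3/4 of each count, rounded down.
x = len(negfeats) * 3 // 4
y = len(posfeats) * 3 // 4
z = len(neufeats) * 3 // 4

print(negfeats)
print(posfeats)
print(neufeats)
print(x, y, z)
```

Unpacking `tweet, sentiment` raises `ValueError` for any row that does not split into exactly two fields, for example a tweet containing an unquoted comma; such tweets need to be quoted in the CSV.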
Answer 1 (score: 0)
Try using pandas. 'Sentiment' is the name of the sentiment column, so this assumes the CSV file has a header row:
import pandas as pd
df = pd.read_csv('ff_tweets.csv')
pos = tuple(df.loc[df['Sentiment'] == 'Positive'].apply(tuple, axis = 1))
neu = tuple(df.loc[df['Sentiment'] == 'Neutral'].apply(tuple, axis = 1))
neg = tuple(df.loc[df['Sentiment'] == 'Negative'].apply(tuple, axis = 1))
print(pos, neg, neu)
Output:
(('I like stackoverflow', 'Positive'), ('Thanks for your answers', 'Positive')) (('I hate sugar', 'Negative'), ('I do not like that movie', 'Negative')) (('stackoverflow is a question and answer site', 'Neutral'), ('Python is oop high-level programming language', 'Neutral'))
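The sample rows in the question have no header, so `df['Sentiment']` would fail on that file as shown. A minimal sketch assuming a header-less file: `names=` supplies the column names (`'Tweet'` is a hypothetical name), and `groupby` builds all three lists at once instead of filtering three times. `itertuples(index=False, name=None)` yields plain `(tweet, sentiment)` tuples, and the result is a list per label, matching the format asked for in the question. The `io.StringIO` data stands in for `ff_tweets.csv`.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for ff_tweets.csv (sample rows from the question).
data = io.StringIO(
    "I like stackoverflow,Positive\n"
    "Thanks for your answers,Positive\n"
    "I hate sugar,Negative\n"
    "I do not like that movie,Negative\n"
    "stackoverflow is a question and answer site,Neutral\n"
    "Python is oop high-level programming language,Neutral\n"
)
# No header row in the file, so column names are given explicitly.
df = pd.read_csv(data, names=['Tweet', 'Sentiment'])

# One list of plain (tweet, sentiment) tuples per sentiment label;
# groupby keeps the original row order within each group.
feats = {
    label: list(group.itertuples(index=False, name=None))
    for label, group in df.groupby('Sentiment')
}

posfeats = feats.get('Positive', [])
negfeats = feats.get('Negative', [])
neufeats = feats.get('Neutral', [])
print(posfeats, negfeats, neufeats)
```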