Question

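I have code that takes several strings and predicts the category each one belongs to. Instead of hard-coding the strings, how can I read a CSV file and predict the category for every row?

I have a CSV file named test.csv. This is my current code: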

texts = ["I requested a home loan modification through Bank of America. Bank of America never got back to me.",
     "It has been difficult for me to find my past due balance. I missed a regular monthly payment",
     "I can't get the money out of the country.",
     "I have no money to pay my tuition",
     "Coinbase closed my account for no reason and furthermore    refused to give me a reason despite dozens of request"]
text_features = tfidf.transform(texts)
predictions = model.predict(text_features)
for text, predicted in zip(texts, predictions):
  print('"{}"'.format(text))
  print("  - Predicted as: '{}'".format(id_to_category[predicted]))
  print("")

Answer 1

import csv
with open('path_to_file', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    # Collect the first column of each non-empty row as the text to classify
    texts = [row[0] for row in csv_reader if row]

Something like this should work for you.
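To make that a bit more concrete, here is a minimal sketch of a helper that pulls one column out of a CSV file with the standard csv module. The name read_column and its column and has_header parameters are made up for illustration; the commented usage relies on the tfidf, model and id_to_category objects from the question.

```python
import csv

def read_column(path, column=0, has_header=False):
    """Return the values of one CSV column as a list of strings."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f, delimiter=',')
        if has_header:
            next(reader, None)  # skip the header row
        # Skip blank lines and rows that are too short to have the column
        return [row[column] for row in reader if len(row) > column]

# Usage with the objects from the question (tfidf, model, id_to_category):
# texts = read_column('test.csv')
# predictions = model.predict(tfidf.transform(texts))
# for text, predicted in zip(texts, predictions):
#     print(text, '->', id_to_category[predicted])
```

Passing newline='' to open() is what the csv documentation recommends, so that line breaks inside quoted fields are handled by the reader.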

Answer 2

If each line holds only one string and no sentence contains "\n":

with open("test.csv") as f:
    texts = f.read().splitlines()  # splitlines() avoids a trailing empty string

If your CSV has several columns (or a sentence contains "\n"), you can read it with the csv module or with pandas:

import pandas as pd

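# header=None assumes test.csv has no header row; drop it if the first row holds column names
df = pd.read_csv("test.csv", header=None)

# get the first column and convert it to a Python list
texts = df.iloc[:, 0].to_list()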

You may even be able to use the column from df directly:

text_features = tfidf.transform(df.iloc[:, 0])  # first column, selected by position
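As for the case where a sentence contains "\n": the small sketch below shows why plain splitting breaks there while the csv module does not. The raw string is invented sample data, not the content of the real test.csv.

```python
import csv
import io

# Hypothetical file content: the second record has a line break inside quotes
raw = 'first complaint\n"second complaint,\nspread over two lines"\nthird complaint\n'

# Naive splitting cuts the quoted record in half and leaves a trailing empty string
naive = raw.split('\n')

# csv.reader honours the quotes and returns one row per record
rows = list(csv.reader(io.StringIO(raw, newline='')))
texts = [row[0] for row in rows if row]

print(len(naive))  # 5
print(len(texts))  # 3
```

pandas.read_csv parses quoted fields in the same way, so either tool should be safe for multi-line text.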

Edit: if you have two columns, text and category, then it could look like this:

import pandas as pd

# header=None assumes test.csv has no header row
df = pd.read_csv('test.csv', header=None)

X_test = df.iloc[:, 0]  # first column - text
y_test = df.iloc[:, 1]  # second column - category

text_features = tfidf.transform(X_test)
predictions = model.predict(text_features)

# ---

for text, predicted_category, expected_category in zip(X_test, predictions, y_test):
    print('"{}"'.format(text))
    print("  - Predicted as: '{}'".format(id_to_category[predicted_category]))

    # Assumes y_test stores the same category ids that the model predicts
    is_correct = (predicted_category == expected_category)  # True or False

    print('  - Correct prediction:', is_correct)
    print()

# ---

from sklearn.metrics import accuracy_score

print('Accuracy:', accuracy_score(y_test, predictions))
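One thing to watch in the comparison above: the model predicts category ids, while the CSV might store category names. Below is a rough sketch, using only the standard library, that compares names instead. evaluate_csv and classify are hypothetical names; classify stands in for whatever wraps model.predict, and the file is assumed to have no header row, with text in column 0 and the expected category name in column 1.

```python
import csv

def evaluate_csv(path, classify):
    """Classify column 0 of each row and compare with the label in column 1."""
    with open(path, newline='', encoding='utf-8') as f:
        rows = [row for row in csv.reader(f) if len(row) >= 2]
    texts = [row[0] for row in rows]
    expected = [row[1] for row in rows]
    predicted = classify(texts)  # one call for the whole batch
    results = list(zip(texts, predicted, expected))
    correct = sum(1 for _, p, e in results if p == e)
    accuracy = correct / len(results) if results else 0.0
    return results, accuracy

# Hypothetical adapter around the objects from the question:
# def classify(texts):
#     ids = model.predict(tfidf.transform(texts))
#     return [id_to_category[i] for i in ids]
```

The accuracy computed here is the fraction of exact matches, which is the same quantity accuracy_score reports by default.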

How to import a file to predict sentiment
