Question

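I am trying to read a CSV file with read_csv() and return an iterable list that other functions can use. The end goal is to read columns from this file and preprocess them for use in Weka.

I am having trouble getting past this first step so that I can actually start writing the feature-extraction functions. I know the answer is probably simple, but I can't seem to get past this point.

I tried using yield to make a generator, but it only returns the first row of the CSV file. Using return also only returns the first row of the CSV file.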

import csv


def read_csv():
    with open('spam.csv', newline='', encoding='latin-1') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
        spamreader = list(spamreader)
        return spamreader


def file_sort(spamreader):
    for row in spamreader:
        message = []
        stop_words = set(["the", "of", "a", "to", "be", "from", "or", ",", "'", "its", "is", "Is", "The", "To", "Its", "it's", "It's", "."])
        string = "".join(row[1])
        word_string = string.split()
        for word in stop_words:
            try:
                while True:
                    word_string.remove(word)
            except ValueError:
                pass
        for word in word_string:
            message.append(word)
    yield message


def main():
    spamreader = read_csv()
    for message in file_sort(spamreader):
        print(message)
main()

Answer 1

Try pandas.

import pandas
df = pandas.read_csv("filename.csv")

It will give you a DataFrame you can work with.

Answer 2

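It is important to realize that csv.reader parses the file lazily, as it is read. Python opens the file, iterates through it, and then closes it. The spamreader object in your code represents the act of reading the CSV file; on its own it does not build a usable data structure holding the CSV contents.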
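To make that point concrete, here is a minimal sketch of how a csv.reader behaves as a one-shot iterator tied to its source. It uses io.StringIO as an in-memory stand-in for spam.csv, and the sample rows and variable names are made up for illustration.

```python
import csv
import io

# In-memory stand-in for spam.csv; the two rows are made-up sample data.
sample = io.StringIO('ham,"Go until then"\nspam,"Free prize inside"\n')
reader = csv.reader(sample, delimiter=',', quotechar='"')

first_pass = list(reader)   # pulls every row through the reader
second_pass = list(reader)  # nothing left: the reader is a one-shot iterator

# A reader also stops working once its underlying source is closed.
closed_source = io.StringIO('ham,hello\n')
late_reader = csv.reader(closed_source)
closed_source.close()
try:
    list(late_reader)
    failed_after_close = False
except ValueError:
    # Reading from a closed file object raises ValueError.
    failed_after_close = True

print(len(first_pass), len(second_pass), failed_after_close)
```

This is why the rows need to be copied into a list while the file is still open if they are to be reused later.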

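I think the simplest solution to your problem is to convert the rows of the CSV file into a list as they are read. Python will then build the list of lists you are looking for. Combine the second and third lines of your with block (the two spamreader assignments) into a single statement: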

with open('spam.csv', newline='', encoding='latin-1') as csvfile:
    spamreader = list(csv.reader(csvfile, delimiter=','))

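This returns a list of lists of strings built from the CSV file (every field comes back as a string, whatever type the data represents). If you want it to return numbers, you need to pass additional parameters.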
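As a rough sketch of how this could fit into the question's two functions: the version below is not the asker's exact code. It assumes read_csv takes an already open file object (so the example can run on in-memory sample data), uses a shortened stop-word set with case-insensitive matching, and moves the yield inside the for loop so that every row produces a result. The sample rows are made up.

```python
import csv
import io

# Shortened, hypothetical version of the question's stop-word set.
STOP_WORDS = {"the", "of", "a", "to", "be", "from", "or", "is", "its"}


def read_csv(csvfile):
    """Return every row of an open CSV file as a list of lists of strings."""
    return list(csv.reader(csvfile, delimiter=',', quotechar='"'))


def file_sort(rows):
    """Yield one cleaned word list per row, taking the text from column 1."""
    for row in rows:
        words = [w for w in row[1].split() if w.lower() not in STOP_WORDS]
        yield words  # inside the loop, so each row yields its own message


# In-memory stand-in for spam.csv; the two rows are made-up sample data.
sample = io.StringIO('ham,"Go to the point"\nspam,"Free entry is a prize"\n')
rows = read_csv(sample)
messages = list(file_sort(rows))
print(messages)
```

With a real file, the same read_csv could be called inside `with open('spam.csv', newline='', encoding='latin-1') as csvfile:`. For numeric columns, one option in the standard csv module is `quoting=csv.QUOTE_NONNUMERIC`, which makes the reader convert unquoted fields to float; whether that suits this particular file is an assumption to check against the data.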

How to properly read a CSV file and return the data for use in other functions
