Question

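I'm having some trouble getting started on a task. We were given a tab-delimited .txt file containing 6 columns of data and about 50 rows. I need help setting up lists to store this data so I can use it later. Ultimately, I need to be able to list everything in any given column and sort it, do calculations on it, and so on. Any help would be greatly appreciated.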

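Edit: I really haven't done much beyond researching this sort of thing. I know I could look at csv, and I've worked with single-column .txt files before, but I don't know how to approach this situation. How do I give the separate columns names? How do I tell the program where one line ends and the next one begins?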

Answer 1

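The DataFrame structure in pandas does pretty much exactly what you want. If you're familiar with R, it is very similar to R's data frames. It has built-in options for subsetting, sorting, and manipulating tabular data.

It reads directly from CSV files and even picks up the column names automatically. You call: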

import pandas as pd

df = pd.read_csv(yourfilename,
                 sep='\t',     # makes it tab delimited
                 header=0)     # makes the first row (row 0) the header row

This works with Python 3.

Answer 2

Suppose you have a tab-separated file like the following:

 1       2       3       4       5       6
 1       2       3       4       5       6
 1       2       3       4       5       6
 1       2       3       4       5       6
 1       2       3       4       5       6

You can read the rows into dictionaries like this:

>>> import csv
>>> fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
>>> with open('test.csv', newline='') as f:
...     reader = csv.DictReader(f, fieldnames=fieldnames, dialect='excel-tab')
...     for row in reader:
...         print(row)
...
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
{'col1': '1', 'col2': '2', 'col3': '3', 'col4': '4', 'col5': '5', 'col6': '6'}
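Once each row is a dictionary, pulling out one column is a list comprehension, and sorting or arithmetic follows from there. A minimal sketch, assuming the same six-column, tab-delimited layout with the hypothetical names col1 to col6; it reads from an in-memory string (made-up sample values) instead of test.csv so it runs on its own:

```python
import csv
import io

# Stand-in for the contents of test.csv (assumption: tab-delimited, no header row)
data = "1\t2\t3\t4\t5\t6\n7\t8\t9\t10\t11\t12\n4\t5\t6\t7\t8\t9\n"

fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
reader = csv.DictReader(io.StringIO(data), fieldnames=fieldnames, dialect='excel-tab')
rows = list(reader)  # materialize the rows so they can be reused

# All values of one column, converted from str to int
col4 = [int(row['col4']) for row in rows]

print(sorted(col4))          # sorted copy of the column: [4, 7, 10]
print(sum(col4), len(col4))  # total and count: 21 3

# Sort whole rows by a numeric column
by_col4 = sorted(rows, key=lambda row: int(row['col4']))
```

Note that csv always yields strings, so the `int(...)` conversion is what makes both the sort and the sum numeric.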

That said, the pandas library is probably a better fit for this: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files

Answer 3

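This sounds like a job better suited to a database. You could use something like PostgreSQL's COPY FROM to import the CSV data into a table, then use Python + SQL for all your sorting, searching, and matching needs.

If you think a real database is overkill, there are still options like SQLite and BerkeleyDB, both of which have Python modules.

Edit: Python's BerkeleyDB module (bsddb) has been deprecated, but anydbm is conceptually similar (in Python 3 it is the dbm module).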
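For the SQLite route, Python's standard-library sqlite3 module is enough; there is no server to install. A minimal sketch, assuming a tab-delimited file with a header row; the column names id, name and score and the sample data are hypothetical, and an in-memory database stands in for a file on disk:

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited data with a header row
data = "id\tname\tscore\n3\tcarol\t71\n1\talice\t88\n2\tbob\t95\n"

reader = csv.reader(io.StringIO(data), delimiter='\t')
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows are the records (all strings)

conn = sqlite3.connect(':memory:')  # pass a filename instead to persist to disk
conn.execute('CREATE TABLE records (id INTEGER, name TEXT, score INTEGER)')
# INTEGER column affinity converts numeric strings like '71' to integers
conn.executemany('INSERT INTO records VALUES (?, ?, ?)', rows)

# Sorting and aggregation are now SQL queries
names_by_score = [r[0] for r in conn.execute('SELECT name FROM records ORDER BY score DESC')]
count, avg = conn.execute('SELECT COUNT(*), AVG(score) FROM records').fetchone()
print(names_by_score, count, avg)
conn.close()
```

For 50 rows this is mostly a matter of taste; the payoff is that filtering and grouping become one-line queries instead of hand-written loops.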

Answer 4

I think a database is overkill for 50 rows and 6 columns, so here's my take:

from __future__ import print_function
import os
from operator import itemgetter


def get_records_from_file(path_to_file):
    """
    Read a tab-delimited file and return a
    list of dictionaries representing the data.
    """
    records = []
    with open(path_to_file, 'r') as f:
        # Use the first line to get names for the columns.
        # rstrip('\n') drops the line ending; otherwise the last
        # column name (and the last value in each row) would end in '\n'.
        fields = [e.strip().lower() for e in f.readline().rstrip('\n').split('\t')]

        # Each remaining line of the file is one record
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines, e.g. at the end of the file
            # Pair each column name with the value in the same position
            records.append(dict(zip(fields, line.split('\t'))))

    return records


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), 'so.txt')
    records = get_records_from_file(path)

    print('Number of records: {0}'.format(len(records)))

    # Assumes the file has a column named 'id'
    s = sorted(records, key=itemgetter('id'))
    print('Sorted: {0}'.format(s))
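One caveat with sorting on a raw field such as 'id': every value read from a text file is a string, so the sort is lexicographic and '10' comes before '9'. A small sketch of converting the key while sorting; the id and value fields below are hypothetical sample records shaped like the dictionaries get_records_from_file returns:

```python
# Hypothetical records as read from a text file: all values are strings
records = [
    {'id': '10', 'value': '2.5'},
    {'id': '9', 'value': '7.0'},
    {'id': '100', 'value': '1.25'},
]

# String sort is lexicographic
print([r['id'] for r in sorted(records, key=lambda r: r['id'])])  # ['10', '100', '9']

# Numeric sort: convert the key to int first
by_id = sorted(records, key=lambda r: int(r['id']))
print([r['id'] for r in by_id])  # ['9', '10', '100']

# Column-wise arithmetic also needs a conversion
values = [float(r['value']) for r in records]
print(sum(values) / len(values))  # mean of the 'value' column
```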

To store the records for later use, take a look at Python's pickle library, which lets you save them as Python objects.
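A minimal sketch of the pickle round trip, assuming records is a list of dictionaries like the one get_records_from_file returns; the sample records and the file name records.pkl are hypothetical, and a temporary directory keeps the example self-contained:

```python
import os
import pickle
import tempfile

# Hypothetical records in the same shape as get_records_from_file() returns
records = [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'records.pkl')  # hypothetical file name

    # Save: pickle needs a file opened in binary mode
    with open(path, 'wb') as f:
        pickle.dump(records, f)

    # Later (e.g. in another run of the program): load the objects back
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded == records)  # True
```

Only unpickle files you created yourself; pickle can execute code embedded in untrusted data.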

Also, note that the computer I'm using right now doesn't have Python 3 installed, but I'm fairly sure this works on both Python 2 and 3.

Tab-delimited Python 3 .txt file reading
