Question

I have a function that accepts four types of files:

.csv
.xls
.xlsx
.json

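These files can contain up to several thousand rows of data. I want to get the file's column headers along with N sample data rows (they don't have to be the first N, but there must be N in total). An example: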

From animals.csv:

To:

animals = {
    "dogs": [1, 5, 8],
    "cats": [2, 6, 9],
    "birds": [3, 10, 14],
    "frogs": [4, 8, 11]
}

What is the most efficient way to do this?

Test files:

Answer 1

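Here is my solution. I don't claim it is the "most efficient" one. You will need to install xlrd (pip3 install xlrd). The test files can be found in the question description.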

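import collections
import csv
import json
import pathlib

# Note: xlrd 2.0 and later read only .xls; reading .xlsx requires an older xlrd release.
import xlrd

file = "animals.csv"
n = 3  # number of sample values to keep per column

# Each header maps to a set, so a value repeated within a column is kept once.
d = collections.defaultdict(set)

f_sufx = pathlib.Path(file).suffix
if f_sufx == ".csv":
    with open(file, 'r', newline='') as f:
        reader = csv.DictReader(f)  # the first row supplies the headers
        for r in reader:
            for k, v in r.items():
                # Skip empty cells and stop adding once n values are collected.
                if len(d[k]) < n and v:
                    d[k].add(v)
elif f_sufx == ".json":
    with open(file, 'r') as f:
        # Expects a list of objects, one object per row.
        for r in json.load(f):
            for k, v in r.items():
                if len(d[k]) < n and v:
                    d[k].add(v)
elif f_sufx in [".xls", ".xlsx"]:
    sh = xlrd.open_workbook(file).sheet_by_index(0)
    # Headers are taken from row index 1 and data from row index 2 onward;
    # if the sheet's headers are in its first row, use 0 and 1 instead.
    for row in range(2, sh.nrows):
        for col in range(sh.ncols):
            header = sh.cell_value(1, col)
            value = sh.cell_value(row, col)
            if len(d[header]) < n and value:
                d[header].add(value)

print(d)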
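
Since the question asks about efficiency, one possible refinement for the CSV case is to stop after N rows instead of looping over the whole file. The sketch below is only an illustration under a few assumptions: sample_columns is a hypothetical helper name, the CSV has a header row, and the JSON file is a list of flat objects. It collects lists rather than sets, so row order is kept and duplicates are not collapsed, which differs from the code above.

```python
import csv
import json
from itertools import islice
from pathlib import Path


def sample_columns(path, n=3):
    """Return {header: [up to n values]} for a .csv or .json file.

    Hypothetical helper: assumes a CSV header row, or a JSON list of
    flat objects.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    columns = {}
    if suffix == ".csv":
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Register every header up front so empty files still report them.
            columns = {name: [] for name in (reader.fieldnames or [])}
            # islice stops after n data rows; later rows are not parsed.
            rows = list(islice(reader, n))
    elif suffix == ".json":
        with open(path) as f:
            # json.load parses the whole file; only the first n rows are kept.
            rows = json.load(f)[:n]
    else:
        raise ValueError(f"unsupported file type: {suffix}")

    # Turn the sampled rows into one list of values per column.
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


if __name__ == "__main__":
    print(sample_columns("animals.csv", 3))
```

For JSON, json.load still parses the entire file, so only the CSV branch avoids reading everything. CSV values come back as strings; convert them if numbers are needed.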

Read headers and N rows of data
