Question

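I need to create a program that takes a CSV file and returns a nested dictionary. The keys of the outer dictionary should be the first value of each row, starting from the second row (so that the row with the column names is omitted). The value for each key in the outer dictionary should be another dictionary, which I explain below.

The keys of the inner dictionary should be the column names, and the values should be the values in that row for the corresponding column.

Example:

For a CSV file like this: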

column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78

I would like to print the data in this form:

my_dict = {
'4': {'column1':'4','column2':'12', 'column3':'5', 'column4':'11'},
'29': {'column1':'29', 'column2':'47', 'column3':'23', 'column4':'41'},
'66': {'column1':'66', 'column2':'1', 'column3':'98', 'column4':'78'}
}

The closest I have got so far (which is not even close):

import csv
import collections

def csv_to_dict(file, delimiter, quotechar):
    list_inside_dict = collections.defaultdict(list)
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            for k, v in row.items():
                list_inside_dict[k].append(v)
    return dict(list_inside_dict)

If I run the function on the example CSV file above with delimiter = "," and quotechar = "'", it returns the following:

{'column1': ['4', '29', '66'], ' column2': ['12', '47', '1'], ' column3': ['5', '23', '98'], ' column4': ['11', '41', '78']}

This is where I got lost. I tried changing:

list_inside_dict = collections.defaultdict(list)

to

list_inside_dict = collections.defaultdict(dict)

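and then simply changing the value of each key, since I cannot append to a dictionary, but it all got really messy. So I started from scratch and found that I ended up in the same place.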

Answer 1

You can use a dictionary comprehension:

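import csv

with open('filename.csv', newline='') as f:
    # The first row is the header; the remaining rows are data.
    header, *data = csv.reader(f)
    # Key each inner dict by the first value of its row.
    final_dict = {a: dict(zip(header, [a, *b])) for a, *b in data}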

Output:

{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
 '29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
 '66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
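
The inner keys in this output keep the space that follows each comma in the header row (' column2' rather than 'column2'). A minimal sketch of one way to avoid that, assuming the stray spaces are not wanted: pass skipinitialspace=True to csv.reader. The names sample and clean_dict are made up for illustration, and io.StringIO stands in for an open file.

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for an open file.
sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)

# skipinitialspace=True ignores whitespace that immediately follows a delimiter,
# so the header names come out without the leading space.
header, *data = csv.reader(sample, skipinitialspace=True)

# Key each inner dict by the first value of its row.
clean_dict = {row[0]: dict(zip(header, row)) for row in data}

print(clean_dict['4'])
# {'column1': '4', 'column2': '12', 'column3': '5', 'column4': '11'}
```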

Answer 2

You can use pandas for this task.
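
The code for this answer is not shown here, so the following is only a sketch of how pandas might be used, not necessarily what the answer contained. It assumes the sample data from the question, reads every value as a string with dtype=str, and strips the space after each comma with skipinitialspace=True. The names sample, with_key and without_key are made up for illustration.

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for a file path.
sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)

# dtype=str keeps every value as a string, matching the requested output.
df = pd.read_csv(sample, dtype=str, skipinitialspace=True)

# drop=False keeps column1 as a regular column as well as the index,
# so the key is repeated inside each inner dictionary.
with_key = df.set_index('column1', drop=False).to_dict(orient='index')

# With the default drop=True the key is not repeated inside the inner dictionaries.
without_key = df.set_index('column1').to_dict(orient='index')

print(with_key['4'])
print(without_key['4'])
```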

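I am not sure why you want to repeat the first column's value inside each inner dictionary, but if you do not need that, the solution becomes simpler.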

Answer 3

To get what you want, you need zip.

We can use a string in place of a file for the CSV data. Just replace that part with your file.

Given:

s='''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''

You can do this:

import csv 

data=[]
for row in csv.reader(s.splitlines()):  # replace 'splitlines' with your file
    data.append(row)

header=data.pop(0)
col1=[e[0] for e in data]
di={}
for c,row in zip(col1,data):
    di[c]=dict(zip(header, row))

Then:

>>> di
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
 '29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
 '66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}

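On Python 3.7+, dictionaries are guaranteed to keep insertion order (in CPython 3.6 this was an implementation detail). Earlier versions of Python do not keep it.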
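
As noted above, the string is only a stand-in for a real file. A rough sketch of the same idea applied to a file on disk might look like the following. It swaps the zip-based loop for csv.DictReader, which pairs header names with row values itself. The helper name csv_to_nested_dict and the temporary file are hypothetical and exist only so the example runs on its own.

```python
import csv
import os
import tempfile


def csv_to_nested_dict(path):
    """Hypothetical helper: map each row's first value to a {column: value} dict."""
    with open(path, newline='') as f:
        # skipinitialspace=True drops the space after each comma in the header.
        reader = csv.DictReader(f, skipinitialspace=True)
        key = reader.fieldnames[0]  # name of the first column
        return {row[key]: dict(row) for row in reader}


# Write the sample data to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("column1, column2, column3, column4\n4,12,5,11\n29,47,23,41\n66,1,98,78\n")

di = csv_to_nested_dict(tmp.name)
os.remove(tmp.name)

print(di['66'])
# {'column1': '66', 'column2': '1', 'column3': '98', 'column4': '78'}
```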

Answer 4

This is similar to this answer, but I think it could be explained better.

import csv

with open('filename.csv') as f:
    headers, *data = csv.reader(f)
    output = {}
    for firstInRow, *restOfRow in data:
        output[firstInRow] = dict(zip(headers, [firstInRow, *restOfRow]))
    print(output)

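This loops over the data rows in the file, taking the first value of each row as the key and the remaining values as a list. The value stored under that key in the output dictionary is then built by zipping the list of headers with the list of values. The output[firstInRow] = ... line is the same as writing output[firstInRow] = {headers[0]: firstInRow, headers[1]: restOfRow[0], ...}.

Output: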

{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
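
To make the equivalence described in the explanation concrete, here is a small sketch for a single row. The header and row values are copied from the sample file (including the leading spaces in the header names). The names zipped and literal are made up for illustration.

```python
# Header and first data row as csv.reader would return them for the sample file.
headers = ['column1', ' column2', ' column3', ' column4']
firstInRow, *restOfRow = ['4', '12', '5', '11']

# What the loop body builds with zip.
zipped = dict(zip(headers, [firstInRow, *restOfRow]))

# The same dictionary written out by hand.
literal = {
    headers[0]: firstInRow,
    headers[1]: restOfRow[0],
    headers[2]: restOfRow[1],
    headers[3]: restOfRow[2],
}

print(zipped == literal)  # True
```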

Nested dictionary problem
