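Skip blank columns in csv.DictReader

Time: 2015-08-05 02:18:39

Tags: python

My input sample looks like this, simplified for brevity. It is a matrix of user ratings, where the columns are product numbers.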

User    95  94  97  101 99  87  98  86  103 105 106 100 92  89  91  96
27669   15  19      2   1                                           
27670               12  9   61  51  69  30  32  30  10              
27671                                                   49  7   29  
27672   11  73                      43  47  12                      6
27673           8       14      11                  11              
27674                   1   55                                      
27675                       9           9       10      30  29  11  
27676                               29  50  50                      
27677           31      25          28                              
27678           9   9       27  7   49              7               
27679       28                      27                              7
27680   52  47              40      55      52                      
27681                               11                  9       15  
27682   28  50              27                                      49
27683           9   9   10      8   12              9   10  8       

I use the code below to read the CSV and convert it into a dictionary:

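import csv

users = {}
with open('test_23.csv', newline='') as f:
    reader = csv.DictReader(f)
    # No next(reader) here: DictReader already consumes the header line,
    # so an extra next() would silently skip the first user.
    for row in reader:
        key = row.pop('User')  # the user id becomes the outer key
        users[key] = row       # remaining columns: product number -> rating (str)
print(users)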
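Calling `next(reader)` right after creating a `csv.DictReader` is an easy slip: the reader takes the first line as its header on its own, so the extra `next()` discards the first user, not the header. A small sketch to see this, assuming an in-memory stand-in for `test_23.csv` (two rows from the sample above, trimmed to three product columns; `user_ids` is just an illustrative name):

```python
import csv
import io

# Hypothetical in-memory CSV standing in for test_23.csv (trimmed to 3 products).
SAMPLE = "User,95,94,97\n27669,15,19,\n27670,,,\n"

def user_ids(skip_first):
    reader = csv.DictReader(io.StringIO(SAMPLE))  # header is read automatically
    if skip_first:
        next(reader)  # discards the first *data* row, not the header
    return [row["User"] for row in reader]

print(user_ids(skip_first=False))  # ['27669', '27670']
print(user_ids(skip_first=True))   # ['27670']
```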

A sample of my code's output:

{'31550': {'91': '', '88': '', '89': '', '97': '', '103': '', '100': '', '86': '', '87': '', '101': '', '95': '', '105': '', '99': '', '98': '', '102': '69', '90': '', '93': '', '92': '', '106': '', '94': '', '104': '', '96': ''}, '29443': {'91': '90', '88': '', '89': '69', '97': '', '103': '65', '100': '', '86': '', '87': '74', '101': '', '95': '', '105': '68', '99': '', '98': '', '102': '', '90': '', '93': '', '92': '', '106': '70', '94': '', '104': '74', '96': ''}, '32103': {'91': '', '88': '', '89': '', '97': '', '103': '', '100': '3', '86': '', '87': '', '101': '28', '95': '', '105': '65', '99': '', '98': '4', '102': '', '90': '', '93': '', '92': '', '106': '69', '94': '', '104': '68', '96': ''}, '29687': {'91': '', '88': '9', '89': '7', '97': '', '103': '8', '100': '', '86': '', '87': '', '101': '', '95': '', '105': '', '99': '', '98': '', '102': '9', '90': '25', '93': '', '92': '', '106': '27', '94': '', '104': '9', '96': ''}, '29444': {'91': '69', '88': '60', '89': '71', '97': '', '103': '', '100': '', '86': '51', '87': '', '101': '', '95': '19', '105': '', '99': '', '98': '', '102': '', '90': '', '93': '', '92': '', '106': '', '94': '35', '104': '', '96': '18'}, '28224': {'91': '', '88': '18', '89': '18', '97': '', '103': '', '100': '', '86': '', '87': '17', '101': '', '95': '17', '105': '', '99': '', '98': '', '102': '', '90': '17', '93': '', '92': '19', '106': '', '94': '19', '104': '', '96': '17'}, '31783': {'91': '', '88': '', '89': '', '97': '', '103': '50', '100': '', '86': '', '87': '', '101': '47', '95': '55', '105': '', '99': '', '98': '54', '102': '', '90': '', '93': '', '92': '', '106': '', '94': '', '104': '53', '96': ''},

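Is there a way to make DictReader skip blank values?

Also, is there a way to remove the single quotes around all the values?

I tried quoting=csv.QUOTE_NONE, but that didn't work.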

3 answers:

Answer 0 (score: 1)

You can use a defaultdict and do everything in one step.

#!/usr/bin/env python3

from csv import DictReader
from collections import defaultdict

users = defaultdict(dict)

with open('./file.csv', newline='') as f:
    for row in DictReader(f):
        key = row.pop('User')
        # Keep only non-blank cells; converting to int also means the
        # values no longer print with quotes.
        tmp_dict = {int(k): int(v) for k, v in row.items() if v != ''}
        users[key] = tmp_dict

>>> dict(users)
{'27671': {89: 7, 91: 29, 92: 49}, ... }
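A self-contained sketch of the same approach that runs without the real file: `io.StringIO` stands in for `file.csv`, the rows are copied from the question but trimmed to four product columns, and `load_ratings` is an illustrative name, not part of any library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory sample: three rows from the question, trimmed to
# product columns 92, 89, 91 and 96.
SAMPLE = (
    "User,92,89,91,96\n"
    "27671,49,7,29,\n"
    "27672,,,,6\n"
    "27673,,,,\n"
)

def load_ratings(f):
    users = defaultdict(dict)
    for row in csv.DictReader(f):
        key = row.pop("User")
        # Keep only non-blank cells, converting product numbers and ratings to int.
        users[key] = {int(k): int(v) for k, v in row.items() if v != ""}
    return dict(users)  # plain dict, so it prints without the defaultdict wrapper

ratings = load_ratings(io.StringIO(SAMPLE))
print(ratings["27671"])  # {92: 49, 89: 7, 91: 29}
print(ratings["27673"])  # {}
```

Because each user's dict is assigned in one go, a plain `{}` would work just as well here; `defaultdict(dict)` only pays off if you fill `users[key][product]` cell by cell.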

Answer 1 (score: 0)

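The single quotes are a Python thing (they are how Python displays strings), so you would have to convert to a string first and then remove them yourself: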

s = str(users).replace("'", "")
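The quotes come from `repr()`: printing a dict shows every `str` key and value in quotes regardless of the reader's `quoting` setting, because `csv.QUOTE_NONE` only changes how the file is parsed. A sketch of two alternatives to text-replacing the repr (which would also strip any apostrophes that are genuinely part of the data): convert the values to `int`, or format the output yourself. `format_ratings` is a hypothetical helper name.

```python
# Values read by csv are str, so their repr carries quotes.
row = {"92": "49", "89": "7"}
print(row)  # {'92': '49', '89': '7'}

# Converting to int removes the quotes from the printed values...
as_ints = {int(k): int(v) for k, v in row.items()}
print(as_ints)  # {92: 49, 89: 7}

# ...or build the text explicitly instead of relying on repr().
def format_ratings(ratings):
    return ", ".join(f"{product}: {score}" for product, score in ratings.items())

print(format_ratings(row))  # 92: 49, 89: 7
```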

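I don't know of a built-in way to tell the CSV reader not to read blank fields, but you can loop over the dictionary and remove all the blank values: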

users = {user: {k: v for k, v in ratings.items() if v != ""} for user, ratings in users.items()}
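You can also filter while reading instead of afterwards. A minimal sketch, assuming a hypothetical helper `nonblank_rows` that wraps `csv.DictReader` and yields each row without its empty cells (the sample rows are taken from the question and trimmed to four product columns):

```python
import csv
import io

def nonblank_rows(f, **kwargs):
    """Hypothetical helper: yield DictReader rows with empty cells removed."""
    for row in csv.DictReader(f, **kwargs):
        # None shows up when a row is shorter than the header (DictReader's
        # default restval), so drop it along with empty strings.
        yield {k: v for k, v in row.items() if v not in ("", None)}

# In-memory sample standing in for the real file.
SAMPLE = "User,95,94,97,101\n27669,15,19,,2\n27670,,,,12\n"

users = {}
for row in nonblank_rows(io.StringIO(SAMPLE)):
    key = row.pop("User")
    users[key] = row

print(users)  # {'27669': {'95': '15', '94': '19', '101': '2'}, '27670': {'101': '12'}}
```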

Answer 2 (score: 0)

import csv

users = {}

with open('test_23.csv', newline='') as f:
    for row in csv.DictReader(f):
        key = row.pop('User')

        # Create a fresh dict for each user; a single dict created before
        # the loop would be shared (and overwritten) by every user.
        newlist = {}
        for val in row:
            if row[val] != '':
                newlist[val] = int(row[val])

        users[key] = newlist
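One pitfall with this loop structure is where the inner dict is created. If it is created once, before the `for row in ...` loop, every user ends up referring to the same dict, which accumulates ratings from all rows. A small sketch comparing the two placements, using in-memory rows from the question trimmed to two columns (`build` and `share_inner` are illustrative names only):

```python
import csv
import io

SAMPLE = "User,95,94\n27669,15,19\n27670,,\n27680,52,47\n"

def build(share_inner):
    users = {}
    shared = {}  # only used when share_inner is True
    for row in csv.DictReader(io.StringIO(SAMPLE)):
        key = row.pop("User")
        # Fresh dict per user, unless we deliberately reuse the shared one.
        ratings = shared if share_inner else {}
        for product, value in row.items():
            if value != "":
                ratings[product] = int(value)
        users[key] = ratings
    return users

good = build(share_inner=False)
bad = build(share_inner=True)
print(good["27670"])  # {}
print(bad["27670"])   # {'95': 52, '94': 47}
```

In the shared version all three users point at one dict, so each user appears to have the last ratings written to it.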