My input sample looks like this, simplified for illustration. It is a matrix of user ratings in which the columns are product IDs.
User 95 94 97 101 99 87 98 86 103 105 106 100 92 89 91 96
27669 15 19 2 1
27670 12 9 61 51 69 30 32 30 10
27671 49 7 29
27672 11 73 43 47 12 6
27673 8 14 11 11
27674 1 55
27675 9 9 10 30 29 11
27676 29 50 50
27677 31 25 28
27678 9 9 27 7 49 7
27679 28 27 7
27680 52 47 40 55 52
27681 11 9 15
27682 28 50 27 49
27683 9 9 10 8 12 9 10 8
I use the code below to read the CSV and convert it into a dictionary:
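import csv

reader = csv.DictReader(open('test_23.csv'))
# DictReader has already consumed the header, so this skips the first data row
next(reader)
users = {}
for row in reader:
    key = row.pop('User')
    if key in users:
        pass  # duplicate users are not handled; a later row overwrites an earlier one
    users[key] = row
print users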
A sample of my code's output looks like this:
{'31550': {'91': '', '88': '', '89': '', '97': '', '103': '', '100': '', '86': '', '87': '', '101': '', '95': '', '105': '', '99': '', '98': '', '102': '69', '90': '', '93': '', '92': '', '106': '', '94': '', '104': '', '96': ''}, '29443': {'91': '90', '88': '', '89': '69', '97': '', '103': '65', '100': '', '86': '', '87': '74', '101': '', '95': '', '105': '68', '99': '', '98': '', '102': '', '90': '', '93': '', '92': '', '106': '70', '94': '', '104': '74', '96': ''}, '32103': {'91': '', '88': '', '89': '', '97': '', '103': '', '100': '3', '86': '', '87': '', '101': '28', '95': '', '105': '65', '99': '', '98': '4', '102': '', '90': '', '93': '', '92': '', '106': '69', '94': '', '104': '68', '96': ''}, '29687': {'91': '', '88': '9', '89': '7', '97': '', '103': '8', '100': '', '86': '', '87': '', '101': '', '95': '', '105': '', '99': '', '98': '', '102': '9', '90': '25', '93': '', '92': '', '106': '27', '94': '', '104': '9', '96': ''}, '29444': {'91': '69', '88': '60', '89': '71', '97': '', '103': '', '100': '', '86': '51', '87': '', '101': '', '95': '19', '105': '', '99': '', '98': '', '102': '', '90': '', '93': '', '92': '', '106': '', '94': '35', '104': '', '96': '18'}, '28224': {'91': '', '88': '18', '89': '18', '97': '', '103': '', '100': '', '86': '', '87': '17', '101': '', '95': '17', '105': '', '99': '', '98': '', '102': '', '90': '17', '93': '', '92': '19', '106': '', '94': '19', '104': '', '96': '17'}, '31783': {'91': '', '88': '', '89': '', '97': '', '103': '50', '100': '', '86': '', '87': '', '101': '47', '95': '55', '105': '', '99': '', '98': '54', '102': '', '90': '', '93': '', '92': '', '106': '', '94': '', '104': '53', '96': ''},
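Is there a way to make DictReader skip blank values?
Also, is there a way to remove the single quotes from all the values?
I tried quoting=csv.QUOTE_NONE, but that did not work.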
Answer 0 (score: 1)
You can use a defaultdict and do everything in one step.
#!/usr/bin/python
from csv import DictReader
from collections import defaultdict

users = defaultdict(dict)
for row in DictReader(open('./file.csv', 'rb')):
    key = row.pop('User')
    # keep only non-empty cells, converting product IDs and ratings to int
    tmp_dict = {int(k): int(v) for k, v in row.iteritems() if v != ''}
    users[key] = tmp_dict
>>> users
{'27671':{89:7, 91:29, 92:49}, ... }
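For readers on Python 3, where `iteritems()` and the `'rb'` open mode no longer apply to this use, the same idea might be sketched as below. The function name `load_ratings` and the three-column sample are made up for illustration, and the sketch assumes every cell is either blank or an integer.

```python
import csv
import io


def load_ratings(fileobj):
    """Return {user: {product_id: rating}}, skipping blank cells."""
    users = {}
    for row in csv.DictReader(fileobj):
        key = row.pop('User')
        # 'if v' drops empty strings (and None, which DictReader uses for short rows)
        users[key] = {int(k): int(v) for k, v in row.items() if v}
    return users


# Hypothetical sample data standing in for the real CSV file
sample = io.StringIO(
    "User,95,94,97\n"
    "27669,15,,2\n"
    "27670,,9,\n"
)
ratings = load_ratings(sample)
print(ratings)  # {'27669': {95: 15, 97: 2}, '27670': {94: 9}}
```

With a real file, the call would look like `load_ratings(open('test_23.csv', newline=''))`. Because the ratings are converted to `int`, they print without quotes.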
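Answer 1 (score: 0)
The single quotes are just how Python displays strings, so to get rid of them you have to convert the dictionary to a string first and then strip them yourself:
s = str(users).replace("'", "")
I don't know of a built-in way to tell the CSV reader not to read blank values, but you can loop over the dictionary and drop every value that is blank: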
users = {user: {k: v for k, v in row.iteritems() if v != ""} for user, row in users.iteritems()}
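As a rough Python 3 illustration of both points: the values of `users` are themselves dictionaries, so the blank check has to happen one level down, and converting the remaining values to `int` makes the quotes disappear from the printed output without any string replacement. The one-user `users` dictionary here is made-up sample data.

```python
# Hypothetical sample in the shape DictReader produces: every value is a string
users = {'27674': {'95': '1', '94': '', '97': '55'}}

# Filter blanks inside each user's row and convert the rest to int
cleaned = {
    user: {k: int(v) for k, v in row.items() if v != ''}
    for user, row in users.items()
}
print(cleaned)  # {'27674': {'95': 1, '97': 55}}

# By contrast, str(...).replace(...) yields a plain string, not a dictionary
as_text = str(users).replace("'", "")
print(type(as_text).__name__)  # str
```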
Answer 2 (score: 0)
import csv

reader = csv.DictReader(open('test_23.csv'))
users = {}
for row in reader:
    key = row.pop('User')
    # start a fresh dict for each user so that rows do not share one object
    newlist = {}
    for val in row:
        if row[val] != '':
            newlist[val] = int(row[val])
    users[key] = newlist
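One detail matters with this pattern: the per-user dictionary needs to be created inside the loop. A small Python 3 sketch of why, using made-up rows in place of the CSV file:

```python
# Hypothetical rows standing in for what DictReader would yield
rows = [
    ('27674', {'95': '1', '94': ''}),
    ('27676', {'95': '', '94': '29'}),
]

# One dict created outside the loop: every user ends up pointing at it
shared = {}
aliased = {}
for user, row in rows:
    for product, value in row.items():
        if value != '':
            shared[product] = int(value)
    aliased[user] = shared

# A new dict created per row: each user keeps only their own ratings
fresh = {}
for user, row in rows:
    per_user = {}
    for product, value in row.items():
        if value != '':
            per_user[product] = int(value)
    fresh[user] = per_user

print(aliased)  # both users show {'95': 1, '94': 29}
print(fresh)    # {'27674': {'95': 1}, '27676': {'94': 29}}
```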