How to build a nested dictionary from CSV data using Python

Time: 2016-03-02 17:41:00

Tags: python loops dictionary

I am currently struggling to turn my CSV data into a dictionary.

I want to use 3 columns from the file:

userID, placeID, rating
U1000,  12222,   3
U1000,  13333,   2
U1001,  13333,   4

I want the result to look like this:

{'U1000': {'12222': 3, '13333': 2}, 
'U1001': {'13333': 4}}

In other words, I want my data structure to look like:

sample = {}
sample["U1000"] = {}
sample["U1001"] = {}
sample["U1000"]["12222"] = 3
sample["U1000"]["13333"] = 2
sample["U1001"]["13333"] = 4

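But I have a lot of data to process. I want to build the result with a loop, but I have been trying for 2 hours without success.

---The code below may confuse you---

My result currently looks like this: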

{'U1000': ['12222', 3],  
'U1001': ['13333', 4]}
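  1. The dict values are lists instead of dictionaries
  2. User "U1000" appears multiple times in the data, but only once in my result
  3. I think my code has a lot of mistakes. If you don't mind, please take a look: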

    import numpy as np
    import pandas as pd

    reader = np.array(pd.read_csv("rating_final.csv"))
    included_cols = [0, 1, 2]
    
    sample= {}
    target=[]
    target1 =[]
    for row in reader:
            content = list(row[i] for i in included_cols)
            target.append(content[0])
            target1.append(content[1:3])
    
    sample = dict(zip(target, target1))
    

How can I improve the code? I searched Stack Overflow, but I could not work it out on my own. Could someone please help me with this?

Thank you very much!!

2 Answers:

Answer 0 (score: 2)

This should do what you want:

import collections

reader = ...
sample = collections.defaultdict(dict)

for user_id, place_id, rating in reader:
    rating = int(rating)
    sample[user_id][place_id] = rating

print(dict(sample))  # printing the defaultdict directly would also show its defaultdict(...) wrapper
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}

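defaultdict is a convenience utility that supplies a default value whenever you try to access a key that is not in the dictionary. If you don't like that (for example, because you want sample['non-existent-user-id'] to fail with a KeyError), use this instead: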

reader = ...
sample = {}

for user_id, place_id, rating in reader:
    rating = int(rating)
    if user_id not in sample:
        sample[user_id] = {}
    sample[user_id][place_id] = rating
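
Both snippets above leave reader unspecified. As a minimal end-to-end sketch, assuming the file looks like the sample in the question, the standard csv module can supply the rows. The in-memory CSV_TEXT string and the build_ratings helper are hypothetical stand-ins for rating_final.csv and are not part of the original answer:

```python
import csv
import io

# Hypothetical in-memory stand-in for rating_final.csv (rows copied from the question).
CSV_TEXT = """userID, placeID, rating
U1000,  12222,   3
U1000,  13333,   2
U1001,  13333,   4
"""


def build_ratings(lines):
    """Return {user_id: {place_id: rating}} from an iterable of CSV lines."""
    # skipinitialspace=True drops the padding spaces that follow each comma.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    sample = {}
    for user_id, place_id, rating in reader:
        if user_id not in sample:
            sample[user_id] = {}
        sample[user_id][place_id] = int(rating)
    return sample


sample = build_ratings(io.StringIO(CSV_TEXT))
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

With a real file, passing an open file object (for example one opened with newline="") in place of the StringIO should behave the same way, since csv.reader only needs an iterable of lines.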

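Answer 1 (score: 1)

The expected output in the example is not possible, because {'13333': 2} would not be associated with a key. You can, however, get {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}} with a dict of dicts: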

sample = {}
for row in reader:
    userID, placeID, rating = row[:3]
    sample.setdefault(userID, {})[placeID] = rating  # Possibly int(rating)?
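
To make the setdefault step easier to follow, here is a small sketch that splits it into two statements. The rows list is hard-coded from the question's sample data as an assumed stand-in for reader:

```python
# Rows hard-coded from the question's sample; an assumption standing in for `reader`.
rows = [
    ("U1000", "12222", "3"),
    ("U1000", "13333", "2"),
    ("U1001", "13333", "4"),
]

sample = {}
for user_id, place_id, rating in rows:
    # Returns the inner dict already stored under user_id,
    # or stores a new empty dict there and returns that one.
    inner = sample.setdefault(user_id, {})
    inner[place_id] = int(rating)

# setdefault never replaces a value that is already present.
same = sample.setdefault("U1000", {})
print(same is sample["U1000"])  # True
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Note that the {} argument is built on every call, even when the key already exists; that is the small cost the next approach avoids.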

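Alternatively, use collections.defaultdict(dict) to avoid the need for setdefault (or for alternative approaches involving try/except KeyError or if userID in sample:, which sacrifice the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):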

import collections

sample = collections.defaultdict(dict)
for row in reader:
    userID, placeID, rating = row[:3]
    sample[userID][placeID] = rating

# Optional conversion back to plain dict
sample = dict(sample)

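Converting back to a plain dict ensures that future lookups do not auto-create keys and instead raise KeyError as normal, and it looks like a normal dict if you print it.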
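
The difference is easy to check directly. In this short sketch the key "U9999" is a made-up user ID chosen only for illustration:

```python
import collections

sample = collections.defaultdict(dict)
sample["U1000"]["12222"] = 3

plain = dict(sample)  # shallow copy: the inner dicts are shared, the outer one is not

# Merely reading a missing key on the defaultdict inserts an empty dict for it.
_ = sample["U9999"]
print("U9999" in sample)  # True

# The plain dict is unaffected and fails loudly on a missing key.
try:
    plain["U9999"]
except KeyError:
    print("plain dict raised KeyError")

print(plain)
# -> {'U1000': {'12222': 3}}
```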

If included_cols matters (because the names or column indices might change), you can use operator.itemgetter to speed up and simplify extracting all the needed columns at once:

from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
# If columns in data were actually:
# rating, foo, bar, userID, placeID
# we'd do this instead, itemgetter will handle all the rest:
# included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)  # Create function to get needed indices at once

sample = defaultdict(dict)
# map(get_cols, ...) efficiently converts each row to a tuple of just 
# the three desired values as it goes, which also lets us unpack directly
# in the for loop, simplifying code even more by naming all variables directly
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = rating  # Possibly int(rating)?
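
As a quick check of the reordered-column case mentioned in the comments, the sketch below runs the same loop over made-up rows laid out as rating, foo, bar, userID, placeID. The rows list and its "x"/"y" filler values are assumptions for illustration only:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows laid out as: rating, foo, bar, userID, placeID
rows = [
    ("3", "x", "y", "U1000", "12222"),
    ("2", "x", "y", "U1000", "13333"),
    ("4", "x", "y", "U1001", "13333"),
]

included_cols = (3, 4, 0)  # userID, placeID, rating, in the order the loop unpacks them
get_cols = itemgetter(*included_cols)

sample = defaultdict(dict)
for user_id, place_id, rating in map(get_cols, rows):
    sample[user_id][place_id] = int(rating)

print(dict(sample))
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Only included_cols changes when the layout changes; the loop body stays the same.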