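I have a dataset with a list of names in one column and each name's response in another column. Each name is listed twice, and I want to see whether the two recorded responses agree, i.e.:
name a | response 1
name a | response 2
name b | response 1
name b | response 2
I created a dictionary where each key has two values: the name is the key and each response is a value. I want to create a list showing whether response1 == response2 or response1 != response2. Here is what I have so far: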
myDict = {}
if name not in myDict.keys():
    myDict[name] = {'response1': answer}
else:
    myDict[name]['response2'] = answer
match = True
for items in hospitalDict:
    if hospitalDict[items] != hospitalDict[items]:
        match = False
print(match)
I'm stuck at this part... any suggestions on how to structure this? I would also like to eventually output this data to a CSV.
Answer 0 (score: 0)
I assume what you want is a list of the names whose answers match.
Given that you have already created the dictionary:
myDict = {
    'name 1': {'response1': 'A', 'response2': 'B'},
    'name 2': {'response1': 'C', 'response2': 'C'},
    ... ,
    'name N': {'response1': 'Z', 'response2': 'Z'},
}
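You can then do the following:
myList = []
# .items() yields (name, responses) pairs; iterating the dict alone yields only keys
for name, resp in myDict.items():
    if resp['response1'] == resp['response2']:
        myList.append(name)
print(myList)
The result should look like this:
['name 2', ..., 'name N']
The list contains only the names whose answers match.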
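The same loop can be written as a list comprehension. The sketch below is one possible variant, not part of the original answer: the helper name `names_with_matching_responses` is made up for illustration, and it assumes that a name with only one recorded response should count as not matching.

```python
def names_with_matching_responses(responses):
    """Return the names whose two responses are equal.

    `responses` maps name -> {'response1': ..., 'response2': ...}.
    Assumption: a name missing either response counts as "no match".
    """
    return [
        name
        for name, resp in responses.items()
        if 'response1' in resp
        and 'response2' in resp
        and resp['response1'] == resp['response2']
    ]


sample = {
    'name 1': {'response1': 'A', 'response2': 'B'},
    'name 2': {'response1': 'C', 'response2': 'C'},
    'name 3': {'response1': 'Z'},  # second response never recorded
}
print(names_with_matching_responses(sample))
```

Checking for both keys first avoids a KeyError when a name appears only once in the data.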
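Answer 1 (score: 0)
I implemented a NamesDict class that can hold two or more responses per key (name), compare them, and export them. To export to CSV you can simply use Python's csv library: https://docs.python.org/2/library/csv.html
I included the comparison and CSV export functions as an example.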
import csv


class NamesDict(dict):
    """dict subclass mapping each name to a dict of its responses."""

    def __setitem__(self, key, item):
        # Store in the dict itself (not self.__dict__) so that len(), `in`
        # and iteration keep working and keys cannot shadow method names.
        if not isinstance(item, dict):
            raise TypeError('item has to be a dict')
        super(NamesDict, self).__setitem__(key, item)

    def responses_match(self, key):
        # Here you can implement your own comparison method.
        # True if at least two different response keys share the same value.
        match = False
        for key_one, value_one in self[key].items():
            for key_two, value_two in self[key].items():
                if key_one != key_two and value_one == value_two:
                    match = True
        return match

    def export_csv(self, path):
        # Here you can change the csv export.
        # Python 3: text mode with newline=''; on Python 2 use mode 'wb'.
        with open(path, 'w', newline='') as csv_file:
            fieldnames = ['name', 'responses']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
            for key, value in self.items():
                responses_string = ''
                for resp_key, resp in value.items():
                    # write key=value (the value, not the key twice)
                    responses_string += '%s=%s\n' % (str(resp_key), str(resp))
                writer.writerow({
                    'name': key,
                    'responses': responses_string
                })


if __name__ == '__main__':
    namesDict = NamesDict()
    namesDict['test'] = {
        'response1': 1,
        'response2': 2
    }
    namesDict['test2'] = {
        'response1': 2,
        'response2': 2
    }
    print(namesDict.responses_match('test'))   # False
    print(namesDict.responses_match('test2'))  # True
    namesDict.export_csv('test.csv')
Answer 2 (score: 0)
You can use groupby with a custom key function to separate the two groups:
from itertools import groupby

myDict = {
    'name 1': {'response1': 'A', 'response2': 'B'},
    'name 2': {'response1': 'C', 'response2': 'C'},
    'name N': {'response1': 'Z', 'response2': 'Z'},
}

# key is True when both responses of an item agree
key = lambda x: x[1]['response1'] == x[1]['response2']

# groupby only merges consecutive items, so sort by the same key first
for matched, group in groupby(sorted(myDict.items(), key=key), key=key):
    print(matched, list(group))
This prints:
False [('name 1', {'response1': 'A', 'response2': 'B'})]
True [('name 2', {'response1': 'C', 'response2': 'C'}), ('name N', {'response1': 'Z', 'response2': 'Z'})]
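If the data only ever needs to be split into "match" and "no match", a plain loop that fills two lists is an alternative that does not depend on the order of the items. This is a sketch of that different technique rather than of groupby itself; `partition_by_agreement` is a hypothetical helper name.

```python
def partition_by_agreement(responses):
    """Split names into (matching, differing) lists in a single pass."""
    matching, differing = [], []
    for name, resp in responses.items():
        if resp['response1'] == resp['response2']:
            matching.append(name)
        else:
            differing.append(name)
    return matching, differing


myDict = {
    'name 1': {'response1': 'A', 'response2': 'B'},
    'name 2': {'response1': 'C', 'response2': 'C'},
    'name N': {'response1': 'Z', 'response2': 'Z'},
}
matching, differing = partition_by_agreement(myDict)
print('match:', matching)
print('no match:', differing)
```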
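Answer 3 (score: 0)
Assuming myDict and hospitalDict are the same dictionary, you only need one simple change. Change this line:
if hospitalDict[items] != hospitalDict[items]:
to:
if hospitalDict[items]["response1"] != hospitalDict[items]["response2"]:
match will then be True if all the answers match, and False if one or more answers differ.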
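The corrected loop can also be condensed with the built-in all(). A minimal sketch, assuming every entry in hospitalDict has both a "response1" and a "response2" key (the sample data here is invented for illustration):

```python
hospitalDict = {
    'name a': {'response1': 'yes', 'response2': 'yes'},
    'name b': {'response1': 'yes', 'response2': 'no'},
}

# True only if every name has identical first and second responses.
match = all(
    resp['response1'] == resp['response2']
    for resp in hospitalDict.values()
)
print(match)
```

Like the loop, this gives a single overall flag; it does not say which names disagree.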
Answer 4 (score: 0)
Assuming the input is a list of the following form:
input_data = [
    [ "bob", "okay" ],
    [ "bob", "okay" ],
    [ "tom", "yes" ],
    [ "tom", "no" ],
    [ "kim", "red" ],
    [ "kim", "blue" ],
    [ "kim", "red" ],
    [ "kim", "green" ],
    [ "kim", "blue" ],
]
You can process it as follows:
myDict = {}
# collect the distinct answers given for each name
for name, answer in input_data:
    myDict.setdefault(name, set()).add(answer)
print("The following names had differing responses")
for name, answers in sorted(myDict.items()):
    if len(answers) > 1:
        # sets are unordered, so sort for a stable display
        print(name, sorted(answers))
This snippet produces the output:
The following names had differing responses
kim ['blue', 'green', 'red']
tom ['no', 'yes']
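The question also asks about CSV output, which this snippet does not cover. One way it might be done with the standard csv module is sketched below. The function name `write_agreement_csv`, the column names, and the `;` used to join answers are all assumptions for illustration. The sketch writes to an in-memory buffer; a file opened with `open(path, 'w', newline='')` could be passed instead.

```python
import csv
import io


def write_agreement_csv(rows, out):
    """Write one line per name: name, whether its answers agree, the answers."""
    answers_by_name = {}
    for name, answer in rows:
        answers_by_name.setdefault(name, set()).add(answer)

    writer = csv.writer(out)
    writer.writerow(['name', 'agree', 'answers'])  # hypothetical column names
    for name in sorted(answers_by_name):
        answers = sorted(answers_by_name[name])
        # a single distinct answer means all responses for the name agree
        writer.writerow([name, len(answers) == 1, ';'.join(answers)])


input_data = [
    ["bob", "okay"], ["bob", "okay"],
    ["tom", "yes"], ["tom", "no"],
]
buffer = io.StringIO()
write_agreement_csv(input_data, buffer)
print(buffer.getvalue())
```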