The problem at hand:
I have the following list of (ID, country) tuples, which I will eventually store in a MySQL table.
mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
I want to treat 'Other' and 'Unknown' according to the following rules:
Value Replaced by => This value
----------------------------------------
Other & Unknown => Other
A country & Other => Country
A country & Unknown => Country
Python:
def refinelist(mylist):
    '''Update the list to remove unwanted values.

    Other & Unknown => Other
    A country & Other => Country
    A country & Unknown => Country
    '''
    if 'Other' in mylist and 'Unknown' in mylist:
        print('remove unknown')
        mylist.remove('Unknown')
    if 'Other' in mylist and len(mylist) >= 2:
        print('remove other')
        mylist.remove('Other')
    if 'Unknown' in mylist and len(mylist) >= 2:
        print('remove unknown')
        mylist.remove('Unknown')
    return mylist

def main():
    mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
    d = {}
    for x, y in mylist:
        d.setdefault(x, []).append(y)
    # Clean the list values
    for each in d:
        d[each] = refinelist(d[each])
    ## Convert dict to list of tuples for database entry
    outlist = []
    #result = [(key, value) for key,value in d.keys(), value in d.values()] ## Couldn't get this to work. Can the below loop be written as list comprehension with minimal footprint?
    for key, value in d.items():
        if len(value) == 1:
            print(key, value[0])
            outlist.append((key, value[0]))
        elif len(value) > 1:
            for eachval in value:
                print(key, eachval)
                outlist.append((key, eachval))
    print(outlist)

if __name__ == "__main__":
    main()
Output:
remove unknown
remove other
remove unknown
remove other
10 India
11 Other
12 USA
12 UK
[(10, 'India'), (11, 'Other'), (12, 'USA'), (12, 'UK')]
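Question:
I feel this could be done more efficiently. Is using a dict overkill?
I start with a list of tuples, convert it to a dict, run a cleanup step, and then convert it back to tuples?
I could insert the original tuples into the MySQL table and then deal with 'Unknown' and 'Other' using a few queries, but I would prefer to do this task in Python.
A pythonic solution or some critique of the code would be greatly appreciated.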
Answer 0: (score: 6)
Making extensive use of generators and list comprehensions, you could write it like this:
other = ['Other', 'Unknown'] # Strings denoting non-countries
ids = set(i for i,j in mylist) # All ids in the list
known = set(i for i,j in mylist if j not in other) # Ids of real countries
outlist = [k for k in mylist if k[1] not in other] # Keep all real countries
outlist.extend((i, other[0]) for i in ids - known) # Append "Other" for all IDs with no real country
The result will be
[(10, 'India'), (12, 'USA'), (12, 'UK'), (11, 'Other')]
If the order matters, this will take a bit more work.
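One possible way to keep the input order is sketched below. It is not part of the original answer: it assumes Python 3, reuses the question's `mylist`, and introduces a hypothetical helper set, `placeholder_added`, to make sure each country-less ID gets "Other" only once, at the position where that ID first appears.

```python
mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'),
          (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]

other = ('Other', 'Unknown')  # strings denoting non-countries

# IDs that have at least one real country
known = {i for i, j in mylist if j not in other}

outlist = []
placeholder_added = set()  # hypothetical helper: IDs that already got "Other"
for i, j in mylist:
    if j not in other:
        outlist.append((i, j))         # keep every real country
    elif i not in known and i not in placeholder_added:
        outlist.append((i, other[0]))  # one "Other" per country-less ID
        placeholder_added.add(i)

print(outlist)
```

This makes two passes over the list (one to build `known`, one to build `outlist`), so it stays linear in the length of the input.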
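Answer 1: (score: 2)
First, your code performs a lot of expensive list operations on every call to remove. If the order matters, you can do the following instead: sort first, then walk through the list once. (I wrote it as a generator so that (1) you don't need to build a list, if you are going to add the rows straight to the database, and (2) you avoid all the append operations.)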
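def filter_list(lst):
    lst = sorted(lst)
    if not lst:
        return  # nothing to yield for an empty input
    curr_id = lst[0][0]
    found_country = False
    for id, elem in lst:
        if id != curr_id:
            # Previous ID is finished: emit "Other" if it had no real country
            if not found_country:
                yield (curr_id, "Other")
            curr_id = id
            found_country = False
        if elem not in ("Other", "Unknown"):
            yield (curr_id, elem)
            found_country = True
    # Flush the last ID, which the loop itself never closes
    if not found_country:
        yield (curr_id, "Other")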
If you just want a list, use list(filter_list(input_list)). (I freely admit it is not the most elegant.)
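The same "sort, then make a single pass" idea can also be sketched with `itertools.groupby` from the standard library. This is an alternative illustration rather than the answerer's code: the name `filter_list_grouped` and the `placeholders` parameter are made up for the example, and it sorts by ID only, so countries keep their input order within an ID instead of being sorted alphabetically.

```python
from itertools import groupby
from operator import itemgetter

def filter_list_grouped(lst, placeholders=("Other", "Unknown")):
    """Yield (id, country) pairs; an ID with no real country yields one "Other"."""
    by_id = itemgetter(0)
    # groupby only merges adjacent items, so the input is sorted by ID first.
    # sorted() is stable, so countries keep their input order within an ID.
    for key, group in groupby(sorted(lst, key=by_id), key=by_id):
        countries = [c for _, c in group if c not in placeholders]
        if countries:
            for c in countries:
                yield (key, c)
        else:
            yield (key, placeholders[0])

mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'),
          (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
result = list(filter_list_grouped(mylist))
print(result)
```

Because each group is handled as a unit, there is no separate bookkeeping for the last ID, and an empty input simply yields nothing.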
Answer 2: (score: 0)
A shorter but possibly slower solution:
na_list = ['Other', 'Unknown']
data = dict()
result = list()
# Group the countries by ID
for i in mylist:
    k = str(i[0])
    data.setdefault(k, [])
    data[k].append(i[1])
# Keep the real countries, or fall back to 'Other' if none remain
for k, v in data.items():
    if not len(set(v) - set(na_list)):
        result.append((int(k), na_list[0]))
    else:
        for c in set(v) - set(na_list):
            result.append((int(k), c))
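The question also asks whether the dict-to-tuples loop can be written as a list comprehension. A minimal sketch of that, assuming Python 3.7+ (where dicts keep insertion order) and using a hypothetical helper named `refine`, might look like this. Unlike the set-based code above, it keeps the countries in input order and does not drop duplicates.

```python
mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'),
          (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
na_list = ['Other', 'Unknown']

# Group countries by ID
data = {}
for key, country in mylist:
    data.setdefault(key, []).append(country)

def refine(values):
    """Drop placeholder values; fall back to 'Other' if nothing else remains."""
    real = [v for v in values if v not in na_list]
    return real or [na_list[0]]

# Nested comprehension: one (key, value) tuple per remaining value
result = [(key, value) for key, values in data.items() for value in refine(values)]
print(result)
```

The outer `for` walks the dict items and the inner `for` walks each refined list, which is the flattening that the commented-out line in the question was attempting.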