The data is below:
data = [["'id'", "'state'", "'country'\n"],
['44', "'WD'", "'India'\n"],
['5', "'WD'", "'India'\n"],
['44', "'WD'", "'Japan'\n"],
['390', "'WD'", "'Japan'\n"],
['17', "'WD'", "'Japan'\n"],
['17', "'WD'", "'BEL'"]]
How can I remove the rows whose id is a duplicate?
Here the ids 44 and 17 are repeated.
Expected:
[["'id'", "'state'", "'country'\n"]
['44', '1', "'WD'", "'India'\n"]
['5', "'WD'", "'India'\n"]
['390', "'WD'", "'Japan'\n"]
['17', "'WD'", "'Japan'\n"]]
Pseudocode:
l = []
for i in range(len(a)):
    print(a[i])
    if i[0] == a[i][1]:
        pass
    else:
        l.append(i)
Answer 0 (score: 4)
You can use a dict here:
unique_data = {}
for sub_data in data:
    sub_data_id = sub_data[0]
    # Keep only the first row seen for each id
    if sub_data_id not in unique_data:
        unique_data[sub_data_id] = sub_data
The structure of unique_data looks like this:
{
    "'id'": ["'id'", "'state'", "'country'"],
    '44': ['44', '1', "'WD'", "'India'"],
    '5': ['5', "'WD'", "'India'"],
    '390': ['390', "'WD'", "'Japan'"],
    '17': ['17', "'WD'", "'Japan'"]
}
To get the unique items, we can use list(unique_data.values()), which gives us the deduplicated rows as a list.
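Answer 1 (score: 1)
This may be overkill, but you can solve it with itertools.groupby. Group by the key x[0], i.e. the first element of each list, and then take the first value from each group. Note that groupby only groups consecutive items, so the data is sorted by the same key first.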
from itertools import groupby
data = [["'id'", "'state'", "'country'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"], ['44', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['17', "'WD'", "'Japan'\n"], ['17', "'WD'", "'BEL'"]]
key_function = lambda x : x[0]
data.sort(key=key_function)
result = [ list(values) [0] for _,values in groupby(data,key=key_function) ]
print(result)
Output:
[["'id'", "'state'", "'country'\n"], ['17', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"]]
Answer 2 (score: 1)
data = [["'id'", "'state'", "'country'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"], ['44', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['17', "'WD'", "'Japan'\n"], ['17', "'WD'", "'BEL'"]]
ls = {}
for each in data:
    # Store the remaining columns under the id, first occurrence only
    if each[0] not in ls:
        ls[each[0]] = each[1:]
print(ls)
{"'id'": ["'state'", "'country'\n"],
'44': ['1', "'WD'", "'India'\n"],
'5': ["'WD'", "'India'\n"],
'390': ["'WD'", "'Japan'\n"],
'17': ["'WD'", "'Japan'\n"]}
ourlist = [[k]+v for k,v in ls.items()]
print(ourlist)
[["'id'", "'state'", "'country'\n"],
['44', '1', "'WD'", "'India'\n"],
['5', "'WD'", "'India'\n"],
['390', "'WD'", "'Japan'\n"],
['17', "'WD'", "'Japan'\n"]]
Answer 3 (score: 1)
When working with this kind of data, it is better to use Pandas. You gain flexibility and speed, and you avoid unnecessary loops.
data = [["'id'", "'state'", "'country'\n"],
['44', "'WD'", "'India'\n"],
['5', "'WD'", "'India'\n"],
['44', "'WD'", "'Japan'\n"],
['390', "'WD'", "'Japan'\n"],
['17', "'WD'", "'Japan'\n"],
['17', "'WD'", "'BEL'"]]
import pandas as pd
df = pd.DataFrame(data[1:],columns = data[0])
print(df.drop_duplicates(subset="'id'"))
Output:
   'id' 'state' 'country'\n
0    44    'WD'   'India'\n
1     5    'WD'   'India'\n
3   390    'WD'   'Japan'\n
4    17    'WD'   'Japan'\n
Answer 4 (score: 0)
A simple solution is to put the rows into a dictionary keyed by id. Then you can simply take the values.
Example:
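The dictionary idea can be sketched as follows. This is an illustration rather than the answer's own code: the helper name dedupe_by_id is hypothetical, and the sketch assumes the id is the first element of each row and that the first row seen for an id is the one to keep.

```python
# Illustrative sketch: `data` mirrors the list from the question;
# dedupe_by_id is a hypothetical helper name, not part of any library.
data = [["'id'", "'state'", "'country'\n"],
        ['44', "'WD'", "'India'\n"],
        ['5', "'WD'", "'India'\n"],
        ['44', "'WD'", "'Japan'\n"],
        ['390', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'BEL'"]]


def dedupe_by_id(rows):
    """Return rows with only the first occurrence of each id (row[0]) kept."""
    by_id = {}
    for row in rows:
        # setdefault stores the row only if this id has not been seen yet
        by_id.setdefault(row[0], row)
    return list(by_id.values())


unique_rows = dedupe_by_id(data)
print(unique_rows)
```

The header row is treated like any other row here; because its key "'id'" is unique, it passes through unchanged.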
If you want to preserve the order, you can use OrderedDict instead.
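A minimal sketch of the OrderedDict variant, assuming the same rule that the first row seen for an id wins. The sample rows and variable names are made up for illustration. On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict mainly matters on older versions.

```python
from collections import OrderedDict

# Hypothetical sample rows: [id, country]
rows = [['44', 'India'], ['5', 'India'], ['44', 'Japan'],
        ['17', 'Japan'], ['17', 'BEL']]

ordered = OrderedDict()
for row in rows:
    # Only the first row for each id is stored; later duplicates are skipped
    if row[0] not in ordered:
        ordered[row[0]] = row

# Values come back in the order the ids were first seen
deduped = list(ordered.values())
print(deduped)
```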