Remove duplicates from a list of lists based on a repeated first element

Time: 2020-08-08 17:47:39

Tags: python list

The data is below:

data = [["'id'", "'state'", "'country'\n"],
        ['44', "'WD'", "'India'\n"],
        ['5', "'WD'", "'India'\n"],
        ['44', "'WD'", "'Japan'\n"],
        ['390', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'BEL'"]]

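How do I remove the rows whose id is a duplicate, keeping only the first row for each id?

Here the ids 44 and 17 are repeated.

Expected: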

[["'id'", "'state'", "'country'\n"],
 ['44', "'WD'", "'India'\n"],
 ['5', "'WD'", "'India'\n"],
 ['390', "'WD'", "'Japan'\n"],
 ['17', "'WD'", "'Japan'\n"]]

Pseudocode:

l = []

for i in range(len(data)):
    row = data[i]
    # skip the row if a row with the same id has already been kept
    if any(kept[0] == row[0] for kept in l):
        pass
    else:
        l.append(row)

5 Answers:

Answer 0 (score: 4)

You can use a dict here:

unique_data = {}
for sub_data in data:
    sub_data_id = sub_data[0]
    if sub_data_id not in unique_data:
        unique_data[sub_data_id] = sub_data

unique_data then has the following structure:

{
    "'id'": ["'id'", "'state'", "'country'\n"],
    '44': ['44', "'WD'", "'India'\n"],
    '5': ['5', "'WD'", "'India'\n"],
    '390': ['390', "'WD'", "'Japan'\n"],
    '17': ['17', "'WD'", "'Japan'\n"]
}

To get the unique rows, call list(unique_data.values()).
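The same first-occurrence rule can also be wrapped in a small reusable helper. The sketch below is only one possible variation: the name unique_by and its key parameter are hypothetical, and it swaps the dict for a set of seen keys, yielding rows lazily in their original order.

```python
def unique_by(rows, key=lambda row: row[0]):
    """Yield each row whose key has not been seen yet, in input order."""
    seen = set()  # keys of the rows already yielded
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


data = [["'id'", "'state'", "'country'\n"],
        ['44', "'WD'", "'India'\n"],
        ['5', "'WD'", "'India'\n"],
        ['44', "'WD'", "'Japan'\n"],
        ['390', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'BEL'"]]

# deduplicate on the first element (the id), keeping the first row per id
result = list(unique_by(data))
print(result)
```

Passing a different key, for example key=lambda row: row[1], would deduplicate on another column instead.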

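Answer 1 (score: 1)

This may be overkill, but you can solve it with itertools.groupby. Group by the key x[0], i.e. the first element of each row, then take the first value from each group. Note that groupby only merges adjacent rows, so the data must be sorted by the same key first; as a result the rows come out in sorted (string) order rather than in their original order.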

from itertools import groupby
data = [["'id'", "'state'", "'country'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"], ['44', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['17', "'WD'", "'Japan'\n"], ['17', "'WD'", "'BEL'"]]
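key_function = lambda x: x[0]  # group on the first element (the id)
data.sort(key=key_function)  # stable sort, so the first occurrence stays first within each id
result = [next(values) for _, values in groupby(data, key=key_function)]  # first row of each group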
print(result)

Output:

[["'id'", "'state'", "'country'\n"], ['17', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"]]
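To illustrate why the sort matters, here is a minimal sketch on made-up rows (the list rows and its values are assumptions for illustration, not part of the question's data). Without sorting, groupby starts a new group every time the key changes, so a repeated id shows up twice.

```python
from itertools import groupby

# hypothetical sample rows: the id '44' appears twice, but not adjacently
rows = [['44', 'a'], ['5', 'b'], ['44', 'c']]


def first_element(row):
    return row[0]


# Without sorting, groupby sees three separate runs of keys.
run_keys = [key for key, _ in groupby(rows, key=first_element)]
print(run_keys)

# After a stable sort, equal ids are adjacent and the earliest row comes first.
sorted_rows = sorted(rows, key=first_element)
firsts = [next(group) for _, group in groupby(sorted_rows, key=first_element)]
print(firsts)
```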

Answer 2 (score: 1)


data = [["'id'", "'state'", "'country'\n"], ['44', '1', "'WD'", "'India'\n"], ['5', "'WD'", "'India'\n"], ['44', "'WD'", "'Japan'\n"], ['390', "'WD'", "'Japan'\n"], ['17', "'WD'", "'Japan'\n"], ['17', "'WD'", "'BEL'"]]

ls = {}
for each in data:
    if each[0] not in ls:  # keep only the first row seen for each id
        ls[each[0]] = each[1:]  # store the remaining fields under the id
print(ls)

Output:

{"'id'": ["'state'", "'country'\n"],
 '44': ['1', "'WD'", "'India'\n"],
 '5': ["'WD'", "'India'\n"],
 '390': ["'WD'", "'Japan'\n"],
 '17': ["'WD'", "'Japan'\n"]}

# rebuild each row by putting the id back in front of its fields
ourlist = [[k] + v for k, v in ls.items()]
print(ourlist)

Output:

[["'id'", "'state'", "'country'\n"],
 ['44', '1', "'WD'", "'India'\n"],
 ['5', "'WD'", "'India'\n"],
 ['390', "'WD'", "'Japan'\n"],
 ['17', "'WD'", "'Japan'\n"]]

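Answer 3 (score: 1)

When working with this kind of tabular data, it is better to use Pandas. You gain flexibility and speed, and avoid writing unnecessary loops.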

data = [["'id'", "'state'", "'country'\n"],
        ['44', "'WD'", "'India'\n"],
        ['5', "'WD'", "'India'\n"],
        ['44', "'WD'", "'Japan'\n"],
        ['390', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'BEL'"]]
           
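import pandas as pd

# the first row holds the column names; the remaining rows are the records
df = pd.DataFrame(data[1:], columns=data[0])
# drop_duplicates keeps the first row for each id by default
print(df.drop_duplicates(subset="'id'"))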

Output:

    'id' 'state' 'country'\n
0   44    'WD'   'India'\n
1    5    'WD'   'India'\n
3  390    'WD'   'Japan'\n
4   17    'WD'   'Japan'\n

Answer 4 (score: 0)

A simple solution is to put the rows into a dictionary keyed by id. You can then simply take the dictionary's values.


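If you need to preserve the order, you can use OrderedDict instead (since Python 3.7, a plain dict also preserves insertion order).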
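The dictionary approach might look like the sketch below. This is not the answer's original code; the variable names are hypothetical. It also shows one thing to watch for: a plain dict comprehension overwrites earlier rows, so the last row per id wins, whereas setdefault keeps the first.

```python
from collections import OrderedDict

data = [["'id'", "'state'", "'country'\n"],
        ['44', "'WD'", "'India'\n"],
        ['5', "'WD'", "'India'\n"],
        ['44', "'WD'", "'Japan'\n"],
        ['390', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'Japan'\n"],
        ['17', "'WD'", "'BEL'"]]

# Pitfall: later rows overwrite earlier ones, so the LAST row per id is kept.
last_wins = list({row[0]: row for row in data}.values())

# setdefault stores a row only if its id is not present yet,
# so the FIRST row per id is kept, in order of first appearance.
first_wins = OrderedDict()
for row in data:
    first_wins.setdefault(row[0], row)
result = list(first_wins.values())

print(result)
```

Here result matches the expected output from the question, while last_wins keeps the 'Japan' row for id 44 and the 'BEL' row for id 17.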
