Question

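I'm using xlrd to pull data from an Excel sheet. The data I want is in two columns (the "ID" and "location" columns). Each column contains thousands of entries, most of which are exact duplicates. I just want to create two lists containing all the unique entries from those two Excel columns. Here is most of my code, along with an example of what I get back when I print one of the lists: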

rawIDs = data.col_slice(colx=0,
                        start_rowx=0,
                        end_rowx=None) #getting all of column 1 in a list
IDs = []

for ID in rawIDs:
    if ID not in IDs:
        IDs.append(ID) #trying to create new list without duplicates, but it fails

rawlocations = data.col_slice(colx=1,
                              start_rowx=0,
                              end_rowx=None) #getting all of column 2 in a list

locations = []

for location in rawlocations:
    if location not in locations:
        locations.append(location) #same as before, also fails

print set(IDs) #even set() doesn't remove duplicates, it just prints "rawIDs"

No matter what I do, it always prints the original list with all the duplicates still in it.

It probably goes without saying, but I've already looked at many similar Stack Overflow posts, and their solutions didn't work for me.

Edit: I was wrong about one specific thing. I realized that printing

print set(IDs)

actually returns

"set([item, item, item ...])" as output. So it basically just wraps "set()" around the "rawIDs" output, with every duplicate still there. That still doesn't make sense to me...

Here is an example screenshot as well: [screenshot not included]

Answer 1

Solution:

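It seems that `col_slice` returns cell objects that carry metadata (such as the cell type and formatting) rather than plain text. So even when the text is identical, each item in the list is a distinct object, and neither `in` nor `set()` treats two of them as duplicates.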
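A minimal sketch of this effect, assuming the cell class does not define `__eq__` or `__hash__`, so Python falls back to comparing object identity. `FakeCell` is a hypothetical stand-in written for this illustration, not xlrd's real class:

```python
class FakeCell(object):
    """Hypothetical stand-in for a spreadsheet cell.

    It defines no __eq__/__hash__, so two instances are equal only if
    they are the very same object.
    """

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "FakeCell(%r)" % (self.value,)


cells = [FakeCell("A17"), FakeCell("A17"), FakeCell("A17")]

# Same text, but three different objects: no duplicates are detected.
print(cells[0] == cells[1])  # False
print(len(set(cells)))       # 3

# Comparing the underlying values does detect them.
print(len(set(c.value for c in cells)))  # 1
```

This is why printing `set(IDs)` looks like the original list wrapped in `set(...)`: each element's printed text repeats, but the objects themselves are all different.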

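Changing the for loops so that they add each item's string form instead of the item itself solved my problem and produced new lists without duplicates.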
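A variation on the same idea, as a sketch: key on the cell's `value` attribute (xlrd cells expose one) instead of its printed string, and record seen keys in a set so each membership test is a hash lookup rather than a scan of the growing list, which adds up with thousands of rows. `unique_values` and `FakeCell` are hypothetical names used only for this illustration:

```python
def unique_values(cells):
    """Return the distinct `.value`s of `cells`, in first-seen order.

    Hypothetical helper; assumes every item has a `.value` attribute.
    """
    seen = set()
    result = []
    for cell in cells:
        key = cell.value
        if key not in seen:  # average O(1) lookup in a set
            seen.add(key)
            result.append(key)
    return result


class FakeCell(object):
    """Hypothetical stand-in for a spreadsheet cell (no __eq__ defined)."""

    def __init__(self, value):
        self.value = value


column = [FakeCell("A17"), FakeCell("B02"), FakeCell("A17")]
print(unique_values(column))  # ['A17', 'B02']
```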

rawIDs = data.col_slice(colx=0,
                        start_rowx=0,
                        end_rowx=None)  # all cells in column 1

IDs = []

for ID in rawIDs:
    # compare the cell's string form, not the cell object itself
    if str(ID) not in IDs:
        IDs.append(str(ID))

rawlocations = data.col_slice(colx=1,
                              start_rowx=0,
                              end_rowx=None)  # all cells in column 2

locations = []

for location in rawlocations:
    # same approach for the location column
    if str(location) not in locations:
        locations.append(str(location))

print(IDs)  # prints a list with no duplicates
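Another option is xlrd's `Sheet.col_values(colx, start_rowx=0, end_rowx=None)`, which returns the plain cell values instead of cell objects, so the comparison problem never comes up. The sketch below pairs it with `OrderedDict.fromkeys` for order-preserving deduplication (works on Python 2.7 and 3). `unique_column` is a hypothetical helper, and `StubSheet` is a hypothetical stand-in so the example runs without a workbook:

```python
from collections import OrderedDict


def unique_column(sheet, colx, start_rowx=0):
    """Distinct values of one column, in first-seen order (hypothetical helper).

    `sheet` is assumed to be an xlrd Sheet, or anything with a
    compatible col_values() method.
    """
    values = sheet.col_values(colx, start_rowx=start_rowx)
    return list(OrderedDict.fromkeys(values))


class StubSheet(object):
    """Hypothetical stand-in for an xlrd Sheet, backed by plain lists."""

    def __init__(self, columns):
        self._columns = columns

    def col_values(self, colx, start_rowx=0, end_rowx=None):
        return self._columns[colx][start_rowx:end_rowx]


sheet = StubSheet([["ID", "A17", "A17", "B02"],
                   ["Location", "Paris", "Paris", "Oslo"]])
IDs = unique_column(sheet, 0, start_rowx=1)        # skip the header row
locations = unique_column(sheet, 1, start_rowx=1)
print(IDs)        # ['A17', 'B02']
print(locations)  # ['Paris', 'Oslo']
```

With a real file, `sheet` would come from something like `xlrd.open_workbook(path).sheet_by_index(0)`; whether to skip a header row depends on the sheet.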

Need to remove duplicates from a list. The set() function doesn't work, and neither does the for-loop method.
