Python: Make a list unique and replace duplicates

Time: 2018-10-29 22:44:47

Tags: python arrays numpy scipy

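I want to turn a list of integers into an order-preserving unique list in which duplicates are replaced with zeros. The context is finding, for each point in one set of multidimensional points, the nearest point in another set.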

Example

import scipy.spatial  # import the submodule explicitly so scipy.spatial is available
import numpy as np
a = np.random.rand(100,4)
b = np.random.rand(200,4)
tree=scipy.spatial.cKDTree(a)
indexesOf_neighbors= tree.query(b, 1)[1]
_, idx = np.unique(indexesOf_neighbors, return_index=True)
print(indexesOf_neighbors)
print(indexesOf_neighbors[np.sort(idx)])

So the first occurrence of each number should be kept. All subsequent duplicates should be replaced with np.inf, for example:

[38 66 79 10 35 83 99 89 68 65 20 np.inf 46 np.inf 24 51 13 0 17 87 90 54 45 63  69 56 np.inf 32 62 49 99 67 82 np.inf 64 np.inf np.inf np.inf ...]

An alternative would be to find the indices of all the duplicates (that is, every occurrence other than the first).

2 answers:

Answer 0: (score: 1)

How about creating an array filled with np.inf and then overwriting the positions of the first occurrences?

from scipy import spatial
import numpy as np

a = np.random.rand(100,4)
b = np.random.rand(200,4)
tree=spatial.cKDTree(a)
indexesOf_neighbors= tree.query(b, 1)[1]
u, idx = np.unique(indexesOf_neighbors, return_index=True)
print(indexesOf_neighbors)

u_indexesOf_neighbors = np.empty(indexesOf_neighbors.shape, dtype=np.float64)
u_indexesOf_neighbors.fill(np.inf)
u_indexesOf_neighbors[idx] = u
print(u_indexesOf_neighbors)

Result

[82 61  4  5 32 48 62 80 50 96 84 49 37 58 17 80 52  1 33 76 50 24 22 31
  3 77 71  3 30 43 89 67 74 18 39 72 96 16 29 29 11 59 83 12 55  3 34 87
 74 93 21 96 83 89 21 61  3 81 39 93  8 80 64 47 83 27 46 34 72 64 34 42
 72 82 74 70  0 23 56 14 69 88  2 87 26 56 89 53  3 33 94 43 43  8 86  2
 76 10 95 71 99 76 82 87 92 97 92 25 61 48 94 15 55 86 35 87 83 66 39 79
 77 57 62  1 43 74 27 34 16 83 29 34 31  2 90 51  1  2 33 17 30 96  2 82
 22 44  0 88  7 33 36 55 95 94 64 54 86 36 34 24 48  1  7 68 77 30 70 24
 28 73 43 16 20 56 55 94 63 71  5 38 86 46 23 66 48  1 72  7  8 88 56  1
 80 85 84  7 97  2 55 35]
[82. 61.  4.  5. 32. 48. 62. 80. 50. 96. 84. 49. 37. 58. 17. inf 52.  1.
 33. 76. inf 24. 22. 31.  3. 77. 71. inf 30. 43. 89. 67. 74. 18. 39. 72.
 inf 16. 29. inf 11. 59. 83. 12. 55. inf 34. 87. inf 93. 21. inf inf inf
 inf inf inf 81. inf inf  8. inf 64. 47. inf 27. 46. inf inf inf inf 42.
 inf inf inf 70.  0. 23. 56. 14. 69. 88.  2. inf 26. inf inf 53. inf inf
 94. inf inf inf 86. inf inf 10. 95. inf 99. inf inf inf 92. 97. inf 25.
 inf inf inf 15. inf inf 35. inf inf 66. inf 79. inf 57. inf inf inf inf
 inf inf inf inf inf inf inf inf 90. 51. inf inf inf inf inf inf inf inf
 inf 44. inf inf  7. inf 36. inf inf inf inf 54. inf inf inf inf inf inf
 inf 68. inf inf inf inf 28. 73. inf inf 20. inf inf inf 63. inf inf 38.
 inf inf inf inf inf inf inf inf inf inf inf inf inf 85. inf inf inf inf
 inf inf]

I chose the float64 dtype, but any floating-point dtype would work; np.inf cannot be stored in an integer array.
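
A variant of the same idea, as a rough sketch: build a boolean mask of first occurrences from the indices returned by np.unique, then fill everything else with np.where. The same mask also yields the indices of the duplicates, which the question mentions as an alternative. The small `neighbors` array and the names `is_first`, `deduped` and `duplicate_idx` are made up for illustration and are not from the original post.

```python
import numpy as np

# Hypothetical small input standing in for indexesOf_neighbors
neighbors = np.array([3, 1, 3, 2, 1, 3])

# Indices of the first occurrence of each distinct value
_, first_idx = np.unique(neighbors, return_index=True)

# True where a value appears for the first time, False for later repeats
is_first = np.zeros(neighbors.shape, dtype=bool)
is_first[first_idx] = True

# Keep first occurrences, replace repeats with inf (the result is a float array)
deduped = np.where(is_first, neighbors, np.inf)

# Positions of the repeated entries
duplicate_idx = np.flatnonzero(~is_first)

print(deduped)        # [ 3.  1. inf  2. inf inf]
print(duplicate_idx)  # [2 4 5]
```

Compared with filling an empty array, this keeps the original order without sorting `first_idx`, and the mask can be reused if a different fill value is wanted later.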

Answer 1: (score: 0)

Make an independent copy of the list, in case the original is needed later:

unique_list = indexesOf_neighbors.copy()
# keep track of the values seen so far (a set gives fast membership tests)
unique_set = set()

for idx, i in enumerate(unique_list):
    if i not in unique_set:
        unique_set.add(i)  # first occurrence: remember the value
    else:
        unique_list[idx] = 0  # replace duplicates with zero
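
The loop can also be wrapped in a small helper, sketched below. The function name `replace_duplicates` and its `fill` parameter are hypothetical, chosen only for this example. Because 0 is itself a valid neighbor index, a sentinel such as -1 may be easier to tell apart from real data while still keeping an integer dtype; that choice is a suggestion, not part of the original answer.

```python
import numpy as np

def replace_duplicates(values, fill=0):
    """Return a copy of `values` with every repeat after the first set to `fill`."""
    result = np.array(values, copy=True)
    seen = set()
    # Iterate over a plain-Python snapshot so that writes to `result` cannot affect the loop
    for pos, value in enumerate(result.tolist()):
        if value in seen:
            result[pos] = fill   # later repeat: overwrite
        else:
            seen.add(value)      # first occurrence: keep and remember
    return result

print(replace_duplicates([3, 1, 3, 0, 1, 3], fill=-1))  # [ 3  1 -1  0 -1 -1]
```

With the default `fill=0`, the genuine index 0 in this input becomes indistinguishable from the replaced entries, which is why a different sentinel is used in the call above.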