Question

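I'm not very familiar with Python, but I need to get something done. I have an ASCII file with several columns (space-separated). Some values in the first column are repeated. Among the rows that share a repeated value, I need to select the row with the larger value in column 3, and then return an array. I want something like this: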

#col1    col2    col3    col4    col5
1         1       2       3       4
1         2       1       5       3
2         2       5       2       1

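should return rows 1 and 3. Here is what I have so far. I defined a helper function that detects the indices of the duplicates (all the second entries):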

def list_duplicates(seq):
    seen = set()
    seen_add = seen.add
    return [idx for idx,item in enumerate(seq) if item in seen or seen_add(item)]
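The helper returns the indices of the second and later occurrences of each value, never the first one. A quick check on the first column of the sample data (the second input list is made up for illustration) shows this. It also explains why the code below builds dupdup = [x-1 for x in dup]: that assumes the first occurrence sits directly above each repeat, which only holds when duplicates are adjacent and come in pairs.

```python
def list_duplicates(seq):
    seen = set()
    seen_add = seen.add
    # seen_add(item) returns None (falsy), so the first occurrence is recorded
    # but not selected; later occurrences hit `item in seen` and are selected.
    return [idx for idx, item in enumerate(seq) if item in seen or seen_add(item)]

first_col = [1, 1, 2]                  # first column of the example file
print(list_duplicates(first_col))      # [1]
print(list_duplicates([3, 3, 3, 4]))   # [1, 2]  (three copies of 3)
```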

Then I tried to use it on the data (which I load from the file with np.genfromtxt, naming each column):
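For reference, here is a minimal sketch of how such a file can be loaded into a structured array with named columns. It assumes the header line is the one shown above (starting with #) and uses io.StringIO as a stand-in for the real file. With names=True, genfromtxt reads field names from the first line, which may begin with the comment character, so columns are then accessed as data["col1"], data["col3"] and so on, and dup_col/sel_col in the function below would be field names rather than integers.

```python
import io
import numpy as np

# Stand-in for the real file; the header line starts with '#'.
text = """#col1    col2    col3    col4    col5
1         1       2       3       4
1         2       1       5       3
2         2       5       2       1
"""

# names=True takes the field names from the header line and returns a
# 1-D structured array with one record per row (fields are float by default).
data = np.genfromtxt(io.StringIO(text), names=True)
print(data.dtype.names)  # ('col1', 'col2', 'col3', 'col4', 'col5')
print(data["col3"])
```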

import numpy as np

# dup_col is the column containing the duplicates; sel_col is the column in which we select the larger value
def select_high(ndarray, dup_col, sel_col):
    result = []
    dup = list_duplicates(ndarray[dup_col])
    dupdup = [x-1 for x in dup]
    for i in range(len(ndarray[sel_col])):        
        if i in dup:
            mid = []
            maxi = max(ndarray[sel_col][i], ndarray[sel_col][i-1])
            maxi_index = np.where(ndarray[sel_col] == maxi)[0][0]
            for name in ndarray.dtype.names:
                mid.append(ndarray[name][maxi_index])
            result.append(mid)
        else:
            mid = []
            if i not in dupdup:
                for name in ndarray.dtype.names:
                    mid.append(ndarray[name][i])
            result.append(mid)

    return np.asarray(result)

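But what happens is that whenever there are duplicates I have to delete the else part, or it gives me an error, and whenever there are no duplicates I have to put it back. Any help is appreciated. Sorry for the long post; I wanted to give all the details.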

Answer 1

I think you got lost in the details (so did I). Here is a version that does what you want, but more simply:

m = [[1, 2, 1, 5, 3], [1, 1, 2, 3, 4], [2, 2, 5, 2, 1]]
s = sorted(m,  key=lambda r:(r[0], -r[2]))
print(s) 
seen = set()
print( [r for r in s if r[0] not in seen and not seen.add(r[0])])

The first line defines m as the list of rows read from the file.

The second line sorts the rows by the value in the first column (r[0]) and then by the value in the third column, but from largest to smallest (-r[2]):

s=[[1, 1, 2, 3, 4], [1, 2, 1, 5, 3], [2, 2, 5, 2, 1]]

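Now you need to skip a row whenever its first-column value has already been seen at least once. We use the set seen to store the r[0] values we have already encountered. If r[0] is not in seen, we keep the row and add r[0] to seen, so that the next time we meet that r[0] the row is discarded. Because the rows are sorted with the largest third-column value first, the row we keep for each r[0] is the one we want. This is the slightly tricky part: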
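Written as an explicit loop instead of a one-line comprehension, the same "keep the first row per key" idea is easier to follow. This sketch starts from the already-sorted list s shown above:

```python
# Rows already sorted by col1 ascending, then col3 descending.
s = [[1, 1, 2, 3, 4], [1, 2, 1, 5, 3], [2, 2, 5, 2, 1]]

seen = set()
kept = []
for r in s:
    if r[0] not in seen:   # first row for this col1 value has the largest col3
        seen.add(r[0])
        kept.append(r)

print(kept)  # [[1, 1, 2, 3, 4], [2, 2, 5, 2, 1]]
```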

if r[0] not in seen and not seen.add(r[0])

Note that not seen.add(r[0]) is always True, because seen.add returns None. Therefore:

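If r[0] is not in seen, we add r[0] to seen and keep the row.
If r[0] is already in seen, the condition is False (and since and short-circuits, seen.add is not even called), so the row is discarded.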
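A tiny check of the fact this trick relies on: set.add returns None, so negating its result always gives True while the value is still added as a side effect.

```python
seen = set()
result = seen.add(1)
print(result)            # None
print(not seen.add(2))   # True, and 2 has been added to seen
print(seen)              # {1, 2}
```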

You can also write it as:

if not (r[0] in seen or seen.add(r[0]))
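If the data is already a structured array from np.genfromtxt (as in the question), the same sort-then-take-first idea can be done in NumPy without a Python loop. This is a sketch of a different technique from the answer above, assuming names=True field names such as "col1" and "col3"; the function name select_max_rows is hypothetical. np.lexsort treats its last key as the primary key, and np.unique(..., return_index=True) gives the index of the first occurrence of each value.

```python
import io
import numpy as np

text = """#col1    col2    col3    col4    col5
1         1       2       3       4
1         2       1       5       3
2         2       5       2       1
"""
data = np.genfromtxt(io.StringIO(text), names=True)

def select_max_rows(arr, dup_col, sel_col):
    """Hypothetical helper: for each value of dup_col, keep the row with the
    largest sel_col (ties resolved in favour of the earlier row)."""
    # Primary key dup_col ascending, secondary key sel_col descending.
    order = np.lexsort((-arr[sel_col], arr[dup_col]))
    ordered = arr[order]
    # After sorting, the first row of each dup_col group has the max sel_col.
    _, first = np.unique(ordered[dup_col], return_index=True)
    return ordered[first]

best = select_max_rows(data, "col1", "col3")
print(best)
```

The result comes back ordered by the first column rather than by original file position; lexsort is stable, so among rows with equal maxima the one appearing first in the file wins.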

Python: select rows based on the maximum value of a column