Question

I have a list of IDs like this:

123ABC
AB_01_DC_01
RS_11_UV_5-43

I also have a table with 100+ columns and 5000+ rows, and I only want to keep the columns whose ID matches an ID in the list.

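I think I need to:
1) compare the list with the column headers
2) get the column numbers where an ID from the list was found
3) use cut or awk to keep only those columns.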

[SOLVED]

I figured it out! In case it helps anyone, this is what I did:

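import subprocess

# fichier is the table file, already opened for reading; tableFile, tableFileOut
# and listNameKeep (the IDs to keep) are defined earlier in the script

# load the column IDs from the header line
header = fichier.readline().rstrip("\n").split("\t")

listIndiceKeep = []
for indice, colName in enumerate(header):
    if colName in listNameKeep:
        # get position for each ID to be kept, +1 for cut command (sh col1 = python col0)
        listIndiceKeep.append(str(indice + 1))

txtListIndiceKeep = ",".join(listIndiceKeep)

# run cut without a shell, so file names containing spaces are passed through safely
with open(tableFileOut, "w") as out:
    subprocess.run(["cut", "-f", txtListIndiceKeep, tableFile], stdout=out, check=True)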
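
For anyone who would rather not call cut at all, the same filtering can be done in Python alone. This is only a minimal sketch, assuming a tab-separated file where every row has as many fields as the header; the function name keep_columns and its parameters are made up for illustration and are not part of the original script.

```python
def keep_columns(table_path, out_path, ids_to_keep, sep="\t"):
    """Copy a delimited table, keeping only the columns whose header is in ids_to_keep.

    Hypothetical helper: assumes every row has as many fields as the header.
    Returns the 0-based positions of the columns that were kept.
    """
    wanted = set(ids_to_keep)
    with open(table_path) as src, open(out_path, "w") as dst:
        header = src.readline().rstrip("\n").split(sep)
        # positions of the columns to keep, in the order they appear in the file
        keep = [i for i, name in enumerate(header) if name in wanted]
        dst.write(sep.join(header[i] for i in keep) + "\n")
        for line in src:
            fields = line.rstrip("\n").split(sep)
            dst.write(sep.join(fields[i] for i in keep) + "\n")
    return keep
```

Like cut, this keeps the columns in file order rather than in the order of the ID list, and it reads the table line by line, so the 5000+ rows never have to sit in memory at once.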

Answer 1

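This isn't a complete solution, since an MWE would have helped, but hopefully it gets you started.

As a first step, if you know the column IDs are in the first row, you can extract them and get the numbers of the columns you need with something like this: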

head -1 your_file_with_columns | sed -e 's#\ #\'$'\n#g' | nl | grep -f your_file_with_patterns

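The \'$'\n part is there for OS X, but it works on other systems too. Next, you can apply your cut-based solution to the selected column numbers. You could do that with eval if you like, or perhaps someone can suggest a cleaner way.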
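
One thing to watch with the pipeline above: grep -f matches each pattern anywhere in the line, so an ID that is a substring of another header would also be selected. As a rough sketch of the same numbering step with exact matching, here is a Python version; matching_columns is a hypothetical name, the header is assumed to be tab-separated, and the AB_01_DC_01_bis column is invented purely to show the difference.

```python
def matching_columns(header_line, ids, sep="\t"):
    """Return (column_number, name) pairs, numbered from 1 like nl and cut.

    Hypothetical helper: a column is selected only on an exact ID match.
    """
    wanted = {i.strip() for i in ids if i.strip()}
    names = header_line.rstrip("\n").split(sep)
    return [(n, name) for n, name in enumerate(names, start=1) if name in wanted]


# Made-up header line; the last column only shares a prefix with a wanted ID.
header = "123ABC\tOTHER\tAB_01_DC_01\tAB_01_DC_01_bis\n"
pairs = matching_columns(header, ["123ABC", "AB_01_DC_01"])
cut_fields = ",".join(str(n) for n, _ in pairs)

print(pairs)       # [(1, '123ABC'), (3, 'AB_01_DC_01')]
print(cut_fields)  # 1,3
```

The resulting string can be handed to cut -f directly, which avoids the need for eval.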

Keep columns based on a list of IDs [Bash or Python]