Question

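I have many text files that list (among other things) a person's characteristics, one person per file. The person files are tab-delimited; for example, person_1.txt might be: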

 xyz     tall     123
 abc     happy     123
 aby     slim     456
 zyg     intelligent     345
 mno     brown hair     678

person_2.txt might be:

 xyz     average height     012
 abc     happy     123
 ccc     slightly overweight     123
 def     bubbly     234
 cde     brown hair     567

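and so on. I have another .txt file called features that contains every possible characteristic the person files can have (about 600). They are listed one per line, like this: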

 brown hair
 slightly overweight
 bubbly
 tall
 happy
 etc

I also have a file that contains the names of all the "person files", one per line, for example:

 person_1.txt
 person_2.txt
 person_3.txt
 etc

I am looking for a script that outputs a boolean array telling me which people have which characteristics, comma or tab delimited, like this:

 characteristic     tall     happy     bubbly     slightly overweight   etc
 person_1     true     true     false     false
 person_2     false     true     true     true  
 etc

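I am very new to programming and have never had any formal tuition, so any pointers would be great. Thanks very much!

OK, so I have made some progress: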

 #make a list of characteristics
 f = open('geneList.txt')
 geneList = [line.strip() for line in open('geneList.txt')]
 geneList = sorted(geneList)
 f.close()

 i=0

 #add a tab space to make room for the names in subsequent lines
 with open('output.txt', 'a') as firstTab:
         firstTab.write('\t',)

 #add the first line to the matrix - a line with all genes tab delimited
 while i < len(geneList):
     with open('output.txt', 'a') as firstLine:
            firstLine.write(geneList[i] + '\t',)
            i=i+1

 with open('output.txt', 'a') as newLine:
         newLine.write('\n',)

 #make a list of all the people files
 p = open('people.txt')
 peopleList = [line.strip() for line in open('people.txt')]
 peopleList = sorted(peopleList)
 p.close()

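Next, a way to compare the genes in a person file against geneList and return a boolean for each gene in geneList. I am struggling to get the list of files so that each file is read line by line. I tried: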

 with (peopleList[0], 'r') as f:
      f.readline()

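I wonder whether my syntax is incorrect, since this statement throws an error, or is there a better way? I need to search geneList and, if the person's list of genes/characteristics has no matching gene, add "False" to another list. If the gene is present, add "True" to the list, and then append that list to a line. I would try something like: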

 genePresent.append( 'GeneA': True, 'GeneB':False etc)

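Then, a way to write the person as the first field followed by the boolean output for each gene (not finished yet). I plan to make a list of "True/False" strings that can be appended on the same line after each person, under the corresponding characteristic/gene: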

 n=0
 while n < len(peopleList):
     with open('output.txt', 'a') as outf:
         outf.write(peopleList[n] + '\t' + 'genePresent' + '\n')
         n=n+1

Unfortunately they don't teach any coding on biology courses, so I apologise if I am asking basic questions.

Answer 1

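I can help you plan, but nobody will solve it for you (I hope):

The problem you are facing becomes very easy once you use data structures, file IO and string manipulation.

You should write these methods (or similar ones) and route the program flow accordingly: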

# you should probably store person records in a data structure
persons = []

def read_persons_config_file(filename):
    # this method will read the line separated person records from the persons file
    # it will use methods like open() to access the file and readlines() to use data
    # it should probably call read_person_file() on each record
    # this method can maintain a list of persons
    pass

def read_person_file(filename):
    # this method should read a single person record and store it in a data structure
    # of your choice. I'd recommend you go with dictionaries, but there are many
    # alternatives
    # this method should either return a result to the caller (the data structure you chose)
    # or append the data structure to a list of data structures (persons) you've created
    pass

Once you have these two methods, you can go on to work with the data you have built.
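
One possible way to fill in the two stubs above, offered as a minimal sketch rather than a definitive solution. It assumes the characteristic is the second tab-separated column of each person file, and that the person files sit in the same directory as the file that lists them; both assumptions are read off the examples in the question. It stores each person as a set instead of a dictionary, which is a design choice made here for fast membership tests. Note that `with` needs an actual `open(...)` call, which is what the attempt in the question was missing.

```python
from pathlib import Path


def read_person_file(filename):
    """Return the set of characteristics found in one person file.

    Assumption: the characteristic is the second tab-separated column.
    """
    traits = set()
    with open(filename, encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split("\t")
            if len(fields) >= 2:  # skip blank or malformed lines
                traits.add(fields[1].strip())
    return traits


def read_persons_config_file(filename):
    """Map every person file name listed in `filename` to its characteristics.

    Assumption: the person files live next to the list file.
    """
    persons = {}
    base = Path(filename).parent
    with open(filename, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            if name:  # ignore empty lines
                persons[name] = read_person_file(base / name)
    return persons
```

Calling `read_persons_config_file("people.txt")` would then give a dictionary such as `{"person_1.txt": {"tall", "happy", ...}, ...}`, which is enough to answer "does this person have this characteristic?" with a simple `in` test.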

Listing it after filtering/modifying is as simple as using Google.
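
As a rough sketch of that last step, assuming you already hold each person's characteristics as a set, the true/false table could be written with the standard `csv` module. The `persons` dictionary (name mapped to a set of strings) and the function name `write_feature_matrix` are hypothetical, chosen only for illustration.

```python
import csv


def write_feature_matrix(persons, features, out_path):
    """Write a tab-separated table: one row per person, one column per feature.

    persons  -- hypothetical dict mapping a person name to a set of characteristics
    features -- list of all characteristics, in the desired column order
    """
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        # header row: a label cell followed by every feature name
        writer.writerow(["characteristic"] + list(features))
        for name in sorted(persons):
            traits = persons[name]
            # csv.writer converts the booleans to the text "True" / "False"
            writer.writerow([name] + [feature in traits for feature in features])
```

Opening the output file once and writing every row inside the same `with` block avoids reopening `output.txt` in append mode for each cell, as the code in the question does.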

Try using Google before asking a question and having people do your work for you; you can use this link.
