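I'm defining a function that will return a list of lists, where element zero is the 2D array, element one is the header information, and element two is the row names. How can I read this in from a file? The file looks like this:
genes S1 S2 S3 S4 S5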
100 -0.243 -0.021 -0.205 -1.283 0.411
10000 -1.178 -0.79 0.063 -0.878 0.011
def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    # open filehandle
    fh = open(fileName)
    # collect header information
    # read in the rest of the data and organize it into a list of lists
    for line in fh:
        # split line into columns and append to array
        arrayCols = line.strip().split('\t')
        # collect rowname information
        **what goes here?**
        # convenient float conversion for each element in the list using the
        # map function. note that this assumes each element is a number and can
        # be cast as a float. see floatizeData(), which gives the explicit
        # example of how the map function works conceptually.
        twoDarray.append(map(float, arrayCols))
    # return data
    return twoDarray
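I keep getting an error saying it can't convert the first word in the file (genes) to a float, because it is a string. So my problem is figuring out how to read the first line.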
Answer 0 (score: 1)
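def input2DarrayData(fn):
    # define twoDarray, headerLine and rowLabels
    twoDarray = []
    headerLine = None
    rowLabels = []
    # open filehandle
    fh = open(fn)
    # the first line is the header; read it before looping over the data rows
    headerLine = fh.readline()
    headerLine = headerLine.strip().split('\t')
    for line in fh:
        arrayCols = line.strip().split('\t')
        # first column is the row label; the remaining columns are numbers
        rowLabels.append(arrayCols[0])
        # list(...) so the row is a real list on Python 3 as well as Python 2
        twoDarray.append(list(map(float, arrayCols[1:])))
    # return data
    return [twoDarray, headerLine, rowLabels]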
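If this works for you, please read PEP-8 and refactor the variable and function names. Also, don't forget to close the file. Better still, use with, which closes it for you: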
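def input2DarrayData(fn):
    """Return [twoDarray, headerLine, rowLabels] read from the tab-separated file fn."""
    twoDarray = []
    rowLabels = []

    with open(fn) as fh:
        # the first line is the header
        headerLine = fh.readline()
        headerLine = headerLine.strip().split('\t')
        for line in fh:
            arrayCols = line.strip().split('\t')
            # first column is the row label; the remaining columns are numbers
            rowLabels.append(arrayCols[0])
            twoDarray.append(list(map(float, arrayCols[1:])))

    return [twoDarray, headerLine, rowLabels]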
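As a rough sketch of what the PEP-8 refactor suggested above might look like: the names `read_2d_array`, `path`, `data`, `header` and `row_labels` are illustrative choices, not part of the original post, and skipping blank lines is an added assumption. The code still assumes every field after the first column can be parsed as a float.

```python
def read_2d_array(path):
    """Read a tab-separated file and return [data, header, row_labels]."""
    data = []
    row_labels = []
    with open(path) as handle:
        # The first line holds the column names.
        header = handle.readline().strip().split('\t')
        for line in handle:
            cols = line.strip().split('\t')
            if cols == ['']:
                continue  # skip blank lines (assumption, not in the original)
            # First column is the row label; the rest are numeric values.
            row_labels.append(cols[0])
            data.append([float(value) for value in cols[1:]])
    return [data, header, row_labels]
```

The return value keeps the same three-element layout as the original function, so existing callers would only need the new name.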
Answer 1 (score: 1)
To handle the header line (the first line in the file), consume it explicitly with .readline() before iterating over the remaining lines:
fh = open(fileName)
headers = fh.readline().strip().split('\t')
for line in fh:
    arrayCols = line.strip().split('\t')
    ## etc...
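This works because a file object is an iterator that remembers its position, so the for loop resumes at the second line once .readline() has consumed the first. A small self-contained sketch, using `io.StringIO` as a stand-in for a real file; the sample text is adapted from the question, and the tab separators are an assumption:

```python
import io

# In-memory stand-in for the data file (tab-separated, as the code assumes).
sample = "genes\tS1\tS2\n100\t-0.243\t-0.021\n10000\t-1.178\t-0.79\n"
fh = io.StringIO(sample)

# Consume the header line first...
headers = fh.readline().strip().split('\t')
# ...then iteration continues from the second line onward.
rows = [line.strip().split('\t') for line in fh]

print(headers)  # ['genes', 'S1', 'S2']
print(rows)     # [['100', '-0.243', '-0.021'], ['10000', '-1.178', '-0.79']]
```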
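I'm not sure what data structure you want to get out of the file; you seem to imply that you want every row to carry the list of headers. Duplicating the headers like that doesn't make much sense.
Assuming a fairly simple file structure with one header line and a fixed number of columns per row, the following is a generator that yields a dictionary for each row, with the headers as keys and the column values as values: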
def process_file(filepath):
    ## open the file (use the argument, not a hard-coded name)
    with open(filepath) as src:
        ## read the first line as headers
        headers = src.readline().strip().split('\t')
        for line in src:
            ## Split the line
            line = line.strip().split('\t')
            ## Coerce each value to a float
            line = [float(col) for col in line]
            ## Create a dictionary using headers and cols
            line_dict = dict(zip(headers, line))
            ## Yield it
            yield line_dict
>>> for row in process_file('path/to/myfile'):
...     print(row)
...
{'genes': 100.0, 'S1': -0.243, 'S2': -0.021, 'S3': -0.205, 'S4': -1.283, 'S5': 0.411}
{'genes': 10000.0, 'S1': -1.178, 'S2': -0.79, 'S3': 0.063, 'S4': -0.878, 'S5': 0.011}
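Note that the generator above converts the first column to a float too, which only works here because the gene IDs happen to be numeric. A possible variant that keeps the first column as a string label is sketched below. The name `process_file_labeled` is hypothetical, and taking an iterable of lines rather than a path is an assumption made so the sketch runs without a file on disk (an open file object can be passed in the same way).

```python
def process_file_labeled(lines):
    """Yield one dict per data row from an iterable of tab-separated lines."""
    lines = iter(lines)
    first = next(lines, None)
    if first is None:
        return  # empty input: nothing to yield
    headers = first.strip().split('\t')
    for line in lines:
        cols = line.strip().split('\t')
        # Keep the row label (first column) as a string.
        row = {headers[0]: cols[0]}
        # Convert only the remaining columns to floats.
        row.update(zip(headers[1:], (float(c) for c in cols[1:])))
        yield row


sample = ["genes\tS1\tS2\n", "100\t-0.243\t-0.021\n"]
for row in process_file_labeled(sample):
    print(row)  # {'genes': '100', 'S1': -0.243, 'S2': -0.021}
```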