Note: I am using Python to read this file.
I currently have a data file laid out like this:
1 0.1803 233.650000 101.52010 37.95730 96.41869
0.462300 1.425000e+12 1.811000e+12 1.710841e+10
0.456300 1.811000e+12 1.811000e+12 1.711282e+10
0.450300 9.443000e+11 9.443000e+11 9.842220e+09
0.444300 7.089000e+11 7.089000e+11 6.764462e+09

0 0.2523 462.060000 96.47176 48.58004 84.13097
0.456300 1.325000e+13 1.325000e+13 7.735244e+10
0.450300 1.283000e+13 1.283000e+13 7.684167e+10
0.444300 1.182000e+13 1.182000e+13 7.571757e+10
0.438300 1.002000e+13 1.002000e+13 7.352358e+10
0.432300 8.971000e+12 8.971000e+12 7.196254e+10

1 0.0000 74.230000 81.10059 46.28531 95.17891
0.342300 2.862000e+10 3.803000e+10 9.795136e+06

0 0.9493 776.060000 98.65339 41.54604 94.64194
1.000300 1.467000e+14 1.674000e+14 1.279873e+11
0.997300 1.467000e+14 1.674000e+14 1.280501e+11
0.994300 1.476000e+14 1.674000e+14 1.281122e+11
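Essentially, the data is one big list of lists, where each list is separated from the next by a blank line. The first row of each list has 6 columns, and every following row has 4 columns. The lists have different lengths. I want to be able to select only the lists that meet certain criteria. For example, I want only the lists whose first row has 0 as its first element, so from the sample data above it would select only the 2nd and 4th lists.
My idea for a solution: take only the first row of each list and build a separate array of those values. Then I can use the where() function to find the indices where the first element is 0, and select the lists that correspond to those indices.
The problem is that I don't know how to deal with the blank lines in my data. I don't know how to index lists that are separated by blank lines, or how to select only the rows that come right after a blank line. Does anyone have an idea of how to implement my solution, or a different solution altogether? Thanks in advance.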
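The approach described in the question can be sketched in plain Python. This is a minimal sketch, not a drop-in solution: parse_blocks and select_by_first_value are hypothetical helper names, the sample is a trimmed copy of the data above (header row plus one data row per block), and the list comprehension over enumerate stands in for numpy.where(np.array(firsts) == 0)[0], which returns the same indices if you prefer NumPy.

```python
# Sketch of the question's idea: collect the first element of each block's
# header row, find the indices where it equals 0, then pick those blocks.
# Assumption: blocks are separated by one or more blank lines.

def parse_blocks(text):
    """Split text on blank lines into blocks, each a list of rows of floats."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append([float(x) for x in line.split()])
        elif current:  # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:  # the last block may not be followed by a blank line
        blocks.append(current)
    return blocks


def select_by_first_value(blocks, value=0.0):
    firsts = [block[0][0] for block in blocks]  # first element of each header row
    # Plain-Python equivalent of numpy.where(np.array(firsts) == value)[0]
    indices = [i for i, v in enumerate(firsts) if v == value]
    return indices, [blocks[i] for i in indices]


sample = """1 0.1803 233.650000 101.52010 37.95730 96.41869
0.462300 1.425000e+12 1.811000e+12 1.710841e+10

0 0.2523 462.060000 96.47176 48.58004 84.13097
0.456300 1.325000e+13 1.325000e+13 7.735244e+10

1 0.0000 74.230000 81.10059 46.28531 95.17891
0.342300 2.862000e+10 3.803000e+10 9.795136e+06

0 0.9493 776.060000 98.65339 41.54604 94.64194
1.000300 1.467000e+14 1.674000e+14 1.279873e+11
"""

indices, selected = select_by_first_value(parse_blocks(sample))
print(indices)  # [1, 3] -> the 2nd and 4th blocks
```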
Answer 0 (score: 3)
Assuming you want to get a list of lists:
>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [list(items) for group, items in groupby(reader, key=grouper) if group]
...
>>> res
[[['1', '0.1803', '233.650000', '101.52010', '37.95730', '96.41869'],
['0.462300', '1.425000e+12', '1.811000e+12', '1.710841e+10'],
['0.456300', '1.811000e+12', '1.811000e+12', '1.711282e+10'],
['0.450300', '9.443000e+11', '9.443000e+11', '9.842220e+09'],
['0.444300', '7.089000e+11', '7.089000e+11', '6.764462e+09']],
[['0', '0.2523', '462.060000', '96.47176', '48.58004', '84.13097'],
['0.456300', '1.325000e+13', '1.325000e+13', '7.735244e+10'],
['0.450300', '1.283000e+13', '1.283000e+13', '7.684167e+10'],
['0.444300', '1.182000e+13', '1.182000e+13', '7.571757e+10'],
['0.438300', '1.002000e+13', '1.002000e+13', '7.352358e+10'],
['0.432300', '8.971000e+12', '8.971000e+12', '7.196254e+10']],
[['1', '0.0000', '74.230000', '81.10059', '46.28531', '95.17891'],
['0.342300', '2.862000e+10', '3.803000e+10', '9.795136e+06']],
[['0', '0.9493', '776.060000', '98.65339', '41.54604', '94.64194'],
['1.000300', '1.467000e+14', '1.674000e+14', '1.279873e+11'],
['0.997300', '1.467000e+14', '1.674000e+14', '1.280501e+11'],
['0.994300', '1.476000e+14', '1.674000e+14', '1.281122e+11']]]
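The function grouper takes a record as its argument (csv.reader yields each line as a list of strings) and returns True if the list is not empty, or False if it has no items.
If we group by this value, we get groups split at the blank lines.
The only remaining step is to get rid of the small groups made up of the blank lines themselves.
A list comprehension allows filtering with a trailing if <condition> clause. Here we can reuse the True or False value provided by groupby.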
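groupby takes an iterable as its first argument, and the key argument defines a callable that computes the grouping value from each item. Whenever the grouping value changes, a new group is yielded.
groupby yields tuples: the first item is the value that shaped the group (True or False), and the second item is an iterator over all the items in that group.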
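To make the groupby behaviour concrete, here is a small sketch on a few toy rows shaped like csv.reader's output, where a blank line becomes an empty list (the values are made up for illustration). It prints each (key, group) pair and then keeps only the True groups, the same filtering the list comprehension above performs.

```python
from itertools import groupby

# Toy rows shaped like csv.reader output: each blank line becomes [].
rows = [["1", "0.18"], ["0.46", "1.4e12"], [], ["0", "0.25"], [], [], ["1", "0.00"]]

# groupby yields (key, iterator) pairs; a new pair starts whenever the key changes.
for key, items in groupby(rows, key=lambda rec: len(rec) > 0):
    print(key, list(items))
# True [['1', '0.18'], ['0.46', '1.4e12']]
# False [[]]
# True [['0', '0.25']]
# False [[], []]
# True [['1', '0.00']]

# Keeping only the True groups drops the runs of blank lines.
blocks = [list(items) for key, items in groupby(rows, key=lambda rec: len(rec) > 0) if key]
```

Note that two consecutive blank lines end up in a single False group, so runs of blank lines need no special handling.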
If you want the numbers read as floats, we can define a function floater that takes an item from res as its argument and applies float to every sublist:
def floater(lstlst):
    # In Python 3, map() returns a lazy iterator, so build real lists instead
    return [[float(x) for x in items] for items in lstlst]
Then the solution looks like this:
>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [floater(items) for group, items in groupby(reader, key=grouper) if group]
...
>>> res
[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
[0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
[0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
[0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
[0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
[[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
[0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
[0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
[0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
[0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
[0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
[[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
[0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
[[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
[1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
[0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
[0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
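A self-contained sketch of the whole groupby pipeline, including the selection step the question asks for (this answer stops at building res). Assumptions: io.StringIO holding a trimmed copy of the sample replaces open("data.txt"), floater uses list comprehensions so it returns real lists on Python 3, and grouper is a def rather than a lambda. Because csv.reader splits on exactly one space, this only works if the columns are separated by single spaces, as they are in the sample.

```python
import csv
import io
from itertools import groupby

# Trimmed copy of the question's data; io.StringIO stands in for open("data.txt").
DATA = """1 0.1803 233.650000 101.52010 37.95730 96.41869
0.462300 1.425000e+12 1.811000e+12 1.710841e+10

0 0.2523 462.060000 96.47176 48.58004 84.13097
0.456300 1.325000e+13 1.325000e+13 7.735244e+10
0.450300 1.283000e+13 1.283000e+13 7.684167e+10

1 0.0000 74.230000 81.10059 46.28531 95.17891
0.342300 2.862000e+10 3.803000e+10 9.795136e+06
"""


def floater(lstlst):
    # Python 3: map() is lazy, so build real lists of floats
    return [[float(x) for x in items] for items in lstlst]


def grouper(rec):
    return len(rec) > 0  # False for the empty records produced by blank lines


with io.StringIO(DATA) as f:
    reader = csv.reader(f, delimiter=" ")
    res = [floater(items) for group, items in groupby(reader, key=grouper) if group]

# The selection step from the question: keep blocks whose header row starts with 0
selected = [block for block in res if block[0][0] == 0]
print(len(res), len(selected))  # 3 1
```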
Answer 1 (score: 1)
List comprehensions make this easy:
>>> s = open(yourfile).read()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.strip().split('\n\n'))]
>>> result = [group for group in data if group[0][0] == 0]
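First, let's parse the file into something we can easily access programmatically.
A list of lists seems reasonable to me; something like the following would be ideal: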
[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
[0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
[0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
[0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
[0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
[[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
[0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
[0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
[0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
[0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
[0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
[[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
[0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
[[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
[1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
[0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
[0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
To do this, we can use a list comprehension:
>>> s = open(yourfile).read()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.strip().split('\n\n'))]
This list comprehension is easiest to read from right to left:
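- split('\n\n') splits the input on consecutive newlines, giving us a list of groups. This takes care of the "blank lines" problem you mentioned.
- Each group is split on '\n', giving us a list of lines, sublist.
- For each sublist, we:
  - split each line on whitespace with map(str.split, sublist), which gives us each row as a list of str;
  - convert each row into a list of float with map(float, row) (on Python 3, wrap this in list(...), because map returns a lazy iterator).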
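To see what each stage of the comprehension produces, here is a minimal sketch on a made-up two-block string (the values are illustrative only), with the one-liner unrolled into named intermediate steps. The s.strip() call keeps a trailing newline at the end of the file from producing an empty final row.

```python
# A tiny two-block input, used only to show each stage of the comprehension.
s = "0 0.25\n0.45 1.3e13\n\n1 0.00\n0.34 2.8e10\n"

groups = s.strip().split("\n\n")                                # stage 1: one string per block
sublists = [g.split("\n") for g in groups]                      # stage 2: one string per line
rows = [list(map(str.split, sub)) for sub in sublists]          # stage 3: lists of str
data = [[list(map(float, row)) for row in rs] for rs in rows]   # stage 4: lists of float

print(sublists)  # [['0 0.25', '0.45 1.3e13'], ['1 0.00', '0.34 2.8e10']]
print(rows[0])   # [['0', '0.25'], ['0.45', '1.3e13']]
print(data[0])   # [[0.0, 0.25], [0.45, 13000000000000.0]]
```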
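Now, to select data based on some criteria...
Again, we can use a list comprehension. To select only the groups that have 0 as the first element of their first row: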
>>> result = [group for group in data if group[0][0] == 0]
This gives:
[[[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
[0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
[0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
[0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
[0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
[0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
[[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
[1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
[0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
[0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
All of this is done with a few very powerful Python built-ins, without importing any modules!
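Answer 2 (score: 0)
You need to read the file line by line and handle each case differently depending on where you are in it.
Here is some commented code to get you started: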
def read_data(f):
    first, rest = None, []  # Reset data
    for line in f:  # Run over lines in the file
        if not line.strip():  # In case of empty line (or only whitespace)
            if first is not None:  # Ignore runs of several blank lines
                yield first, rest  # Yield the currently held values
            first, rest = None, []  # Reset data
            continue  # Skip this line
        if first is None:  # If we're at the beginning of a new set
            first = [float(x) for x in line.split()]  # Read it into "first"
            continue  # And go on
        # Otherwise, we're inside a list, so read that into rest
        rest.append([float(x) for x in line.split()])
    # The file is done; if it did not end with an empty line,
    # the last entry has not been yielded yet, so yield it now
    if first is not None:
        yield first, rest
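Answer 3 (score: 0)
I would convert each list of lists into an actual Python list of lists. That makes them much easier to work with, and you can then handle any case by iterating over the lists instead of over the file.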
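lists = []  # this will be your list of lists of lists (redundant enough for you?)
j = []  # the list of lists (one block) currently being built
with open("whateverfilename.dat") as f:
    for line in f:
        if not line.strip():  # if the line is blank
            if j:  # ignore runs of several blank lines
                lists.append(j)  # add the list of lists to your list of lists of lists
            j = []  # clear j for the next batch of data
        else:
            a = line.split()  # split each piece of data into a list
            j.append(a)  # add it to the list of lists you are currently on
if j:  # the last block has no blank line after it, so add it here
    lists.append(j)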
This lets you iterate over the data as ordinary lists, which in my opinion is much easier than iterating over the file.
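Once the data is in this shape, the selection from the question becomes a one-line filter. This is a minimal sketch that assumes lists has the shape the loop above builds. Here it is a hand-written, trimmed stand-in (header row plus one data row per block). Because line.split() leaves every value as a string, the first element is converted with float before the comparison, so "0" and "0.0" would both match.

```python
# Hand-written stand-in for `lists` as built by the loop above (values are strings),
# trimmed to the header row plus one data row per block.
lists = [
    [["1", "0.1803", "233.650000", "101.52010", "37.95730", "96.41869"],
     ["0.462300", "1.425000e+12", "1.811000e+12", "1.710841e+10"]],
    [["0", "0.2523", "462.060000", "96.47176", "48.58004", "84.13097"],
     ["0.456300", "1.325000e+13", "1.325000e+13", "7.735244e+10"]],
    [["1", "0.0000", "74.230000", "81.10059", "46.28531", "95.17891"],
     ["0.342300", "2.862000e+10", "3.803000e+10", "9.795136e+06"]],
    [["0", "0.9493", "776.060000", "98.65339", "41.54604", "94.64194"],
     ["1.000300", "1.467000e+14", "1.674000e+14", "1.279873e+11"]],
]

# The values are still strings, so compare numerically rather than with == "0"
selected = [block for block in lists if float(block[0][0]) == 0]
print([block[0][1] for block in selected])  # ['0.2523', '0.9493']
```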