In text blocks

Time: 2015-04-30 18:30:03

Tags: python csv

I'm new to Python and need some guidance.

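I have a text file that contains the output of multiple simulations, and I need to extract specific values from within each block. See the sample below: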

**********************************************
SIMULATION NUMBER =     1  SEED NUMBER:      1430403561
INTERVAL    1, NUMBER OF STORMS    0
INTERVAL    2, NUMBER OF STORMS    1
STORM RESPONSES
  1  544.95
INTERVAL    3, NUMBER OF STORMS    0
INTERVAL    4, NUMBER OF STORMS    0
INTERVAL    5, NUMBER OF STORMS    0
INTERVAL    6, NUMBER OF STORMS    1
STORM RESPONSES
  1  526.68
INTERVAL    7, NUMBER OF STORMS    0
INTERVAL    8, NUMBER OF STORMS    0
INTERVAL    9, NUMBER OF STORMS    0
INTERVAL   10, NUMBER OF STORMS    0
INTERVAL   11, NUMBER OF STORMS    0
INTERVAL   12, NUMBER OF STORMS    1
STORM RESPONSES
  1  518.77
INTERVAL   13, NUMBER OF STORMS    0
INTERVAL   14, NUMBER OF STORMS    0
INTERVAL   15, NUMBER OF STORMS    0
INTERVAL   16, NUMBER OF STORMS    0
INTERVAL   17, NUMBER OF STORMS    0
INTERVAL   18, NUMBER OF STORMS    0
INTERVAL   19, NUMBER OF STORMS    1
STORM RESPONSES
  1  614.23
**********************************************

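The information I need lies between each pair of "**********************************************" lines; everything between two such lines is a single "block", i.e. one simulation run, that needs to be searched.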

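Basically, I need to find the blocks that contain an "INTERVAL" value less than or equal to 30 whose "NUMBER OF STORMS" is greater than 0, and where the "STORM RESPONSES" value associated with that "INTERVAL" is greater than 648.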

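I need a summary output table with one row per simulation block stating whether the query is TRUE or FALSE for that block (this particular file has 1000 simulations).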

Any help is much appreciated. I'm sure I could work this out in Excel, but I feel I could solve it with Python (and more leanly).

Here is what I have so far:

import os
import sys

f = open('D:\log.txt')

chunks = []  #each chunk is a section of text that is what is between *** lines

tmp_text = ''
for line in f:
  if line.strip() == '******...***':
    if tmp_text != '': #I don't know if file starts with *** or not
      chunks.append(tmp_text)
      tmp_text = ''
  else:
    tmp_text += line
if tmp_text != '':
  chunks.append(tmp_text) #in case the file does not end in ****
f.close()

#chunks will be in the order that you expect them.
for chunk in chunks:
          for line in chunk :
            if "INTERVAL    " + x<=30 + ", NUMBER OF STORMS    " + x<=3 or "INTERVAL   " + x<=30 + ", NUMBER OF STORMS    " + x<=3

I'm confused about how to extract the values listed under "STORM RESPONSES" that are below 648. Also, will the "if" statement I added after "for line in chunk" work?

import os
import sys

f = open('D:\LBI_Easement_Issues\log.txt')

chunks = []  #each chunk is a section of text that is what is between *** lines
interval = 1
numberstorms = 1
tmp_text = ''
for line in f:
  if line.strip() == '******...***':
    if tmp_text != '': #I don't know if file starts with *** or not
      chunks.append(tmp_text)
      tmp_text = ''
  else:
    tmp_text += line
if tmp_text != '':
  chunks.append(tmp_text) #in case the file does not end in ****
f.close()

#chunks will be in the order that you expect them.
for chunk in chunks:
          for line in chunk :
            print line
            query = "True" if  "INTERVAL    " + str(interval) + ", NUMBER OF STORMS    " + str(numberstorms) or "INTERVAL   " + str(interval) + ", NUMBER OF STORMS    " + srt(numberstorms) else "False"
            print query

print "Complete"
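
A note on the conditions in the attempts above: an expression such as "INTERVAL    " + str(interval) + ", NUMBER OF STORMS    " + str(numberstorms) only builds a string, and any non-empty string is truthy in Python, so query would always come out as "True". Also, each chunk is a single string, so "for line in chunk" walks over characters rather than lines. One possible way around both, as a minimal sketch only: pull the numbers out of each line with a regular expression and compare them as integers. The helper name parse_interval_line is hypothetical, and the pattern assumes the lines look exactly like the sample.

```python
import re

# Matches lines such as "INTERVAL    2, NUMBER OF STORMS    1".
# Assumption: the layout is the same as in the sample shown in the question.
INTERVAL_RE = re.compile(r"INTERVAL\s+(\d+),\s*NUMBER OF STORMS\s+(\d+)")

def parse_interval_line(line):
    """Return (interval, number_of_storms) as ints, or None if the line does not match."""
    match = INTERVAL_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# A chunk is one string, so split it into lines before looping over it.
chunk = "INTERVAL    1, NUMBER OF STORMS    0\nINTERVAL    2, NUMBER OF STORMS    1\n"
for line in chunk.splitlines():
    parsed = parse_interval_line(line)
    if parsed is not None:
        interval, storms = parsed
        # Compare the parsed integers, not the text of the line.
        print("%d %d %s" % (interval, storms, interval <= 30 and storms > 0))
```

Comparing the parsed integers is what makes conditions like "INTERVAL <= 30" and "NUMBER OF STORMS > 0" meaningful; string concatenation cannot express them.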

1 Answer:

Answer 0 (score: 0)

Assuming you can handle the processing of whatever lies between the asterisk lines yourself, this might help...

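chunks = []  # each chunk is the text found between two lines of asterisks

tmp_text = ''
with open('filename.csv') as f:  # the file is closed automatically
  for line in f:
    stripped = line.strip()
    # A separator line consists only of asterisks. Comparing against the
    # abbreviated literal '******...***' would never match a real line.
    if stripped and set(stripped) == set('*'):
      if tmp_text != '':  # the file may or may not start with a separator
        chunks.append(tmp_text)
        tmp_text = ''
    else:
      tmp_text += line
if tmp_text != '':
  chunks.append(tmp_text)  # in case the file does not end with a separator

# chunks are in the same order as in the file
for chunk in chunks:
  # placeholder: replace with your own function that parses one chunk
  call_your_function_to_parse_chunk_of_text_here(chunk)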
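
What goes in place of call_your_function_to_parse_chunk_of_text_here is left open above. Here is a rough sketch of one way to fill it in, written for Python 3 and based on my reading of the question: a block counts as TRUE if any interval numbered 30 or lower has at least one storm whose response is greater than 648. The names chunk_matches, simulation_number and write_summary are hypothetical, and the patterns assume the file looks exactly like the sample.

```python
import csv
import io
import re

# Assumption: lines follow the layout of the sample in the question.
INTERVAL_RE = re.compile(r"INTERVAL\s+(\d+),\s*NUMBER OF STORMS\s+(\d+)")
SIMULATION_RE = re.compile(r"SIMULATION NUMBER\s*=\s*(\d+)")

def chunk_matches(chunk, max_interval=30, threshold=648.0):
    """True if an interval <= max_interval with storms has a response > threshold."""
    interval, storms = None, 0
    in_responses = False
    for line in chunk.splitlines():
        stripped = line.strip()
        match = INTERVAL_RE.match(stripped)
        if match:
            # A new INTERVAL line ends any previous STORM RESPONSES list.
            interval, storms = int(match.group(1)), int(match.group(2))
            in_responses = False
        elif stripped == "STORM RESPONSES":
            in_responses = True
        elif in_responses and interval is not None and stripped:
            # Response lines look like "1  544.95": storm index, then value.
            response = float(stripped.split()[-1])
            if interval <= max_interval and storms > 0 and response > threshold:
                return True
    return False

def simulation_number(chunk):
    """Return the simulation number found in the chunk, or None if absent."""
    match = SIMULATION_RE.search(chunk)
    return int(match.group(1)) if match else None

def write_summary(chunks, out):
    """Write one row per chunk (simulation number, TRUE/FALSE) as CSV to out."""
    writer = csv.writer(out)
    writer.writerow(["simulation", "query"])
    for chunk in chunks:
        result = "TRUE" if chunk_matches(chunk) else "FALSE"
        writer.writerow([simulation_number(chunk), result])

# Small demonstration on a shortened copy of the sample block.
sample = """SIMULATION NUMBER =     1  SEED NUMBER:      1430403561
INTERVAL    1, NUMBER OF STORMS    0
INTERVAL    2, NUMBER OF STORMS    1
STORM RESPONSES
  1  544.95
INTERVAL   19, NUMBER OF STORMS    1
STORM RESPONSES
  1  614.23
"""
out = io.StringIO()
write_summary([sample], out)
print(out.getvalue())
```

Neither response in this shortened sample exceeds 648, so its row comes out as FALSE. To write the table to disk instead, pass a file opened with open('summary.csv', 'w', newline='') as out; that file name is only an example.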