我需要修改一个小的Python脚本,因为metrics文件的格式稍有改变。我根本不懂Python,并试图自己努力修复它。这些变化对我来说很有意义,但显然脚本仍有一个问题。否则,其他一切正常。这是脚本的样子:
import sys
import datetime
##########################################################################
now = datetime.datetime.now();
logFile = now.strftime("%Y%m%d")+'.QE-Metric.log';
underlyingParse = True;
strParse = "UNDERLYING_TICK";
if (len(sys.argv) == 2):
if sys.argv[1] == '2':
strParse = "ORDER_SHOOT";
underlyingParse = False;
elif (len(sys.argv) == 3):
logFile = sys.argv[2];
if sys.argv[1] == '2':
strParse = "ORDER_SHOOT";
underlyingParse = False;
else:
print 'Incorrect number of arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
sys.exit()
##########################################################################
# Read the deployment file
FIput = open(logFile, 'r');
FOput = open('ParsedMetrics.txt', 'w');
##########################################################################
def ParseMetrics( file_lines ):
ii = 0
tokens = [];
for ii in range(len(file_lines)):
line = file_lines[ii].strip()
if (line.find(strParse) != -1):
tokens = line.split(",");
currentTime = float(tokens[2])
if (underlyingParse == True and ii != 0):
newIndex = ii-1
prevLine = file_lines[newIndex].strip()
while (prevLine.find("ORDER_SHOOT") != -1 and newIndex > -1):
newIndex -= 1;
tokens = prevLine.split(",");
currentTime -= float(tokens[2]);
prevLine = file_lines[newIndex].strip();
if currentTime > 0:
FOput.write(str(currentTime) + '\n')
##########################################################################
file_lines = FIput.readlines()
ParseMetrics( file_lines );
print 'Metrics parsed and written to ParsedMetrics.txt'
一切正常,除了因为上次发生UNDERLYING_TICK事件后应该反向迭代前一行以加起ORDER_SHOOT数字的逻辑(从代码开始:if(underlyingParse == True和ii!= 0) :...)然后从当前处理的UNDERLYING_TICK事件行中减去该总数。这就是正在解析的文件中的典型行:
08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89
基本上,我只对最后一个数据元素(1499.89)感兴趣,这是微观时间。我知道它必须是愚蠢的东西。我只需要另一双眼睛。谢谢!
答案 0 :(得分:0)
目前还不清楚您的输出有什么问题,因为您没有显示输出,我们无法理解您的输入。
我假设以下内容:
sort
的管道输出)执行此操作,或使用bisect
模块存储已排序的行并轻松提取相关行。如果这两个假设都为真,请查看以下脚本。 (未经测试,因为我没有大的输入样本或输出样本来进行比较。)
这是一个更加Pythonic的样式,更容易阅读和理解,不使用全局变量作为函数参数,并且应该更高效,因为它不会向后遍历行或加载整个文件进入内存来解析它。
它还演示了如何使用argparse
module进行命令行解析。这不是必需的,但是如果你有很多命令行Python脚本,你应该熟悉它。
import sys
VALIDTYPES = ['UNDERLYING_TICK','ORDER_SHOOT']
def parseLine(line):
# format of `tokens`:
# 0 = absolute timestamp
# 1 = event type
# 2 = ???
# 3 = timedelta (microseconds)
tokens = [t.strip(':, \t') for t in line.strip().split()]
if tokens[1] not in VALIDTYPES:
return None
tokens[2] = int(tokens[2])
tokens[3] = float(tokens[3])
return tuple(tokens)
def parseMetrics(lines, parsetype):
"""Yield timedelta for each line of specified type
If parsetype is 'UNDERLYING_TICK', subtract previous ORDER_SHOOT
timedeltas from the current UNDERLYING_TICK delta before yielding
"""
order_shoots_between_ticks = []
for line in lines:
tokens = parseLine(line)
if tokens is None:
continue # go home early
if parsetype=='UNDERLYING_TICK':
if tokens[1]=='ORDER_SHOOT':
order_shoots_between_ticks.append(tokens)
elif tokens[1]=='UNDERLYING_TICK':
adjustedtick = tokens[3] - sum(t[3] for t in order_shoots_between_ticks)
order_shoots_between_ticks = []
yield adjustedtick
elif parsetype==tokens[1]:
yield tokens[3]
def parseFile(instream, outstream, parsetype):
printablelines = ("{0:f}\n".format(time) for time in parseMetrics(instream, parsetype))
outstream.writelines(printablelines)
def main(argv):
import argparse, datetime
parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
parser.add_argument('mode', type=int, choices=range(1, len(VALIDTYPES)+1),
help="the types to parse. Valid values are: 1 (Underlying), 2 (OrderShoot)")
parser.add_argument('infile', required=False,
default='{}.QE-Metric.log'.format(datetime.datetime.now().strftime('%Y%m%d'))
help="the input file. Defaults to today's file: YYYYMMDD.QE-Metric.log. Use - for stdin.")
parser.add_argument('outfile', required=False,
default='ParsedMetrics.txt',
help="the output file. Defaults to ParsedMetrics.txt. Use - for stdout.")
parser.add_argument('--verbose', '-v', action='store_true')
args = parser.parse_args(argv)
args.mode = VALIDTYPES[args.mode-1]
if args.infile=='-':
instream = sys.stdin
else:
instream = open(args.infile, 'rb')
if args.outfile=='-':
outstream = sys.stdout
else:
outstream = open(args.outfile, 'wb')
parseFile(instream, outstream, args.mode)
instream.close()
outstream.close()
if args.verbose:
sys.stderr.write('Metrics parsed and written to {0}\n'.format(args.outfile))
if __name__=='__main__':
main(sys.argv[1:])
答案 1 :(得分:0)
因此,如果命令行选项为2,则该函数会创建一个输出文件,其中所有行只包含输入文件中包含“order_shoot”标记的行的“时间”部分?
如果命令行选项为1,则该函数创建一个输出文件,输入文件中包含'underlying_tick'标记的每一行都有一行,除了你想要的数字是unders_tick时间值减去所有order_shoot在前面的underlying_tick值之后发生的时间值(如果这是第一个,则从文件的开头开始)?
如果这是正确的,并且所有行都是唯一的(没有重复),那么我会建议以下重写的脚本:
#### Imports unchanged.
import sys
import datetime
#### Changing the error checking to be a little simpler.
#### If the number of args is wrong, or the "mode" arg is
#### not a valid option, it will print the error message
#### and exit.
if len(sys.argv) not in (2,3) or sys.argv[2] not in (1,2):
print 'Incorrect arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
sys.exit()
#### the default previously specified in the original code.
now = datetime.datetime.now()
#### Using ternary logic to set the input file to either
#### the files specified in argv[2] (if it exists), or to
#### the default previously specified in the original code.
FIput = open((sys.argv[2] if len(sys.argv)==3
else now.strftime("%Y%m%d")+'.QE-Metric.log'), 'r');
#### Output file not changed.
FOput = open('ParsedMetrics.txt', 'w');
#### START RE-WRITTEN FUNCTION
def ParseMetrics(file_lines,mode):
#### The function now takes two params - the lines from the
#### input file, and the 'mode' - whichever the user selected
#### at run-time. As you can see from the call down below, this
#### is taken straight from argv[1].
if mode == '1':
#### So if we're doing underlying_tick mode, we want to find each tick,
#### then for each tick, sum the preceding order_shoots since the last
#### tick (or start of file for the first tick).
ticks = [file_lines.index(line) for line in file_lines \
if 'UNDERLYING_TICK' in line]
#### The above list comprehension iterates over file_lines, and creates
#### a list of the indexes to file_lines elements that contain ticks.
####
#### Then the following loop iterates over ticks, and for each tick,
#### subtracts the sum of all times for order_shoots that occure prior
#### to the tick, from the time value of the tick itself. Then that
#### value is written to the outfile.
for tick in ticks:
sub_time = float(file_lines[tick].split(",")[2]) - \
sum([float(line.split(",")[2]) \
for line in file_lines if "ORDER_SHOOT" in line \
and file_lines.index(line) <= tick]
FOput.write(float(line.split(",")[2]))
#### if the mode is 2, then it just runs through file_lines and
#### outputs all of the order_shoot time values.
if mode == '2':
for line in file_lines:
if 'ORDER_SHOOT' in line:
FOput.write(float(line.split(",")[2]))
#### END OF REWRITTEN FUNCTION
#### As you can see immediately below, we pass sys.argv[2] for the
#### mode argument of the ParseMetrics function.
ParseMetrics(FIput.readlines(),sys.argv[2])
print 'Metrics parsed and written to ParsedMetrics.txt'
这应该可以解决问题。主要问题是如果你有任何“UNDERLYING_TICK”的行与任何其他这样的行完全重复,那么这将不起作用。需要使用不同的逻辑来获得正确的索引。
我相信有一种方法可以让这更好,但这是我的第一个想法。
值得注意的是,为了便于阅读,我在上面的源代码中添加了许多内联换行符,但是如果你按照书面形式使用它,你可能想要提取它们。