Question

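I need to modify a small Python script because the format of the metrics file has changed slightly. I don't know Python at all and have tried to fix it myself. The changes make sense to me, but apparently the script still has one problem. Otherwise, everything else works. Here is what the script looks like: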

import sys
import datetime

##########################################################################

now = datetime.datetime.now();
logFile = now.strftime("%Y%m%d")+'.QE-Metric.log';

underlyingParse = True;
strParse = "UNDERLYING_TICK";
if (len(sys.argv) == 2):
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
elif (len(sys.argv) == 3):
    logFile = sys.argv[2];
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
else:
    print 'Incorrect number of arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()

##########################################################################

# Read the deployment file
FIput = open(logFile, 'r');
FOput = open('ParsedMetrics.txt', 'w');

##########################################################################

def ParseMetrics( file_lines ):

    ii = 0
    tokens = []; 
    for ii in range(len(file_lines)):

        line = file_lines[ii].strip()

        if (line.find(strParse) != -1):

             tokens = line.split(",");
             currentTime = float(tokens[2])

             if (underlyingParse == True and ii != 0):
                 newIndex = ii-1
                 prevLine = file_lines[newIndex].strip()
                 while (prevLine.find("ORDER_SHOOT") != -1 and newIndex > -1):
                     newIndex -= 1;
                     tokens = prevLine.split(",");
                     currentTime -= float(tokens[2]);
                     prevLine = file_lines[newIndex].strip();

             if currentTime > 0:
                 FOput.write(str(currentTime) + '\n')

##########################################################################

file_lines = FIput.readlines()
ParseMetrics( file_lines );

print 'Metrics parsed and written to ParsedMetrics.txt'

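Everything works fine except for the logic that is supposed to iterate backwards through the previous lines and add up the ORDER_SHOOT numbers since the last UNDERLYING_TICK event (starting at the code: if (underlyingParse == True and ii != 0): ...), and then subtract that total from the UNDERLYING_TICK event line currently being processed. This is what a typical line in the file being parsed looks like: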

08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89

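Basically, I'm only interested in the last data element (1499.89), which is the time in microseconds. I know it has to be something stupid. I just need another pair of eyes. Thanks!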
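
To make the intended calculation concrete, here is a minimal sketch on a made-up three-line sample. The ORDER_SHOOT lines and their values are invented for illustration, and `adjusted_ticks` is a hypothetical helper, not part of the script above. It assumes every relevant line has the comma-separated layout shown, with the duration as the last field.

```python
def adjusted_ticks(lines):
    """Return each UNDERLYING_TICK duration minus the ORDER_SHOOT
    durations seen since the previous tick (or the start of the input)."""
    results = []
    pending = 0.0  # running sum of ORDER_SHOOT durations since the last tick
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue  # not a metrics line in the assumed layout
        duration = float(fields[-1])
        if "ORDER_SHOOT" in fields[0]:
            pending += duration
        elif "UNDERLYING_TICK" in fields[0]:
            results.append(duration - pending)
            pending = 0.0  # start a fresh sum for the next tick
    return results

# Invented sample: two ORDER_SHOOT lines followed by one tick.
sample = [
    "08:40:02.039100(+10): ORDER_SHOOT, 1377, 200.5",
    "08:40:02.039200(+12): ORDER_SHOOT, 1377, 99.5",
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
]
print(adjusted_ticks(sample))
```

For this sample the single result is roughly 1499.89 - (200.5 + 99.5), which is the value the backwards loop in the script is meant to produce.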

Answer 1

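It's not clear what is wrong with your output, because you didn't show it, and we can't really understand your input.

I am assuming the following:

Lines are formatted as "absolutetime: TYPE, positiveinteger, float_time_duration_in_ms", where the last item is the amount of time the thing took.
Lines are sorted by "absolutetime". Consequently, the ORDER_SHOOTs that belong to an UNDERLYING_TICK are always on the lines since the previous UNDERLYING_TICK (or the beginning of the file), and only on those lines. If this assumption is not true, the file needs to be sorted first. You can do that with a separate program (e.g. by piping the output of sort), or use the bisect module to store the lines in sorted order and easily extract the relevant ones.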
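
If the lines might arrive out of order, the `bisect` idea can be sketched as follows. This is only an illustration: `sort_key` and `insert_sorted` are hypothetical helpers, and the key assumes the timestamp is the leading `HH:MM:SS.ffffff` text before the `(`, so that plain string comparison orders lines within a single day.

```python
import bisect

def sort_key(line):
    # Assumed layout: "HH:MM:SS.ffffff(+NN): TYPE, int, float".
    # Zero-padded timestamps of equal width compare correctly as strings.
    return line.split("(", 1)[0]

def insert_sorted(keys, lines, line):
    """Insert `line` so that `lines` stays ordered by timestamp."""
    key = sort_key(line)
    pos = bisect.bisect_right(keys, key)  # place after any equal timestamps
    keys.insert(pos, key)
    lines.insert(pos, line)

keys, lines = [], []
for raw in [
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
    "08:40:02.039100(+10): ORDER_SHOOT, 1377, 200.5",
    "08:40:02.039200(+12): ORDER_SHOOT, 1377, 99.5",
]:
    insert_sorted(keys, lines, raw)
```

For a file that is simply unsorted, `sorted(lines, key=sort_key)` would do the same job in one call; `bisect` is mainly useful when lines arrive incrementally.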

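If both assumptions hold, take a look at the following script. (Untested, because I don't have a large input sample or an output sample to compare against.)

It is written in a more Pythonic style that is easier to read and understand, passes values as function arguments instead of relying on global variables, and should be more efficient because it neither walks backwards through the lines nor loads the whole file into memory to parse it.

It also demonstrates how to use the argparse module for command-line parsing. That isn't necessary, but if you have many command-line Python scripts, you should get familiar with it.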

import sys

VALIDTYPES = ['UNDERLYING_TICK','ORDER_SHOOT']

def parseLine(line):
    # format of `tokens`:
    # 0 = absolute timestamp
    # 1 = event type
    # 2 = ???
    # 3 = timedelta (microseconds)
    tokens = [t.strip(':, \t') for t in line.strip().split()]
    # skip blank or short lines and event types we do not care about
    if len(tokens) < 4 or tokens[1] not in VALIDTYPES:
        return None
    tokens[2] = int(tokens[2])
    tokens[3] = float(tokens[3])
    return tuple(tokens)

def parseMetrics(lines, parsetype):
    """Yield timedelta for each line of specified type

    If parsetype is 'UNDERLYING_TICK', subtract previous ORDER_SHOOT 
    timedeltas from the current UNDERLYING_TICK delta before yielding
    """
    order_shoots_between_ticks = []
    for line in lines:
        tokens = parseLine(line)
        if tokens is None:
            continue # go home early
        if parsetype=='UNDERLYING_TICK':
            if tokens[1]=='ORDER_SHOOT':
                order_shoots_between_ticks.append(tokens)
            elif tokens[1]=='UNDERLYING_TICK':
                adjustedtick = tokens[3] - sum(t[3] for t in order_shoots_between_ticks)
                order_shoots_between_ticks = []
                yield adjustedtick
        elif parsetype==tokens[1]:
            yield tokens[3]

def parseFile(instream, outstream, parsetype):
    printablelines = ("{0:f}\n".format(time) for time in parseMetrics(instream, parsetype))
    outstream.writelines(printablelines)

def main(argv):
    import argparse, datetime
    parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
    parser.add_argument('mode', type=int, choices=range(1, len(VALIDTYPES)+1),
        help="the types to parse. Valid values are: 1 (Underlying), 2 (OrderShoot)")
    # positional arguments are made optional with nargs='?' (not required=False)
    parser.add_argument('infile', nargs='?',
        default='{}.QE-Metric.log'.format(datetime.datetime.now().strftime('%Y%m%d')),
        help="the input file. Defaults to today's file: YYYYMMDD.QE-Metric.log. Use - for stdin.")
    parser.add_argument('outfile', nargs='?',
        default='ParsedMetrics.txt',
        help="the output file. Defaults to ParsedMetrics.txt. Use - for stdout.")
    parser.add_argument('--verbose', '-v', action='store_true')
    args = parser.parse_args(argv)

    args.mode = VALIDTYPES[args.mode-1]

    # open in text mode so the string handling in parseLine works on str
    if args.infile=='-':
        instream = sys.stdin
    else:
        instream = open(args.infile, 'r')

    if args.outfile=='-':
        outstream = sys.stdout
    else:
        outstream = open(args.outfile, 'w')

    parseFile(instream, outstream, args.mode)

    instream.close()
    outstream.close()

    if args.verbose:
        sys.stderr.write('Metrics parsed and written to {0}\n'.format(args.outfile))



if __name__=='__main__':
    main(sys.argv[1:])
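
As a side note on the command-line part: in `argparse`, a positional argument is made optional with `nargs='?'` (positionals do not accept a `required` keyword). A minimal sketch, with argument names mirroring the script above and a fixed placeholder default (`today.QE-Metric.log`, made up for this example) instead of today's date:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Output timedeltas from a metrics log")
    parser.add_argument("mode", type=int, choices=[1, 2],
                        help="1 (Underlying) or 2 (OrderShoot)")
    # nargs='?' means "zero or one value"; the default is used when omitted.
    parser.add_argument("infile", nargs="?", default="today.QE-Metric.log",
                        help="input file (placeholder default for this sketch)")
    parser.add_argument("outfile", nargs="?", default="ParsedMetrics.txt",
                        help="output file")
    return parser

# Only the mode is given, so both file arguments fall back to their defaults.
args = build_parser().parse_args(["2"])
print(args.mode, args.infile, args.outfile)
```

Passing an explicit list to `parse_args` keeps the sketch independent of the real command line, which also makes this kind of parser easy to try out interactively.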

Answer 2

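So, if the command-line option is 2, the function creates an output file in which every line contains only the 'time' portion of the lines in the input file that carry the 'order_shoot' token?

And if the command-line option is 1, the function creates an output file with one line for each input line that carries the 'underlying_tick' token, except that the number you want is the underlying_tick time value minus all of the order_shoot time values that occurred since the preceding underlying_tick (or since the start of the file, if this is the first one)?

If this is correct, and all lines are unique (there are no duplicates), then I would suggest the following rewritten script: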

#### Imports unchanged.

import sys 
import datetime 

#### Changing the error checking to be a little simpler.
#### If the number of args is wrong, or the "mode" arg is
#### not a valid option, it will print the error message
#### and exit.

if len(sys.argv) not in (2,3) or sys.argv[1] not in ('1','2'):
    print 'Incorrect arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()  

#### Current date, used to build the default input file name below.

now = datetime.datetime.now()

#### Using ternary logic to set the input file to either
#### the files specified in argv[2] (if it exists), or to
#### the default previously specified in the original code.

FIput = open((sys.argv[2] if len(sys.argv)==3 
                          else now.strftime("%Y%m%d")+'.QE-Metric.log'), 'r');

#### Output file not changed.

FOput = open('ParsedMetrics.txt', 'w');

#### START RE-WRITTEN FUNCTION

def ParseMetrics(file_lines,mode): 

#### The function now takes two params - the lines from the 
#### input file, and the 'mode' - whichever the user selected
#### at run-time. As you can see from the call down below, this
#### is taken straight from argv[1]. 

    if mode == '1':

#### So if we're doing underlying_tick mode, we want to find each tick,
#### then for each tick, sum the preceding order_shoots since the last
#### tick (or start of file for the first tick).

        ticks = [file_lines.index(line) for line in file_lines \
                                        if 'UNDERLYING_TICK' in line]

#### The above list comprehension iterates over file_lines, and creates
#### a list of the indexes to file_lines elements that contain ticks.
#### 
#### Then the following loop iterates over ticks, and for each tick,
#### subtracts the sum of all times for order_shoots that occur between
#### the previous tick (or the start of the file) and this tick from the
#### time value of the tick itself. Then that value is written to the
#### outfile.

        prev_tick = -1
        for tick in ticks:
            sub_time = float(file_lines[tick].split(",")[2]) - \
                       sum([float(line.split(",")[2])
                            for line in file_lines[prev_tick + 1:tick]
                            if "ORDER_SHOOT" in line])
            FOput.write(str(sub_time) + '\n')
            prev_tick = tick

#### if the mode is 2, then it just runs through file_lines and
#### outputs all of the order_shoot time values.

    if mode == '2':
        for line in file_lines:
            if 'ORDER_SHOOT' in line:
                FOput.write(str(float(line.split(",")[2])) + '\n')

#### END OF REWRITTEN FUNCTION

#### As you can see immediately below, we pass sys.argv[1] for the
#### mode argument of the ParseMetrics function.

ParseMetrics(FIput.readlines(),sys.argv[1])

print 'Metrics parsed and written to ParsedMetrics.txt'

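That should do the trick. The main issue is that if any "UNDERLYING_TICK" line is an exact duplicate of another such line, this won't work. Different logic would be needed to get the correct indexes.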
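
One possible version of that different logic, as a sketch rather than a tested replacement: `enumerate` yields each line's real position, so duplicate lines no longer confuse the index lookup. The helper name `tick_indexes` and the sample lines are made up for this example.

```python
def tick_indexes(file_lines):
    """Return the positions of all lines containing UNDERLYING_TICK."""
    return [i for i, line in enumerate(file_lines) if "UNDERLYING_TICK" in line]

lines = [
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
    "08:40:02.039500(+11): ORDER_SHOOT, 1377, 100.0",
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",  # exact duplicate
]
print(tick_indexes(lines))  # [0, 2]
# list.index always returns the first match, so the duplicate maps to 0 again:
print([lines.index(l) for l in lines if "UNDERLYING_TICK" in l])  # [0, 0]
```

Using `enumerate` also avoids rescanning the list for every line, which `list.index` does.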

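I'm sure there is a way to make this better, but this was my first idea.

It's also worth noting that I added a lot of inline line breaks to the source above for readability, but you may want to pull them out if you use it as written.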

Python: Simple script to parse metrics data
