Question

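I'm interested in building a Python script that tells me how many lines are written to a file per interval (probably per minute). I have a file that data is being written to: each user passes a new line of data through an external program. Knowing the number of lines per x gives me a metric I can use for future capacity planning. The output file consists of lines that are all the same length, each ending with a line return. I was thinking of writing a script that does something like this: measure the length of the file at one point, measure it again at a later point, subtract the two and get my result. But I don't know whether this is ideal, because measuring the length of the file takes time, and that might skew my results. Does anyone have other ideas?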
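Because every line has the same length, the size-difference idea does not need to read the file at all: `os.path.getsize` asks the filesystem for the size (a `stat` call), which costs about the same no matter how large the file is. A minimal sketch, assuming a fixed record length `LINE_LENGTH` (a hypothetical value, counting the trailing newline) and a file that is only ever appended to:

```python
import os
import time

LINE_LENGTH = 64  # hypothetical: bytes per line, including the trailing "\n"


def lines_in_file(path, line_length=LINE_LENGTH):
    """Number of complete lines, derived from the file size alone (no reading)."""
    return os.path.getsize(path) // line_length


def lines_written_during(path, seconds, line_length=LINE_LENGTH):
    """Lines appended to `path` while sleeping for `seconds`."""
    before = lines_in_file(path, line_length)
    time.sleep(seconds)
    after = lines_in_file(path, line_length)
    return after - before
```

If the file is ever truncated or rotated, the difference can go negative, so a real script would need to detect that case.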

Based on what people suggested, I put this together to get started:

from __future__ import print_function  # lets print(...) also work on Python 2

import sys
import time

from daemon import runner  # DaemonRunner from the python-daemon package

inputfilename = "/home/data/testdata.txt"


class App(object):
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/count.pid'
        self.pidfile_timeout = 5

    def run(self):
        while True:
            count = 0
            # Read the file in 8 MiB binary chunks and count the newline bytes.
            with open(inputfilename, 'rb') as filein:
                while True:
                    buffer = filein.read(8192 * 1024)
                    if not buffer:
                        break
                    count += buffer.count(b'\n')
            print(count)
            sys.stdout.flush()  # make each count appear immediately
            # Set the sleep time for repeated action here:
            time.sleep(60)


app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()

It counts the lines every 60 seconds and prints the total to the screen. My next step is to do the math.
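The math itself is a difference divided by the elapsed time. A minimal sketch, assuming you record a timestamp from `time.monotonic()` next to each count (the function name `lines_per_minute` is hypothetical). Dividing by the measured elapsed time, rather than assuming exactly 60 seconds, also absorbs the time the count itself took:

```python
def lines_per_minute(prev_count, prev_time, count, now):
    """New lines per minute between two (count, time.monotonic()) samples."""
    elapsed = now - prev_time  # seconds between the two samples
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (count - prev_count) * 60.0 / elapsed
```

For example, going from 100 to 220 lines over 120 seconds gives `lines_per_minute(100, 0.0, 220, 120.0) == 60.0`.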

Edit again: I've added output of the per-minute count:

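from __future__ import print_function  # lets print(...) also work on Python 2

import sys
import time

from daemon import runner  # DaemonRunner from the python-daemon package

inputfilename = "/home/data/testdata.txt"


class App(object):
    def __init__(self):
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/twitter_counter.pid'
        self.pidfile_timeout = 5

    def run(self):
        previous_count = None  # no baseline until the first pass completes
        while True:
            count = 0
            # Read the file in 8 MiB binary chunks and count the newline bytes.
            with open(inputfilename, 'rb') as filein:
                while True:
                    buffer = filein.read(8192 * 1024)
                    if not buffer:
                        break
                    count += buffer.count(b'\n')

            # Skip the first pass, which would otherwise print the full total.
            if previous_count is not None:
                # Lines added since the previous pass (about a minute ago).
                print(count - previous_count)
                sys.stdout.flush()

            previous_count = count
            # Set the sleep time for repeated action here:
            time.sleep(60)


app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()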
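In the script above each cycle lasts 60 seconds plus however long the count takes, so successive deltas cover slightly more than a minute and drift over time. One way to address the "measuring takes time" concern is to sleep until fixed deadlines and report the actual elapsed time with each delta. A sketch under these assumptions: `sample_deltas` is a hypothetical generator, `count_fn` is any zero-argument function returning the current line count (for example a wrapper around the counting loop above), and the `clock`/`sleep` parameters exist only so the loop can be tested without waiting:

```python
import time


def sample_deltas(count_fn, interval=60.0, samples=None,
                  clock=time.monotonic, sleep=time.sleep):
    """Yield (elapsed_seconds, new_lines) roughly once per `interval` seconds.

    Deadlines advance by exactly `interval`, so time spent inside count_fn
    does not push later samples back. The first count is only a baseline.
    """
    prev_count = count_fn()
    prev_time = clock()
    deadline = prev_time + interval
    taken = 0
    while samples is None or taken < samples:
        sleep(max(0.0, deadline - clock()))  # wait for the next fixed deadline
        count = count_fn()
        now = clock()
        yield now - prev_time, count - prev_count
        prev_count, prev_time = count, now
        deadline += interval
        taken += 1
```

Inside `run()` this could be used as `for elapsed, new_lines in sample_deltas(count_fn): print(new_lines * 60.0 / elapsed)`.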

Answer 1

As for your idea (which seems reasonable to me): how accurate do your measurements need to be?

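I'd suggest measuring how long a measurement takes first. Then, given the relative accuracy you want to achieve, you can calculate the interval between consecutive measurements. For example, if a measurement takes t milliseconds and you want 1% accuracy, don't measure more often than once every 100t milliseconds.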
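Both steps of that suggestion fit in a few lines. A minimal sketch (the helper names `time_call` and `min_interval` are hypothetical): time one measurement with `time.perf_counter`, then divide by the relative accuracy you want, so a 200 ms measurement at 1% accuracy gives a 20 s minimum interval:

```python
import time


def time_call(fn, *args):
    """Return (result, seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def min_interval(measure_seconds, rel_accuracy):
    """Shortest sampling interval such that one measurement takes at most
    `rel_accuracy` of it (0.01 means 1 %), i.e. measure_seconds / rel_accuracy."""
    return measure_seconds / rel_accuracy
```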

Keep in mind, though, that the measurement time will grow as the file grows.
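One way to stop the cost from growing with the file (a technique swapped in here, not part of the answer above) is to remember the byte offset reached on the previous pass and read only what was appended since then. A sketch, assuming the file is append-only and never truncated or rotated; `count_new_lines` is a hypothetical helper:

```python
def count_new_lines(path, offset, chunk_size=1 << 20):
    """Count newlines appended after byte `offset`; return (new_lines, new_offset).

    Only bytes written since the last call are read, so the cost tracks the
    amount of new data rather than the total file size.
    """
    new_lines = 0
    with open(path, 'rb') as f:
        f.seek(offset)  # skip everything already counted
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            new_lines += chunk.count(b'\n')
            offset += len(chunk)
    return new_lines, offset
```

Start with `offset = 0` (or the current file size, to ignore existing lines) and pass the returned offset into the next call.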

For how to count the lines in a file, see: is there a built-in python analog to unix 'wc' for sniffing a file?
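A compact `wc -l`-style counter in the same spirit as the question's loop, as a sketch (`count_lines` is a hypothetical name): `iter` with a sentinel keeps calling `f.read` until it returns `b''`. Like `wc -l`, it counts newline characters, so a final line without a trailing newline is not counted:

```python
from functools import partial


def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, reading it in fixed-size binary chunks."""
    with open(path, 'rb') as f:
        chunks = iter(partial(f.read, chunk_size), b'')  # stops at end of file
        return sum(chunk.count(b'\n') for chunk in chunks)
```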

For how to measure time, see the time module.

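P.S. I just tried timing the line counter on a 245M file. The first run took about 10 seconds (I didn't time that first run precisely), but after that it was always under 1 second. Perhaps some caching is going on there; I'm not sure.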

Python count lines written to file per second/minute
