What are the typical applications of reverse / stride / pread and pwrite?

Date: 2012-06-05 13:00:42

Tags: filesystems benchmarking disk interpretation iozone

If impatient, skip to the "QUESTION" heading below.

CONTEXT

I work with Unix(-like) system administration and infrastructure development, but I think my question is best answered by programmers :o)

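What I'm trying to do is learn how to benchmark file systems (plain, volume-managed, virtualized, encrypted etc.) using iozone. As an exercise, I benchmarked a USB pendrive that serves as the system disk in my slug (http://www.nslu2-linux.org/), formatted in turn with vfat, ntfs, ext3, ext4 and xfs. The test produced some surprising results, which are posted below. However, the reason the results surprised me may very well be that I'm still new to iozone and don't really know how to interpret the numbers. Hence, this post.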

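In my test, iozone runs benchmarks on 11 different file operations, but at only one record size (4k, matching the block size of all tested file systems) and only one file size (512MB). Testing just one record size and one file size of course biases the results somewhat. Anyway, the file operations are listed below, each with my own short explanation: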

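  • initial write: write new data to disk sequentially, regular file usage
  • rewrite: append new data to an existing file sequentially, regular file usage
  • read: read data sequentially, regular file usage
  • re-read: re-read data sequentially (buffer test, or what?)
  • reverse read: ???
  • stride read: ???
  • random read: non-sequential reading, typically database usage
  • random write: non-sequential writing, typically database usage
  • pread: read data at a certain position - for indexed databases?
  • pwrite: write data at a certain position - for indexed databases?
  • mixed workload: (obvious)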

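Some of these operations seem straightforward. I guess that initial write, rewrite and read are all used for regular file handling: a pointer seeks until it reaches a certain block, then reads or writes sequentially (often through many blocks), sometimes having to jump ahead a bit because the file is fragmented. The only goal of the re-read test is (I guess) buffer testing. In parallel, random read/write are typical database operations, where the pointer has to jump from place to place within the same file to gather database records, e.g. when joining tables.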

QUESTION

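So far, so good. I'd greatly appreciate any corrections to the above assumptions, although they seem to be fairly common knowledge. Now for the real question: why would you ever do a reverse read? What is a stride read? I've been told that the "positional" operations pread and pwrite are used with indexed databases, but why not simply keep the index in memory? Or is that what actually happens, and pread then comes in handy for jumping to the exact location of a record, given a certain index? What else do you use pread/pwrite for?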

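To sum up, so far I feel I can only partially interpret my iozone results. I more or less know why high numbers on the random operations make a file system good for databases, but why would I need to read a file in reverse order, and what does a good stride read tell me? What are the typical applications of these operations?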

BONUS QUESTION

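Having asked that, here is a bonus question. As administrator of a given file system, I would greatly appreciate insight from programmers on how to interpret my file system benchmarks ;) - but does anyone have suggestions on how to profile the actual usage of a file system? Experimenting with the file system record (block) size is trivial, although time-consuming. Regarding the size and distribution of files in a given file system, 'find' is my friend. But how do I count the actual file system calls, like read(), pwrite() etc.?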

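Also, I'd greatly appreciate comments on how other resources influence file system test results, such as processor power and RAM capacity and speed. For instance, what difference does it make that I run this test on a machine with a 1.66 GHz Atom processor and 2 GB DDR2 RAM, when I want to use the pendrive in a slug with a 266 MHz ARM Intel XScale processor and 32/8 MB SDRAM/flash?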

ARCHITECTURAL DOCUMENTATION?

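Since I don't want to repeat myself too much, nor ask others to, I would greatly appreciate pointers to documentation if these questions cannot be answered briefly. What matters is not that it explains what the above file operations actually do (I can look at the API for that), but that the documentation is architectural in nature, i.e. that it explains how these operations are typically used in real-life applications.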

TEST RESULTS

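Right. I promised the results of my rather humble USB pendrive file system tests. My main expectation was poor write results in general (a flash drive, by its nature, usually has a larger block size than the file system managing it, so writing a small change means a relatively large amount of unchanged data has to be rewritten) and good read results. The main points were: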

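  • vfat performed remarkably well on all operations, except the somewhat obscure (to me, anyway) reverse and stride reads. I guess the lack of features eliminates a lot of bookkeeping.

  • ntfs sucks at rewrite (append) and read operations, making it a poor candidate for regular file operations. It also sucks at the pread operation, making it a poor candidate for indexed databases.

  • Surprisingly, ext3 and ext4, the latter marginally better on all operations, suck at initial write, rewrite, read, random write and pwrite, making them poor candidates for regular file usage as well as for intensively updated databases. However, ext4 is a master of random read and pread, making it an excellent candidate for somewhat static databases (?). Both ext3 and ext4 score high on the obscure reverse read and stride read operations.

  • The unmatched all-round test champion is xfs, whose only weakness seems to be reverse read. On initial write, rewrite, read, random write and pwrite it is the best, making it an ideal candidate for regular file usage as well as (intensively updated) databases. On re-read, random read and pread it is among the runners-up, making it a good candidate for (somewhat static) databases. It also does well on stride read - whatever that means!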

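Any comments on the interpretation of these results are welcome! The numbers are listed below (somewhat cut for length), one iozone test suite per file system type, all tested on a standard 4GB Verbatim pendrive (orange ;) docked in a Samsung N105P laptop with an N450 1.66GHz Atom CPU and 2GB DDR2 667 MHz RAM, running a Linux 3.2.0-24 x86 kernel with encrypted swap (yes, I know, I should install a 64-bit Linux and keep swap unencrypted!).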

Regards, 托

PS. After writing this I found out that, apparently, the Debian NSLU2 distribution does not support xfs. Still, my questions stand!

--- vfat ---

Iozone: Performance Test of File I/O
        Version $Revision: 3.397 $
    Compiled for 32 bit mode.
    Build: linux 

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
             Ben England.

Run began: Mon Jun  4 14:23:57 2012

Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000002 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records

Children see throughput for  1 initial writers  =   12864.82 KB/sec
Parent sees throughput for  1 initial writers   =    3033.39 KB/sec

Children see throughput for  1 rewriters    =   25271.86 KB/sec
Parent sees throughput for  1 rewriters     =    2876.36 KB/sec

Children see throughput for  1 readers      =  685333.00 KB/sec
Parent sees throughput for  1 readers       =  682464.06 KB/sec

Children see throughput for 1 re-readers    =  727929.94 KB/sec
Parent sees throughput for 1 re-readers     =  726612.47 KB/sec

Children see throughput for 1 reverse readers   =  458174.00 KB/sec
Parent sees throughput for 1 reverse readers    =  456910.21 KB/sec

Children see throughput for 1 stride readers    =  351768.00 KB/sec
Parent sees throughput for 1 stride readers     =  351504.09 KB/sec

Children see throughput for 1 random readers    =  553705.94 KB/sec
Parent sees throughput for 1 random readers     =  552630.83 KB/sec

Children see throughput for 1 mixed workload    =  549812.50 KB/sec
Parent sees throughput for 1 mixed workload     =  547645.03 KB/sec

Children see throughput for 1 random writers    =   19958.66 KB/sec
Parent sees throughput for 1 random writers     =    2752.23 KB/sec

Children see throughput for 1 pwrite writers    =   13355.57 KB/sec
Parent sees throughput for 1 pwrite writers     =    3119.04 KB/sec

Children see throughput for 1 pread readers     =  574273.31 KB/sec
Parent sees throughput for 1 pread readers  =  572121.97 KB/sec

--- ntfs ---

Iozone: Performance Test of File I/O
        Version $Revision: 3.397 $
    Compiled for 32 bit mode.
    Build: linux 

Run began: Mon Jun  4 13:59:37 2012

Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000002 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records

Children see throughput for  1 initial writers  =   11153.75 KB/sec
Parent sees throughput for  1 initial writers   =    2848.69 KB/sec

Children see throughput for  1 rewriters    =    8723.95 KB/sec
Parent sees throughput for  1 rewriters     =    2794.81 KB/sec

Children see throughput for  1 readers      =   24935.60 KB/sec
Parent sees throughput for  1 readers       =   24878.74 KB/sec

Children see throughput for 1 re-readers    =  144415.05 KB/sec
Parent sees throughput for 1 re-readers     =  144340.90 KB/sec

Children see throughput for 1 reverse readers   =   76627.60 KB/sec
Parent sees throughput for 1 reverse readers    =   76362.93 KB/sec

Children see throughput for 1 stride readers    =  367293.25 KB/sec
Parent sees throughput for 1 stride readers     =  366002.25 KB/sec

Children see throughput for 1 random readers    =  505843.41 KB/sec
Parent sees throughput for 1 random readers     =  500556.16 KB/sec

Children see throughput for 1 mixed workload    =  553075.56 KB/sec
Parent sees throughput for 1 mixed workload     =  551754.97 KB/sec

Children see throughput for 1 random writers    =    9747.23 KB/sec
Parent sees throughput for 1 random writers     =    2381.89 KB/sec

Children see throughput for 1 pwrite writers    =   10906.05 KB/sec
Parent sees throughput for 1 pwrite writers     =    1931.43 KB/sec

Children see throughput for 1 pread readers     =   16730.47 KB/sec
Parent sees throughput for 1 pread readers  =   16194.80 KB/sec

--- ext3 ---

Iozone: Performance Test of File I/O
        Version $Revision: 3.397 $
    Compiled for 32 bit mode.
    Build: linux 

Run began: Sun Jun  3 16:05:27 2012

Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /media/verbatim/1/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records

Children see throughput for  1 initial writers  =    3704.61 KB/sec
Parent sees throughput for  1 initial writers   =    3238.73 KB/sec

Children see throughput for  1 rewriters    =    3693.52 KB/sec
Parent sees throughput for  1 rewriters     =    3291.40 KB/sec

Children see throughput for  1 readers      =  103318.38 KB/sec
Parent sees throughput for  1 readers       =  103210.16 KB/sec

Children see throughput for 1 re-readers    =  908090.88 KB/sec
Parent sees throughput for 1 re-readers     =  906356.05 KB/sec

Children see throughput for 1 reverse readers   =  744801.38 KB/sec
Parent sees throughput for 1 reverse readers    =  743703.54 KB/sec

Children see throughput for 1 stride readers    =  623353.88 KB/sec
Parent sees throughput for 1 stride readers     =  622295.11 KB/sec

Children see throughput for 1 random readers    =  725649.06 KB/sec
Parent sees throughput for 1 random readers     =  723891.82 KB/sec

Children see throughput for 1 mixed workload    =  734631.44 KB/sec
Parent sees throughput for 1 mixed workload     =  733283.36 KB/sec

Children see throughput for 1 random writers    =     177.59 KB/sec
Parent sees throughput for 1 random writers     =     137.83 KB/sec

Children see throughput for 1 pwrite writers    =    2319.47 KB/sec
Parent sees throughput for 1 pwrite writers     =    2200.95 KB/sec

Children see throughput for 1 pread readers     =   13614.82 KB/sec
Parent sees throughput for 1 pread readers  =   13614.45 KB/sec

--- ext4 ---

Iozone: Performance Test of File I/O
        Version $Revision: 3.397 $
    Compiled for 32 bit mode.
    Build: linux 

Run began: Sun Jun  3 17:59:26 2012

Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /media/verbatim/2/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000005 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records

Children see throughput for  1 initial writers  =    4086.64 KB/sec
Parent sees throughput for  1 initial writers   =    3533.34 KB/sec

Children see throughput for  1 rewriters    =    4039.37 KB/sec
Parent sees throughput for  1 rewriters     =    3409.48 KB/sec

Children see throughput for  1 readers      = 1073806.38 KB/sec
Parent sees throughput for  1 readers       = 1062541.84 KB/sec

Children see throughput for 1 re-readers    =  991162.00 KB/sec
Parent sees throughput for 1 re-readers     =  988426.34 KB/sec

Children see throughput for 1 reverse readers   =  811973.62 KB/sec
Parent sees throughput for 1 reverse readers    =  810333.28 KB/sec

Children see throughput for 1 stride readers    =  779127.19 KB/sec
Parent sees throughput for 1 stride readers     =  777359.89 KB/sec

Children see throughput for 1 random readers    =  796860.56 KB/sec
Parent sees throughput for 1 random readers     =  795138.41 KB/sec

Children see throughput for 1 mixed workload    =  741489.56 KB/sec
Parent sees throughput for 1 mixed workload     =  739544.09 KB/sec

Children see throughput for 1 random writers    =     499.05 KB/sec
Parent sees throughput for 1 random writers     =     399.82 KB/sec

Children see throughput for 1 pwrite writers    =    4092.66 KB/sec
Parent sees throughput for 1 pwrite writers     =    3451.62 KB/sec

Children see throughput for 1 pread readers     =  840101.38 KB/sec
Parent sees throughput for 1 pread readers  =  831083.31 KB/sec

--- xfs ---

Iozone: Performance Test of File I/O
        Version $Revision: 3.397 $
    Compiled for 32 bit mode.
    Build: linux 

Run began: Mon Jun  4 14:47:49 2012

Record Size 4 KB
File size set to 524288 KB
Command line used: iozone -l 1 -u 1 -r 4k -s 512m -F /mnt/iozone.tmp
Output is in Kbytes/sec
Time Resolution = 0.000005 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Min process = 1 
Max process = 1 
Throughput test with 1 process
Each process writes a 524288 Kbyte file in 4 Kbyte records

Children see throughput for  1 initial writers  =   21854.47 KB/sec
Parent sees throughput for  1 initial writers   =    3836.32 KB/sec

Children see throughput for  1 rewriters    =   29420.40 KB/sec
Parent sees throughput for  1 rewriters     =    3955.65 KB/sec

Children see throughput for  1 readers      =  624136.75 KB/sec
Parent sees throughput for  1 readers       =  614326.13 KB/sec

Children see throughput for 1 re-readers    =  577542.62 KB/sec
Parent sees throughput for 1 re-readers     =  576533.42 KB/sec

Children see throughput for 1 reverse readers   =  483368.06 KB/sec
Parent sees throughput for 1 reverse readers    =  482598.67 KB/sec

Children see throughput for 1 stride readers    =  537227.12 KB/sec
Parent sees throughput for 1 stride readers     =  536313.77 KB/sec

Children see throughput for 1 random readers    =  525219.19 KB/sec
Parent sees throughput for 1 random readers     =  524062.07 KB/sec

Children see throughput for 1 mixed workload    =  561513.50 KB/sec
Parent sees throughput for 1 mixed workload     =  560142.18 KB/sec

Children see throughput for 1 random writers    =   24118.34 KB/sec
Parent sees throughput for 1 random writers     =    3117.71 KB/sec

Children see throughput for 1 pwrite writers    =   32512.07 KB/sec
Parent sees throughput for 1 pwrite writers     =    3825.54 KB/sec

Children see throughput for 1 pread readers     =  525244.94 KB/sec
Parent sees throughput for 1 pread readers  =  523331.93 KB/sec

3 Answers:

Answer 0 (score: 3)

I've only had to dig deep into file system performance on Windows systems. The general principles apply no matter what OS/file system you use...

Why do a reverse read?

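A program may read block 987654 and then use that data to determine that it needs block 123456. This can happen on a join: your DB might be using an index on table 1 to select records from table 2. The picking may happen in table 1 order, which can be the reverse of table 2 order.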

Something similar can happen in a single-table select when two keys are used.

What is a stride read?

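Reading every Nth block. Reading block 12345600, then block 12345700, then block 12345800 is a stride of 100. Imagine a table with many and/or large columns. Its rows may need several file system blocks each to hold the data. Typically, the database organizes this data into one record per row, each record occupying several sequential file system blocks. If your database row occupies 10 file system blocks and you select on two columns, you might only need to read the 1st and 6th blocks of each 10-block record. Your query would then read blocks 10001, 10006, 10011, 10016, 10021, 10026 - a stride of 5.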

I've been told that the "positional" operations pread and pwrite are used with indexed databases, but why not simply keep the index in memory?

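The index may be larger than a reasonable amount of RAM. Or your earlier usage pulled other indexes or data into RAM, causing the unused index to be evicted from the file system/database cache.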

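Or is that what actually happens, and does pread then come in handy for jumping to the exact location of a record, given a certain index?

Yes, that could be what your database is doing.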

What else do you use pread/pwrite for?

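Some data files have predefined "interesting" locations. Depending on your DB implementation, this could be the root of a B-tree index, a table header, the tail of a log/journal, or something else. The pread/pwrite tests measure the performance of jumping to such specific, predetermined locations, as opposed to a uniformly random mix of locations.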

Links?

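There are system utilities for all major OSes that can capture every file system operation the OS performs. On *NIX systems I believe they go by names such as DTrace, SystemTap or strace (which is built on ptrace). You can use the mass of data from these monitors (intelligently filtered) to see the disk access patterns on your system.

The general rule of thumb for DB use is that an obscene amount of RAM helps: all the indexes then stay resident in RAM.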

Answer 1 (score: 1)

Sorry: I can't add information about the specific system calls you asked about. So I'll add some opinionated content instead...

In my opinion, iozone is not a very interesting benchmark tool. Nor, I think, is profiling the various system calls very interesting.

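What matters is how the file system performs in the real world. However, benchmarking with real-world scenarios can be very time-consuming; for example, setting up a valid test environment can take a long time. That's where benchmark tools come in handy. But a benchmark tool should be able to work in a way that is as close to real applications as possible; it is also often good if the tool can work in a brutal way, so that the limits of the hardware/software involved can be explored.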

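Two benchmark tools that meet these requirements are fio and Oracle's orion. With both tools it's relatively easy to specify a benchmark that uses a reasonable mix of reads and writes, and to specify how much of the benchmark should run in parallel. Both can also run device-level as well as FS-level benchmarks; this is nice because sometimes you want to benchmark a storage device without the overhead of a particular file system. Compared to Orion, fio has the advantage of an active mailing list with good answers (I haven't found a mailing list for Orion).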

Answer 2 (score: 1)

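I can offer some background on two parts of your question. The "read backwards" test was introduced after observing the I/O behavior of some mechanical engineering applications. These applications would often read a file sequentially forward from disk and then read it backward. There was speculation that this was related to (linear algebra) forward and backward substitution, or that it stemmed from original implementations that relied on tape drives.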

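As for stride access, this is a common I/O pattern in many seismic exploration applications (depth and/or time migration, IIRC). As with the "read backwards" scenario, this test was also introduced after observing the I/O behavior of these applications.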