从二进制文件中获取JPEG的图像大小

时间:2010-03-25 17:21:10

标签: file binary jpeg

我有很多不同图像大小的jpeg文件。例如,这是大小为256 * 384(像素)的图像的hexdump给出的前640个字节:

0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048  ......JFIF.....H
0000010: 0048 0000 ffdb 0043 0003 0202 0302 0203  .H.....C........
0000020: 0303 0304 0303 0405 0805 0504 0405 0a07  ................
0000030: 0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d  ................

我猜大小信息应该在这些行中。但我无法看到哪些字节正确地给出了大小。任何人都可以帮我找到包含大小信息的字段吗?

5 个答案:

答案 0 :(得分:10)

根据Syntax and structureJPEG page on wikipedia部分,图像的宽度和高度似乎并未存储在图像本身中 - 或者至少不是以图像本身的形式存储的很容易找到。


仍然引用JPEG image compression FAQ, part 1/2

  

主题:[22]我的程序如何从JPEG中提取图像尺寸   文件吗

     

JPEG文件的标题包含   一系列块,称为“标记”。   存储图像高度和宽度   在SOFn类型的标记中 (Start Of   框架,类型N)
找到SOFn   你必须跳过前面的内容   标记;你不必知道是什么   只是在其他类型的标记中   用他们的长度词来跳过   他们。
所需的最小逻辑是   也许是一段C代码。
(有些   人们建议只搜索   对于代表SOFn的字节对,   没有注意标记   块结构。这是不安全的   因为先前的标记可能包含   SOFn模式,无论是偶然还是偶然   因为它包含JPEG压缩   缩略图。如果你不遵循   您将检索的标记结构   缩略图的大小而不是   主要图像尺寸。)
大量的   C中的评论示例可以在中找到   IJG发行版中的rdjpgcom.c   (见第2部分,第15项)。
Perl代码   可以在wwwis中找到   http://www.tardis.ed.ac.uk/~ark/wwwis/

(呃,这个链接好像坏了......)


以下是可以帮助您的C代码的一部分:Decoding the width and height of a JPEG (JFIF) file

答案 1 :(得分:2)

此功能将读取JPEG属性

function jpegProps(data) {          // data is an array of bytes
    var off = 0;
    while(off<data.length) {
        while(data[off]==0xff) off++;
        var mrkr = data[off];  off++;

        if(mrkr==0xd8) continue;    // SOI
        if(mrkr==0xd9) break;       // EOI
        if(0xd0<=mrkr && mrkr<=0xd7) continue;
        if(mrkr==0x01) continue;    // TEM

        var len = (data[off]<<8) | data[off+1];  off+=2;  

        if(mrkr==0xc0) return {
            bpc : data[off],     // precission (bits per channel)
            w   : (data[off+1]<<8) | data[off+2],
            h   : (data[off+3]<<8) | data[off+4],
            cps : data[off+5]    // number of color components
        }
        off+=len-2;
    }
}

答案 2 :(得分:0)

如果您使用的是Linux系统,并且手边有PHP,则此php脚本的变体可能会产生您想要的结果:

#! /usr/bin/php -q
<?php

if (file_exists($argv[1]) ) {

    $targetfile = $argv[1];

    // get info on uploaded file residing in the /var/tmp directory:
    $safefile       = escapeshellcmd($targetfile);
    $getinfo        = `/usr/bin/identify $safefile`;
    $imginfo        = preg_split("/\s+/",$getinfo);
    $ftype          = strtolower($imginfo[1]);
    $fsize          = $imginfo[2];

    switch($fsize) {
        case 0:
            print "FAILED\n";
            break;
        default:
            print $safefile.'|'.$ftype.'|'.$fsize."|\n";
    }
}

// eof

主机> imageinfo 009140_DJI_0007.JPG

009140_DJI_0007.JPG | jpeg | 4000x3000 |

(以管道分隔格式输出文件名,文件类型,文件尺寸)

在手册页中:

有关“ identify”命令的更多信息,请将浏览器指向[...] http://www.imagemagick.org/script/identify.php

答案 3 :(得分:0)

我已将CPP代码从最上面的答案转换为python脚本。

"""
Source: https://stackoverflow.com/questions/2517854/getting-image-size-of-jpeg-from-its-binary#:~:text=The%20header%20of%20a%20JPEG,Of%20Frame%2C%20type%20N).
"""
def get_jpeg_size(data):
   """
   Gets the JPEG size from the array of data passed to the function, file reference: http:#www.obrador.com/essentialjpeg/headerinfo.htm
   """
   data_size=len(data)
   #Check for valid JPEG image
   i=0   # Keeps track of the position within the file
   if(data[i] == 0xFF and data[i+1] == 0xD8 and data[i+2] == 0xFF and data[i+3] == 0xE0): 
   # Check for valid JPEG header (null terminated JFIF)
      i += 4
      if(data[i+2] == ord('J') and data[i+3] == ord('F') and data[i+4] == ord('I') and data[i+5] == ord('F') and data[i+6] == 0x00):
         #Retrieve the block length of the first block since the first block will not contain the size of file
         block_length = data[i] * 256 + data[i+1]
         while (i<data_size):
            i+=block_length               #Increase the file index to get to the next block
            if(i >= data_size): return False;   #Check to protect against segmentation faults
            if(data[i] != 0xFF): return False;   #Check that we are truly at the start of another block
            if(data[i+1] == 0xC0):          #0xFFC0 is the "Start of frame" marker which contains the file size
               #The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort length][uchar precision][ushort x][ushort y]
               height = data[i+5]*256 + data[i+6];
               width = data[i+7]*256 + data[i+8];
               return height, width
            else:
               i+=2;                              #Skip the block marker
               block_length = data[i] * 256 + data[i+1]   #Go to the next block
         return False                   #If this point is reached then no size was found
      else:
         return False                  #Not a valid JFIF string
   else:
      return False                     #Not a valid SOI header




with open('path/to/file.jpg','rb') as handle:
   data = handle.read()

h, w = get_jpeg_size(data)
print(s)

答案 4 :(得分:0)

这就是我使用js实现此方法的方式。您要查找的标记是Sofn标记,伪代码基本上是:

  • 从第一个字节开始
  • 段的开头始终为FF,后跟另一个指示标记类型的字节(这2个字节称为标记)
  • 如果其他字节是01D1D9,则该段中没有数据,因此请继续下一个段
  • 如果该标记是C0C2(或任何其他C n ,在代码注释中有更多详细信息),那就是您要查找的Sofn标记对于
    • 标记后面的字节分别为 P(1个字节),L(2个字节),高度(2个字节),宽度(2个字节)
  • 否则,接下来的两个字节将是length属性(不包括标记的整个段的长度,2个字节),使用该属性跳到下一个段
  • 重复直到找到Sofn标记
function getJpgSize(hexArr) {
  let i = 0;
  let marker = '';

  while (i < hexArr.length) {
    //ff always start a marker,
    //something's really wrong if the first btye isn't ff
    if (hexArr[i] !== 'ff') {
      console.log(i);
      throw new Error('aaaaaaa');
    }

    //get the second byte of the marker, which indicates the marker type
    marker = hexArr[++i];

    //these are segments that don't have any data stored in it, thus only 2 bytes
    //01 and D1 through D9
    if (marker === '01' || (!isNaN(parseInt(marker[1])) && marker[0] === 'd')) {
      i++;
      continue;
    }

    /*
    sofn marker: https://www.w3.org/Graphics/JPEG/itu-t81.pdf pg 36
      INFORMATION TECHNOLOGY –
      DIGITAL COMPRESSION AND CODING
      OF CONTINUOUS-TONE STILL IMAGES –
      REQUIREMENTS AND GUIDELINES

    basically, sofn (start of frame, type n) segment contains information
    about the characteristics of the jpg

    the marker is followed by:
      - Lf [frame header length], two bytes
      - P [sample precision], one byte
      - Y [number of lines in the src img], two bytes, which is essentially the height
      - X [number of samples per line], two bytes, which is essentially the width 
      ... [other parameters]

    sofn marker codes: https://www.digicamsoft.com/itu/itu-t81-36.html
    apparently there are other sofn markers but these two the most common ones
    */
    if (marker === 'c0' || marker === 'c2') {
      break;
    }
    //2 bytes specifying length of the segment (length excludes marker)
    //jumps to the next seg
    i += parseInt(hexArr.slice(i + 1, i + 3).join(''), 16) + 1;
  }
  const size = {
    height: parseInt(hexArr.slice(i + 4, i + 6).join(''), 16),
    width: parseInt(hexArr.slice(i + 6, i + 8).join(''), 16),
  };
  return size;
}