I want to extract Unicode strings from a binary (".exe") file.
http://i45.tinypic.com/23u61ie.png
When I use code like this:
`unicode_str = re.compile( u'[\u0020-\u007e]{1,}',re.UNICODE )`
It works, but it only returns isolated single characters. So I tried changing the quantifier to 3:
Python:
unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )
Perl:
my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );
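Now I only get ASCII strings, and all the Unicode strings are gone.
Where am I making a mistake, or what am I misunderstanding about Unicode?
Code from the comments:
Python: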
File = open( sys.argv[1], "rb" )
FileData = File.read()
File.close()
unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )
myList = unicode_str.findall(FileData)
for p in myList:
    print p
Perl:
$/ = "newline separator";
my $input = shift;
open( File, $input );
my $file = <File>;
close( File );
my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );
foreach ( @a ) { print "$_\n"; }
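A likely explanation, assuming the strings in the .exe are stored as UTF-16LE (what Windows usually means by "Unicode"): each character in the range U+0020 to U+007E is stored as its ASCII byte followed by a `\x00` byte. A byte-level class such as `[\x20-\x7e]` therefore matches only one byte of each UTF-16 character, so `{1,}` returns isolated characters and `{3,}` keeps only strings stored as plain 8-bit ASCII. The Python 3 sketch below reproduces this on a made-up byte string (illustrative data, not a real executable); note it uses a bytes pattern, since Python 3 refuses to match a `str` pattern against `bytes`.

```python
import re

# Made-up "binary": one UTF-16LE string, a NUL terminator, then a plain ASCII string.
data = 'Hello'.encode('utf-16le') + b'\x00\x00' + b'ASCII text\x00'

# The question's approach on raw bytes: UTF-16LE text is split by the \x00 bytes.
single_chars = re.findall(rb'[\x20-\x7e]{1,}', data)
ascii_runs = re.findall(rb'[\x20-\x7e]{3,}', data)

# Matching "printable byte + \x00" pairs finds the UTF-16LE string instead.
wide = [m.decode('utf-16le') for m in re.findall(rb'(?:[\x20-\x7e]\x00){3,}', data)]

print(single_chars)  # [b'H', b'e', b'l', b'l', b'o', b'ASCII text']
print(ascii_runs)    # [b'ASCII text']
print(wide)          # ['Hello']
```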
Answer 0 (score: 3)
Someone has already written a utility that does what you need:
http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx
usage: strings [-a] [-f offset] [-b bytes] [-n length] [-o] [-q] [-s] [-u] <file or directory>
Strings takes wild-card expressions for file names, and additional command line parameters are defined as follows:
-a Ascii-only search (Unicode and Ascii is default)
-b Bytes of file to scan
-f File offset at which to start scanning.
-o Print offset in file string was located
-n Minimum string length (default is 3)
-q Quiet (no banner)
-s Recurse subdirectories
-u Unicode-only search (Unicode and Ascii is default)
To search one or more files for the presence of a particular string using strings use a command like this:
strings * | findstr /i TextToSearchFor
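To see roughly what `strings` does by default (ASCII and "Unicode" search, minimum length 3), here is a small Python 3 approximation. `find_strings` and its parameters are hypothetical names loosely mirroring `-a`, `-u` and `-n`; it only treats U+0020 to U+007E stored as UTF-16LE as "Unicode", which is an assumption, not the Sysinternals algorithm.

```python
import re


def find_strings(data, min_len=3, want_ascii=True, want_unicode=True):
    """Return sorted (offset, text) pairs of printable runs, roughly like `strings`.

    ASCII runs are bytes in 0x20..0x7E; "Unicode" runs are the same characters
    stored as UTF-16LE (each character byte followed by a 0x00 byte).
    """
    results = []
    if want_ascii:  # roughly the -a search
        for m in re.finditer(rb'[\x20-\x7e]{%d,}' % min_len, data):
            results.append((m.start(), m.group().decode('ascii')))
    if want_unicode:  # roughly the -u search
        for m in re.finditer(rb'(?:[\x20-\x7e]\x00){%d,}' % min_len, data):
            results.append((m.start(), m.group().decode('utf-16le')))
    return sorted(results)
```

Usage: `find_strings(open(path, 'rb').read(), min_len=4)` returns byte offsets alongside the text, similar to what `-o` prints.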
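Edit:
If you want to implement it in Python, try the following. You have to decide which range of Unicode characters you are looking for and search for them encoded as UTF-16LE. Lots of byte pairs look like valid printable Unicode. I don't know what algorithm strings uses.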
import re

# Read the whole executable as raw bytes.
with open('c:/users/metolone/util/windiff.exe', 'rb') as f:
    data = f.read()

# Search for printable ASCII characters encoded as UTF-16LE
# (each character byte is followed by a \x00 byte).
pat = re.compile(rb'(?:[\x20-\x7E]\x00){3,}')
words = [w.decode('utf-16le') for w in pat.findall(data)]
for w in words:
    print(w)
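The pattern above only covers U+0020 to U+007E. As a sketch of "deciding which range of Unicode characters you are looking for", the hypothetical helper `find_utf16le_runs` reads the data as 16-bit little-endian code units and keeps runs whose code points fall in ranges you choose (for example Cyrillic, U+0400 to U+04FF). It checks both byte alignments in case a string starts at an odd offset, and it assumes BMP-only ranges (no surrogate pairs). Wider ranges produce more false positives, since many byte pairs happen to land in them.

```python
import struct


def find_utf16le_runs(data, ranges, min_len=3):
    """Return sorted (offset, text) pairs of UTF-16LE runs whose code units
    all fall inside one of the inclusive (lo, hi) `ranges`."""
    results = []
    for start in (0, 1):  # try even and odd alignment
        chunk = data[start:]
        chunk = chunk[:len(chunk) - len(chunk) % 2]  # whole code units only
        run, run_offset = [], start
        for i, (unit,) in enumerate(struct.iter_unpack('<H', chunk)):
            if any(lo <= unit <= hi for lo, hi in ranges):
                if not run:
                    run_offset = start + 2 * i  # byte offset of the run
                run.append(chr(unit))
            else:
                if len(run) >= min_len:
                    results.append((run_offset, ''.join(run)))
                run = []
        if len(run) >= min_len:  # run reaching the end of the data
            results.append((run_offset, ''.join(run)))
    return sorted(results)
```

For example, `find_utf16le_runs(data, [(0x20, 0x7E), (0x400, 0x4FF)])` finds ASCII and Cyrillic text stored as UTF-16LE.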
Answer 1 (score: 0)
use Win32::Exe;
my $exe = Win32::Exe->new('foo.exe');
my $inforef = $exe->get_version_info;
printf "%s: %s\n", $_, $inforef->{$_} for qw(Comments CompanyName
FileDescription FileVersion InternalName LegalCopyright
LegalTrademarks OriginalFilename ProductName ProductVersion);
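Win32::Exe gets these fields by parsing the version resource. Without such a module, a much cruder Python heuristic can pull one field out of the raw bytes, assuming the String entries follow the layout Microsoft documents for VS_VERSIONINFO: three 16-bit header words, a null-terminated UTF-16LE key, zero padding to a 32-bit boundary, then a null-terminated UTF-16LE value. `read_version_string` is a hypothetical name; it takes the first match, so it can be fooled by the key appearing elsewhere in the file or by an empty value.

```python
def read_version_string(data, key):
    """Heuristically return the UTF-16LE value stored after `key`, or None."""
    needle = key.encode('utf-16le') + b'\x00\x00'  # null-terminated UTF-16LE key
    pos = data.find(needle)
    if pos == -1:
        return None
    i = pos + len(needle)
    # Skip the zero padding that aligns the value to a 32-bit boundary.
    while data[i:i + 2] == b'\x00\x00':
        i += 2
    # Read whole 16-bit code units up to the value's null terminator.
    end = i
    while end + 1 < len(data) and data[end:end + 2] != b'\x00\x00':
        end += 2
    return data[i:end].decode('utf-16le')
```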
When you are dealing with generic UTF-16BE data, use the Encode module:
use Encode qw(decode encode);
my $octets = # extracted from the exe
"\x00\x73\x00\x6f\x00\x66\x00\x74\x00\x20\x00\x43\x00\x6f" .
"\x00\x70\x00\x6f\x00\x72\x00\x61\x00\x74\x00\x69\x00\x6f";
my $characters = decode 'UTF16-BE', $octets, Encode::FB_CROAK;
# 'soft Coporatio'
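For comparison, the same decoding in Python 3 uses `bytes.decode` with the `utf-16-be` codec. Treating `errors='strict'` as the counterpart of `Encode::FB_CROAK` (both fail on malformed input) is an approximation, not an exact equivalence.

```python
# Same octets as the Perl example: UTF-16 big-endian, high byte first.
octets = (b"\x00\x73\x00\x6f\x00\x66\x00\x74\x00\x20\x00\x43\x00\x6f"
          b"\x00\x70\x00\x6f\x00\x72\x00\x61\x00\x74\x00\x69\x00\x6f")

# errors='strict' (the default) raises UnicodeDecodeError on bad input.
characters = octets.decode('utf-16-be', errors='strict')
print(characters)  # soft Coporatio

try:
    octets[:-1].decode('utf-16-be')  # odd length: last code unit is truncated
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)
```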