Question

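I wrote a basic Python program to parse Android's resources.arsc. It prints every string it finds in the file. The strings have a zero-valued byte between each character, which suggests to me that they are stored as UTF-16. I don't know whether that is correct, but Android strings are localizable, so I assume it is. I'm using string.decode('hex') to print the strings in a human-readable form. Here is an example using the list of bytes that make up one string: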

>>> print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')
res/drawable/about.png

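The problem is that when I pipe this program into grep, grep cannot match any of the strings it prints. How can I print them to the shell so that grep can match against the output? Thanks!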

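(EDIT) I do print the strings, but for my example I thought it best to show both the 'print'ed version and the returned version. Sorry for the confusion. In this example, '/res/drawable/about.png' cannot be grepped.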

(EDIT2) A simple demonstration:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')"
res/drawable/about.png
11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" | grep about
11:33 AM ~/learning_python $

(EDIT3) Another demonstration, which I think proves that the data is in utf-16-be:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" > testfile
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile
res/drawable/about.png
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep about
Binary file (standard input) matches
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep -a about
res/drawable/about.png

Answer 1

Decode the bytes:

'\x00r\x00e\x00s'.decode('utf-16-be') # produces u'res'

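Then you can print the decoded string. Note that the byte list below has one more '00' than the one in the question, so that the byte count is even, which UTF-16 decoding requires; rstrip('\0') then drops the trailing NUL characters: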

$ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00', '00']).decode('hex').decode('utf-16-be').rstrip('\0')" | grep about
res/drawable/about.png
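
For readers on Python 3, where str.decode('hex') no longer exists, the same idea can be sketched roughly as follows. This is only an illustration, not the original poster's code: HEX and decode_utf16be are names made up here, and dropping a dangling odd byte is an assumption about how to treat the 47-byte sample from the question. The sketch also shows why grep found nothing: in the raw bytes, every letter of "about" is preceded by a NUL byte.

```python
# Python 3 sketch. HEX mirrors the 47 bytes listed in the question
# (22 characters as UTF-16-BE, followed by three '00' bytes).
HEX = (
    "0072 0065 0073 002f 0064 0072 0061 0077 0061 0062 006c 0065 "
    "002f 0061 0062 006f 0075 0074 002e 0070 006e 0067 00 00 00"
)


def decode_utf16be(hex_text):
    """Decode hex-encoded UTF-16-BE bytes into a str without NUL padding."""
    raw = bytes.fromhex(hex_text)  # spaces between byte pairs are ignored
    # UTF-16 code units are two bytes wide, so drop a dangling odd byte
    # (an assumption made here) instead of letting the decoder raise.
    if len(raw) % 2:
        raw = raw[:-1]
    return raw.decode("utf-16-be").rstrip("\0")


if __name__ == "__main__":
    raw = bytes.fromhex(HEX)
    # The byte sequence b"about" never appears contiguously in the raw data.
    print(b"about" in raw)        # False
    # After decoding, the text is plain and grep can match it.
    print(decode_utf16be(HEX))    # res/drawable/about.png
```

Piping the output of this script into grep about should then match the second line, since print writes the decoded text in the terminal's own encoding.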

Answer 2

Use the ripgrep utility instead of grep; it can search UTF-16 files.

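ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specified explicitly with the -E/--encoding flag.)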

Example syntax:

rg sometext file
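
If ripgrep is not available, the idea of an encoding-aware search can be approximated in Python. The sketch below is not how ripgrep works internally; grep_encoded is a hypothetical helper, and the default of utf-16-be is an assumption taken from the question's data.

```python
# Python 3 sketch of an encoding-aware line search (hypothetical helper).
def grep_encoded(path, needle, encoding="utf-16-be"):
    """Return the lines of `path`, decoded with `encoding`, that contain `needle`."""
    matches = []
    # errors="replace" keeps the search going past malformed byte sequences.
    with open(path, encoding=encoding, errors="replace") as handle:
        for line in handle:
            # Drop the line ending and any NUL padding around the text.
            text = line.rstrip("\n").strip("\0")
            if needle in text:
                matches.append(text)
    return matches


if __name__ == "__main__":
    # "testfile" is the file written in the question's EDIT3 demonstration.
    for hit in grep_encoded("testfile", "about"):
        print(hit)
```

Decoding before matching is the design choice that matters here: the comparison happens on text rather than on raw bytes, so the NUL bytes of UTF-16 never get in the way.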

Cannot grep the output of a Python program, possibly UTF-16
