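How do I find binary files in a directory?

Asked: 2015-04-08 14:06:02

Tags: linux file grep binary directory

I need to find the binary files in a directory. I want to do this with file, and then check the results with grep. My problem is that I don't know what counts as a binary file. What does the file command print for binary files, or what should I grep for?

Thanks.

7 answers:

Answer 0 (score: 6)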

Just have to mention Perl's -T test for text files, and its opposite -B for binary files.

$ find . -type f | perl -lne 'print if -B'

will print out any binary files it sees. Use -T if you want the opposite: text files.

It's not totally foolproof as it only looks in the first 1,000 characters or so, but it's better than some of the ad-hoc methods suggested here. See man perlfunc for the whole rundown. Here is a summary:

The "-T" and "-B" switches work as follows. The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters. If so, it's a "-T" file. Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set. If more than a third of the characters are strange, it's a "-B" file; otherwise it's a "-T" file. Also, any file containing a zero byte in the examined portion is considered a binary file.
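To make the last rule in that summary concrete, here is a minimal shell sketch of the zero-byte check alone. It is not what Perl does internally: the helper name has_nul_byte and the 1024-byte window are assumptions chosen for illustration, and the UTF-8 and "more than a third odd characters" rules are left out.

```shell
# Hypothetical helper: succeeds if the first 1024 bytes of "$1" contain a NUL byte.
# Only the zero-byte rule is covered; the window size is an assumption.
has_nul_byte() {
    # Count the bytes in the examined portion, with and without NUL bytes.
    total=$(head -c 1024 -- "$1" | wc -c)
    kept=$(head -c 1024 -- "$1" | tr -d '\000' | wc -c)
    # Fewer bytes after deleting NULs means at least one NUL was present.
    [ "$kept" -lt "$total" ]
}
```

It could be driven by find in the same way as the Perl one-liner, for example: find . -type f | while IFS= read -r f; do has_nul_byte "$f" && printf '%s\n' "$f"; done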

Answer 1 (score: 4)

This finds all non-text files, both binary and empty.

Edit

A grep-only solution (from Mehrdad's comment):

grep -r -I -L .

Original answer

This needs no tools other than find and grep:

find . -type f -exec grep -IL . "{}" \;

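-I tells grep to treat binary files as non-matching

-L prints only the names of files that do not match

. matches any character, so every text file with at least one non-empty line matches

Edit 2

This finds all non-empty binary files: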

find . -type f ! -size 0 -exec grep -IL . "{}" \;
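The same idea can be wrapped in a small function. This is only a sketch: the name list_binary_files is hypothetical, and it ends -exec with + instead of \; so that grep is started once per batch of files rather than once per file.

```shell
# Hypothetical wrapper: print non-empty files under directory "$1"
# that grep treats as binary (or that contain no non-empty line).
list_binary_files() {
    # -I: treat binary files as non-matching; -L: list files without a match.
    find "$1" -type f ! -size 0 -exec grep -IL . {} +
}
```

One caveat follows from how . works: a text file made up solely of empty lines has no line matching ., so it is listed as well.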

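Answer 2 (score: 2)

Since this is an assignment, you would probably hate me if I gave you the complete solution ;-) So here is a small hint:

By default, the grep command outputs a list of binary files if you search for a regular expression such as ., which matches any non-empty file: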

grep . *

Output:

[...]
Binary file c matches
Binary file e matches

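You can use awk to extract just the file names and ls to print the permissions. See the respective man pages (man grep, man awk, man ls).

Answer 3 (score: 2)

In these modern times (2020 is practically the third decade of the 21st century, after all), I think the right question is: how do I find all non-UTF-8 files? UTF-8 is the modern equivalent of a text file.

The UTF-8 encoding of text with non-ASCII code points introduces non-ASCII bytes (that is, bytes with the most significant bit set). However, not every sequence of such bytes forms a valid UTF-8 sequence.

What you need is isutf8 from the moreutils package.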

A quick check:

$ isutf8 -l /bin/*
/bin/[
/bin/acyclic
/bin/addr2line
/bin/animate
/bin/applydeltarpm
/bin/apropos
⋮

$ file $(isutf8 -l /bin/*)
/bin/[: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4d70c2142fc672d8a69d033ecb6693ec15b1e6fb, for GNU/Linux 3.2.0, stripped
/bin/acyclic: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d428ea52eb0e8aaf7faf30914710d8fbabe6ca28, for GNU/Linux 3.2.0, stripped
/bin/addr2line: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=797f42bc4f8fb754a49b816b82d6b40804626567, for GNU/Linux 3.2.0, stripped
/bin/animate: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=36ab46e69c1bfea433382ffc9bbd9708365dac2b, for GNU/Linux 3.2.0, stripped
/bin/applydeltarpm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a1fddcbeec9266e698782596f2dfd1b4f3e0b974, for GNU/Linux 3.2.0, stripped
/bin/apropos: symbolic link to whatis
⋮

You may want to invert the test and list all the text files instead. Use -i.

Yes, it does read the whole file, but it is very fast, and if you need accuracy...
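Where moreutils is not installed, a similar check can be approximated with iconv, which fails when its input is not valid in the stated source encoding. This is a swapped-in technique rather than what isutf8 does internally, and the function name list_non_utf8 is hypothetical.

```shell
# Hypothetical stand-in for "isutf8 -l": print each regular file
# among the arguments that is not valid UTF-8.
list_non_utf8() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # Converting UTF-8 to UTF-8 succeeds only if the input is valid UTF-8.
        if ! iconv -f UTF-8 -t UTF-8 < "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_non_utf8 /bin/* would be the rough counterpart of the isutf8 -l call shown earlier. Like isutf8, it reads each file in full.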

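Answer 4 (score: 0)

My first answer to this question was pretty much in line with using the find command. I think your instructor wants to introduce you to the concept of magic numbers by way of the file command, which sorts files into many types.

In my case, it was as simple as: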

file * | grep executable

But this can be done in many ways.
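One of those other ways, sketched here under the assumption that the installed file command supports the --mime-encoding option, is to ask file for the character encoding and keep the files it reports as binary. The function name list_by_file_encoding is hypothetical.

```shell
# Hypothetical helper: print regular files directly inside directory "$1"
# whose encoding, as reported by file, is "binary".
list_by_file_encoding() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # -b drops the file name from the output, leaving only the encoding.
        if [ "$(file -b --mime-encoding "$f")" = "binary" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Unlike grepping for the word executable, this does not depend on the wording of the human-readable description, and it also catches binary data files that are not programs.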

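Answer 5 (score: -1)

Binary executables on Linux are in the ELF format.

When you run the file command on such a binary, the output contains the word ELF. You can grep for that.

On the command line: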

file <binary_file_name>

So, if you want to find the binary files in a directory (on Linux, for example), you can do something like this:

ls | xargs file | grep ELF
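The word ELF in the output of file comes from the file's magic number: an ELF file starts with the four bytes 0x7f, 'E', 'L', 'F'. As a rough sketch, the same test can be done without file at all. The names is_elf and list_elf_files are hypothetical; looping over a glob also avoids the trouble that ls | xargs has with file names containing spaces.

```shell
# Hypothetical helper: succeeds if "$1" begins with the ELF magic number.
is_elf() {
    # Dump the first four bytes as hex and strip od's spacing.
    magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Hypothetical helper: print ELF files directly inside directory "$1".
list_elf_files() {
    for f in "$1"/*; do
        [ -f "$f" ] && is_elf "$f" && printf '%s\n' "$f"
    done
    return 0
}
```

This only finds ELF objects. Other binary files, such as images or archives, have different magic numbers and are not reported.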

Answer 6 (score: -3)

You can use find with the -executable parameter, which is basically what you want.

The man page says:

   -executable
          Matches files which are executable and directories which are searchable (in a file name resolution sense). This takes into account access control lists and other permissions artefacts which the -perm test ignores. This test makes use of the access(2) system call, and so can be fooled by NFS servers which do UID mapping (or root-squashing), since many systems implement access(2) in the client's kernel and so cannot make use of the UID mapping information held on the server. Because this test is based only on the result of the access(2) system call, there is no guarantee that a file for which this test succeeds can actually be executed.

This is the result you want:

# find /bin  -executable -type f | grep 'dmesg'
/bin/dmesg
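As the man page excerpt notes, -executable tests permissions rather than content, so a shell script with the execute bit set matches too. If only executable files with binary content are wanted, one possible sketch is to combine it with the grep -I test from an earlier answer. This assumes GNU find, and the function name list_executable_binaries is hypothetical.

```shell
# Hypothetical helper: print non-empty regular files under "$1" that are
# executable by the current user and that grep treats as binary.
list_executable_binaries() {
    # -executable filters on permissions; grep -IL filters on content.
    find "$1" -type f -executable ! -size 0 -exec grep -IL . {} +
}
```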