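I need to find the binary files in a directory. I want to do this with file and then check the results with grep. My problem is that I don't know what counts as a binary file. What does the file command print for a binary file, or what should I grep for?
Thanks.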
Answer 0 (score: 6)
Just have to mention Perl's -T test for text files, and its opposite -B for binary files.
$ find . -type f | perl -lne 'print if -B'
will print out any binary files it sees. Use -T if you want the opposite: text files.
It's not totally foolproof, as it only looks at the first 1,000 characters or so, but it's better than some of the ad-hoc methods suggested here. See man perlfunc for the whole rundown. Here is a summary:
The "-T" and "-B" switches work as follows. The first block or so of the file is examined to see if it is valid UTF-8 that includes non-ASCII characters. If so, it's a "-T" file. Otherwise, that same portion of the file is examined for odd characters such as strange control codes or characters with the high bit set. If more than a third of the characters are strange, it's a "-B" file; otherwise it's a "-T" file. Also, any file containing a zero byte in the examined portion is considered a binary file.
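The zero-byte rule from that summary can be approximated without Perl. This is a minimal sketch, assuming a 512-byte sample window and a helper named has_nul_byte (both are illustrative choices, not taken from perlfunc). It covers only the NUL check, not the one-third heuristic:

```shell
# Hypothetical helper (not part of Perl): succeeds if the first 512 bytes
# of the file contain at least one NUL byte. The window size is an assumption.
has_nul_byte() {
    total=$(head -c 512 "$1" | wc -c)
    without_nul=$(head -c 512 "$1" | tr -d '\000' | wc -c)
    [ "$total" -ne "$without_nul" ]
}

# Throwaway sample files so the sketch is self-contained.
demo_dir=$(mktemp -d)
printf 'plain text\n' > "$demo_dir/text.txt"
printf 'abc\000def' > "$demo_dir/data.bin"

report=$(
    for f in "$demo_dir"/*; do
        if has_nul_byte "$f"; then
            echo "binary: ${f##*/}"
        else
            echo "text: ${f##*/}"
        fi
    done
)
echo "$report"
rm -rf "$demo_dir"
```

A file full of odd control characters but no NUL byte would be reported as text here, which is exactly the case the full -B heuristic handles better.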
Answer 1 (score: 4)
This finds all non-text files: binary files as well as empty ones.
A grep-only solution (from Mehrdad's comment):
grep -r -I -L .
This needs no tools other than find and grep:
find . -type f -exec grep -IL . "{}" \;
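-I tells grep to treat binary files as non-matching
-L prints only the names of files with no match
. is a regex that matches any character, so every non-empty text file matches
This finds all non-empty binary files: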
find . -type f ! -size 0 -exec grep -IL . "{}" \;
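The difference between the two find commands is easiest to see on a small sample. The sketch below builds a throwaway directory (the file names are made up for illustration) and runs both variants. It assumes a grep that supports -I, as GNU and BSD grep do, and uses `{} +` instead of `"{}" \;` only to batch files into fewer grep invocations:

```shell
# Throwaway sample: one text file, one binary file, one empty file.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/notes.txt"
printf '\000\001\002' > "$demo_dir/blob.bin"
: > "$demo_dir/empty"

# Binary and empty files: neither yields a match for "." under -I,
# so -L lists both of them.
non_text=$(cd "$demo_dir" && find . -type f -exec grep -IL . {} + | sort)

# Adding "! -size 0" drops the empty file from the result.
non_empty_binary=$(cd "$demo_dir" && find . -type f ! -size 0 -exec grep -IL . {} + | sort)

printf 'non-text:\n%s\nnon-empty binary:\n%s\n' "$non_text" "$non_empty_binary"
rm -rf "$demo_dir"
```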
Answer 2 (score: 2)
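Since this is an assignment, you would probably hate me if I gave you the complete solution ;-) So here is a little hint:
If you search for a regular expression such as . that matches any non-empty file, the grep command will output a list of binary files by default: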
grep . *
Output:
[...]
Binary file c matches
Binary file e matches
You can use awk to extract just the file names and ls to print the permissions. See the respective man pages (man grep, man awk, man ls).
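One way the awk step might look is sketched below. It rests on an assumption about grep's message wording: older GNU grep and BSD grep print "Binary file NAME matches" on standard output, while GNU grep 3.5 and later print "grep: NAME: binary file matches" on standard error. The sketch accepts both forms, and the sample file names are invented:

```shell
# Throwaway sample: one text file and one file containing a NUL byte.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/notes.txt"
printf 'abc\000def\n' > "$demo_dir/blob.bin"

binary_names=$(
    cd "$demo_dir" &&
    LC_ALL=C grep . * 2>&1 | awk '
        # Older GNU grep and BSD grep: "Binary file NAME matches"
        /^Binary file .* matches$/ {
            sub(/^Binary file /, ""); sub(/ matches$/, ""); print; next
        }
        # GNU grep 3.5 and later: "grep: NAME: binary file matches"
        /^grep: .*: binary file matches$/ {
            sub(/^grep: /, ""); sub(/: binary file matches$/, ""); print
        }'
)
echo "$binary_names"
rm -rf "$demo_dir"
```

LC_ALL=C keeps the message untranslated so the patterns can match. Parsing diagnostic text like this is fragile (file names containing newlines will break it), so it fits the spirit of the exercise better than production use.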
Answer 3 (score: 2)
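In these modern times (2020 is, after all, practically the third decade of the 21st century), I think the right question is: how do I find all non-UTF-8 files? UTF-8 is the modern equivalent of a text file.
The UTF-8 encoding of text with non-ASCII code points introduces non-ASCII bytes (that is, bytes with the most significant bit set). However, not every sequence of such bytes forms a valid UTF-8 sequence.
What you need is isutf8 from the moreutils package.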
$ isutf8 -l /bin/*
/bin/[
/bin/acyclic
/bin/addr2line
/bin/animate
/bin/applydeltarpm
/bin/apropos
⋮
Quick check:
$ file $(isutf8 -l /bin/*)
/bin/[: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4d70c2142fc672d8a69d033ecb6693ec15b1e6fb, for GNU/Linux 3.2.0, stripped
/bin/acyclic: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d428ea52eb0e8aaf7faf30914710d8fbabe6ca28, for GNU/Linux 3.2.0, stripped
/bin/addr2line: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=797f42bc4f8fb754a49b816b82d6b40804626567, for GNU/Linux 3.2.0, stripped
/bin/animate: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=36ab46e69c1bfea433382ffc9bbd9708365dac2b, for GNU/Linux 3.2.0, stripped
/bin/applydeltarpm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a1fddcbeec9266e698782596f2dfd1b4f3e0b974, for GNU/Linux 3.2.0, stripped
/bin/apropos: symbolic link to whatis
⋮
You may want to invert the test and get all the text files. Use -i for that.
Yes, it reads the entire file, but it is very fast, and if you need accuracy...
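If moreutils is not installed, a similar whole-file validity check can be approximated with iconv, which exits with a non-zero status when its input is not valid in the source encoding. This swaps in a different tool rather than showing how isutf8 works internally, and the helper name is_valid_utf8 and the sample files are illustrative:

```shell
# Hypothetical helper: succeeds if the whole file decodes as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Throwaway sample files.
demo_dir=$(mktemp -d)
printf 'caf\303\251\n' > "$demo_dir/utf8.txt"      # "café" encoded as UTF-8
printf '\377\376\200\201' > "$demo_dir/blob.bin"   # bytes that are not valid UTF-8

# List the files that fail the check, similar in spirit to isutf8 -l.
non_utf8=$(
    for f in "$demo_dir"/*; do
        if ! is_valid_utf8 "$f"; then
            echo "${f##*/}"
        fi
    done
)
echo "$non_utf8"
rm -rf "$demo_dir"
```

Unlike isutf8, this starts one iconv process per file, so it will likely be slower on large directories.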
Answer 4 (score: 0)
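My first answer to this question is pretty much in line with using the find command. I think your instructor wanted to introduce you to the concept of magic numbers through the file command, which sorts files into multiple types.
In my case, it was as simple as: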
file * | grep executable
But this can be done in many ways.
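To make the magic-number idea concrete: an ELF file begins with the four bytes 0x7f 'E' 'L' 'F', and the file command recognizes it partly from that signature. The sketch below checks just those four bytes with od. The helper name has_elf_magic and the sample files are made up for illustration, and the check is far cruder than what file does:

```shell
# Hypothetical helper: succeeds if the file starts with the ELF magic number.
has_elf_magic() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Throwaway sample: a file that merely starts with the ELF magic, and a script.
demo_dir=$(mktemp -d)
printf '\177ELF rest of a fake header' > "$demo_dir/fake_elf"
printf '#!/bin/sh\necho hi\n' > "$demo_dir/script.sh"

elf_files=$(
    for f in "$demo_dir"/*; do
        if has_elf_magic "$f"; then
            echo "${f##*/}"
        fi
    done
)
echo "$elf_files"
rm -rf "$demo_dir"
```

As the fake_elf sample shows, a matching magic number only says how a file starts, not that it is a well-formed executable.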
Answer 5 (score: -1)
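Executable binaries on Linux use the ELF format.
When you run the file command on such a binary, the output contains the word ELF. You can grep for that.
On the command line: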
file <binary_file_name>
So, if you want to find the binary files in a directory (on Linux, for example), you can do something like this:
ls | xargs file | grep ELF
Answer 6 (score: -3)
You can use find with the -executable argument, which is basically what you want.
The man page says:
-executable
Matches files which are executable and directories which are searchable (in a file name resolution sense). This takes into account access control lists and other permissions artefacts which the -perm test ignores. This test makes use of the access(2) system call, and so can be fooled by NFS servers which do UID mapping (or root-squashing), since many systems implement access(2) in the client's kernel and so cannot make use of the UID mapping information held on the server. Because this test is based only on the result of the access(2) system call, there is no guarantee that a file for which this test succeeds can actually be executed.
This gives the result you want:
# find /bin -executable -type f | grep 'dmesg'
/bin/dmesg
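Note that -executable tests permissions, not content, so it also matches text files such as shell scripts that have the execute bit set. A small sketch on made-up sample files illustrates this. It assumes GNU find, since -executable is not part of POSIX find:

```shell
# Throwaway sample: an executable shell script and a plain, non-executable text file.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$demo_dir/script.sh"
printf 'just notes\n' > "$demo_dir/notes.txt"
chmod 755 "$demo_dir/script.sh"
chmod 644 "$demo_dir/notes.txt"

# The text script is listed because of its permission bits alone.
executables=$(cd "$demo_dir" && find . -type f -executable | sort)
echo "$executables"
rm -rf "$demo_dir"
```

Conversely, a compiled binary without the execute bit would be skipped, so this test answers "can I run it?" rather than "is it binary?".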