Can we skip files from checking when using the hadoop fsck command in Linux?

Time: 2018-07-27 09:52:13

Tags: unix hadoop hdfs

I want the hadoop fsck command to skip checking files under a specified path. Can this be done? I am using the following command:

hadoop fsck> /output.txt

I also checked the HDFS guide, but found nothing that lets me exclude a path from the command above.

Please help.

1 answer:

Answer 0 (score: 1)

As of Hadoop 2.9.0, there is no way to specify paths to exclude in the hadoop fsck command.
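One possible workaround, sketched under assumptions rather than taken from any official tool, is to run fsck separately on each path you do want checked and skip the excluded ones. The function names `is_excluded` and `fsck_except` and the variables `EXCLUDES` and `FSCK_CMD` are hypothetical. The default command is `hdfs fsck`, because `hadoop fsck` (as used in the question) is the deprecated form in Hadoop 2.x. The paths to check are assumed to arrive on stdin, one per line.

```shell
#!/bin/sh
# Hypothetical workaround: run fsck per path, skipping excluded prefixes.
# FSCK_CMD can be overridden (e.g. FSCK_CMD="hadoop fsck"); EXCLUDES is a
# space-separated list of HDFS path prefixes to skip.
FSCK_CMD=${FSCK_CMD:-"hdfs fsck"}

# Succeeds (returns 0) if $1 equals an excluded prefix or lies below one.
is_excluded() {
  for ex in $EXCLUDES; do
    case "$1" in
      "$ex"|"$ex"/*) return 0 ;;
    esac
  done
  return 1
}

# Reads one HDFS path per line on stdin and runs fsck on each non-excluded one.
fsck_except() {
  while IFS= read -r p; do
    [ -z "$p" ] && continue
    if is_excluded "$p"; then
      echo "skipping $p" >&2
    else
      $FSCK_CMD "$p"   # unquoted on purpose so "hdfs fsck" splits into two words
    fi
  done
}
```

For example, with `EXCLUDES="/tmp"`, running `printf '%s\n' /data /tmp /user | fsck_except > /output.txt` would check `/data` and `/user` only. Note that this produces one fsck report per path rather than a single combined summary.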

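However, you can use the WebHDFS REST API to get file status information (owner, permissions, size, replication factor and so on) for exactly the paths you choose. Note that, unlike fsck, these calls do not report block-level problems such as missing or corrupt blocks. With this API, you can use the LISTSTATUS operation to get information about all files in a directory, or the GETFILESTATUS operation to get information about a single file.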

For a directory:

curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"

For a file:

curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<FILE_PATH>?op=GETFILESTATUS"
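The two requests differ only in the path and the `op` parameter, so a small helper can build the URL. This is a minimal sketch: `webhdfs_url` is a hypothetical name, and the ports in the usage note are the default NameNode HTTP ports (50070 in Hadoop 2.x, 9870 in Hadoop 3.x), which a cluster may override.

```shell
#!/bin/sh
# Hypothetical helper that builds a WebHDFS v1 URL.
# $1 = NameNode host, $2 = NameNode HTTP port, $3 = HDFS path, $4 = operation
webhdfs_url() {
  # Strip one leading slash from the path so the URL has exactly one "/" after v1.
  printf 'http://%s:%s/webhdfs/v1/%s?op=%s\n' "$1" "$2" "${3#/}" "$4"
}
```

Usage, with `nn.example.com` as a placeholder host: `curl -i "$(webhdfs_url nn.example.com 50070 /data/out LISTSTATUS)"` for a directory, or the same call with `GETFILESTATUS` and a file path for a single file.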

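Both return a JSON response: LISTSTATUS returns a FileStatuses object containing an array of FileStatus entries, and GETFILESTATUS returns a single FileStatus object.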

Below is an example of a directory response returned by the NameNode (NN):

curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
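To get the "skip these entries" behaviour from the question, the LISTSTATUS response can be filtered before it is processed further. This is a hedged sketch: `json_field`, `list_except` and `SKIP_NAMES` are hypothetical names, and extracting values with grep and sed only works because each FileStatus entry is a flat JSON object, as in the sample above. If jq is installed, it is the more robust choice.

```shell
#!/bin/sh
# Sketch: read a WebHDFS LISTSTATUS response on stdin and print
# "<pathSuffix> <replication>" for every entry not listed in SKIP_NAMES.
# Assumes flat FileStatus objects (no nested braces), as in the sample response.

# Prints the value (string or number) of key $2 from the flat JSON object $1.
json_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\":\"\{0,1\}\([^\",}]*\).*/\1/p"
}

# SKIP_NAMES is a hypothetical space-separated list of child names to ignore.
list_except() {
  # grep -o pulls out each innermost {...} object, i.e. one FileStatus per line.
  grep -o '{[^{}]*}' | while IFS= read -r obj; do
    name=$(json_field "$obj" pathSuffix)
    case " $SKIP_NAMES " in
      *" $name "*) continue ;;   # excluded entry: skip it
    esac
    printf '%s %s\n' "$name" "$(json_field "$obj" replication)"
  done
}
```

For example, with `SKIP_NAMES="_SUCCESS"`, piping the sample response body (fetched with `curl -s` rather than `curl -i`, so that no headers are included) into `list_except` would print `part-m-00000 3`, `part-m-00001 3` and `part-m-00002 3`.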