I want to skip checking files under specific paths when running the hadoop fsck command. Is that possible? I am using the following command:
hadoop fsck> /output.txt
I also checked the HDFS guide, but found no option for excluding a path in the command above.
Please help.
Answer 0: (score: 1)
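As of Hadoop 2.9.0, there is no way to specify paths to exclude in the hadoop fsck command.
However, you can use the WebHDFS REST API to get file status information comparable to what fsck reports, limited to the paths you choose. With this API, the LISTSTATUS operation returns information about every file in a directory, and the GETFILESTATUS operation returns information about a single file.
For a directory: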
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"
For a file:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<FILE_PATH>?op=GETFILESTATUS"
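LISTSTATUS returns a response containing a FileStatuses JSON object; GETFILESTATUS returns a single FileStatus JSON object.
Below is a sample directory response returned from the NameNode (NN):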
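To actually skip certain paths, one option is to walk the tree yourself with LISTSTATUS and not descend into the excluded directories. The following is only a minimal sketch of that idea, not a replacement for fsck: the names `list_status`, `walk`, `lister`, `excluded`, and `base_url` are hypothetical, and it assumes an unsecured cluster where WebHDFS is reachable over plain HTTP without authentication.

```python
import json
import posixpath
from urllib.parse import quote
from urllib.request import urlopen


def list_status(base_url, path):
    """Return the FileStatus entries of one directory via op=LISTSTATUS.

    base_url is assumed to look like "http://<HOST>:<PORT>" (no trailing slash).
    """
    url = f"{base_url}/webhdfs/v1{quote(path)}?op=LISTSTATUS"
    with urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]


def walk(path, excluded, lister):
    """Yield (full_path, status) for each file under path.

    Any entry equal to, or located below, one of the excluded paths is skipped.
    lister is any callable that maps a directory path to its FileStatus list.
    """
    for status in lister(path):
        child = posixpath.join(path, status["pathSuffix"])
        if any(child == e or child.startswith(e.rstrip("/") + "/") for e in excluded):
            continue  # excluded subtree: do not report it or descend into it
        if status["type"] == "DIRECTORY":
            yield from walk(child, excluded, lister)
        else:
            yield child, status
```

It could be called as `walk("/data", ["/data/tmp"], lambda p: list_status("http://<HOST>:<PORT>", p))`, where both paths are placeholders. Passing the lister as a parameter keeps the traversal logic separate from the HTTP call, so it can be tried against canned responses first.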
curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<DIRECTORY_PATH>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)
{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
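A response like the one above can be summarized with a few lines of Python. This is a rough sketch under some assumptions: `summarize` and `min_replication` are hypothetical names, the input is the JSON body only (for example from `curl -s` without `-i`, so no HTTP headers are mixed in), and the `replication` field is the replication factor configured for the file, not the number of live replicas that fsck would report.

```python
import json


def summarize(response_text, min_replication=3):
    """Summarize the files in a LISTSTATUS JSON body."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    files = [s for s in statuses if s["type"] == "FILE"]
    return {
        "files": len(files),
        "total_bytes": sum(s["length"] for s in files),
        # Zero-length files, such as the _SUCCESS marker in the sample above
        "empty": [s["pathSuffix"] for s in files if s["length"] == 0],
        # Files whose configured replication factor is below the threshold
        "low_replication": [
            s["pathSuffix"] for s in files if s["replication"] < min_replication
        ],
    }
```

The default threshold of 3 simply mirrors the replication value shown in the sample; adjust it to whatever your cluster uses.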