In HDFS: how to check whether 2 directories have the same parent directory

Date: 2018-10-18 16:00:06

Tags: bash shell hadoop hdfs

Is there an HDFS command to check whether two directories in HDFS share a common parent directory?

For example:

$ hadoop fs -ls -R  /user/username/data/
/user/username/data/LIST_1539724717/SUBLIST_1533057294, 
/user/username/data/LIST_1539724717/SUBLIST_1533873826/UI,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,
/user/username/data/ARRAY_1539724717/SUBLIST_1533057294, 
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/UI,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,

All of these directories share common parent directories, /user/username/data/LIST_1539724717/SUBLIST_1533057294 and /user/username/data/ARRAY_1539724717/SUBLIST_1533057294. How can we check this in bash?

2 Answers:

Answer 0 (score: 1)

By creating a shell script to which the directory names can be passed as variables, we can check whether they belong to the same parent.
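A minimal sketch of that idea, assuming the two HDFS paths are passed as positional arguments (the script name check_parent.sh and the hadoop fs -test -d existence check are my additions, not stated in the answer):

#!/bin/bash
# check_parent.sh (hypothetical name): report whether two HDFS paths
# share the same parent directory.
# Usage: ./check_parent.sh /user/username/data/X /user/username/data/Y

dir1="$1"
dir2="$2"

# dirname operates on the path string itself, so no HDFS call is
# needed just to compute each parent.
parent1=$(dirname "$dir1")
parent2=$(dirname "$dir2")

# Optional sanity check: hadoop fs -test -d returns 0 only when the
# path exists in HDFS and is a directory.
if hadoop fs -test -d "$parent1" && [ "$parent1" = "$parent2" ]; then
    echo "Same parent: $parent1"
else
    echo "Different parents: $parent1 vs $parent2"
fi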

Answer 1 (score: 0)

#!/bin/bash
# DIR is the HDFS directory to inspect, e.g. DIR=/user/username/data
log() { echo "[$1] $2"; }
results=()
directory=()

# Collect every entry directly under $DIR, skipping the word "items"
# left over from the "Found N items" header that hadoop fs -ls prints.
for value in $(hadoop fs -ls "${DIR}" | awk '{print $NF}'); do
    if [ "$value" != "items" ]; then
        # add values into the "results" array
        log "info" "$value"
        results+=("$value")
    fi
done

# Loop through each value inside the array, i.e. the entries under $DIR.
for i in "${results[@]}"; do
    # Seed the comparison value with the directory rows (permission
    # string starting with "d") found recursively under $i.
    oldVal=$(hadoop fs -ls -R "${i}" | sed 's/  */ /g' | cut -d\  -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2)
    log "info" "Checking sub-directories under $i!"
    # This takes the directory name as its input and extracts only the
    # directories that belong to the provided runID.
    for val in $(hadoop fs -ls -R "$i" | grep 1539724717 | sed 's/  */ /g' | cut -d\  -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2); do
        # Keep the path only when it is not already covered by oldVal.
        if [[ ! ${val} =~ ${oldVal} ]]; then
            oldVal=$val
            directory+=("${oldVal}")
        fi
    done
done

The directory array contains all the required directories.
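From there, a hedged sketch of the final check; comparing the first two array entries via dirname is my addition, not part of the original answer:

# Hypothetical follow-up: compare the parents of the first two
# collected directories to answer the original question.
if [ "$(dirname "${directory[0]}")" = "$(dirname "${directory[1]}")" ]; then
    log "info" "${directory[0]} and ${directory[1]} share the same parent"
fi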