In HDFS: how to check whether 2 directories have the same parent directory

Date: 2018-10-18 16:00:06

Tags: bash shell hadoop hdfs

Is there an HDFS command to check whether two directories in HDFS share a common parent directory?

For example:

$ hadoop fs -ls -R  /user/username/data/
/user/username/data/LIST_1539724717/SUBLIST_1533057294, 
/user/username/data/LIST_1539724717/SUBLIST_1533873826/UI,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,
/user/username/data/ARRAY_1539724717/SUBLIST_1533057294, 
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/UI,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,

All of these directories share common parent directories, /user/username/data/LIST_1539724717/SUBLIST_1533057294 and /user/username/data/ARRAY_1539724717/SUBLIST_1533057294. How can we check this in bash?

2 Answers:

Answer 0 (score: 1)

By creating a shell script to which the directory names can be passed as variables, we can check whether they belong to the same parent.
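A minimal sketch of that idea, assuming the two HDFS paths are passed as positional arguments (the script name check_parent.sh and the hadoop fs -test -d existence check are my additions, not stated in the answer):

#!/bin/bash
# check_parent.sh (hypothetical name): report whether two HDFS paths
# share the same parent directory.
# Usage: ./check_parent.sh /user/username/data/X /user/username/data/Y

dir1="$1"
dir2="$2"

# dirname operates on the path string itself, so no HDFS call is
# needed just to compute each parent.
parent1=$(dirname "$dir1")
parent2=$(dirname "$dir2")

# Optional sanity check: hadoop fs -test -d returns 0 only when the
# path exists in HDFS and is a directory.
if hadoop fs -test -d "$parent1" && [ "$parent1" = "$parent2" ]; then
    echo "Same parent: $parent1"
else
    echo "Different parents: $parent1 vs $parent2"
fi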

Answer 1 (score: 0)

#!/bin/bash
# DIR is the HDFS directory to inspect, e.g. DIR=/user/username/data
log() { echo "[$1] $2"; }
results=()
directory=()

# Collect every entry directly under $DIR, skipping the word "items"
# left over from the "Found N items" header that hadoop fs -ls prints.
for value in $(hadoop fs -ls "${DIR}" | awk '{print $NF}'); do
    if [ "$value" != "items" ]; then
        # add values into the "results" array
        log "info" "$value"
        results+=("$value")
    fi
done

# Loop through each value inside the array, i.e. the entries under $DIR.
for i in "${results[@]}"; do
    # Seed the comparison value with the directory rows (permission
    # string starting with "d") found recursively under $i.
    oldVal=$(hadoop fs -ls -R "${i}" | sed 's/  */ /g' | cut -d\  -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2)
    log "info" "Checking sub-directories under $i!"
    # This takes the directory name as its input and extracts only the
    # directories that belong to the provided runID.
    for val in $(hadoop fs -ls -R "$i" | grep 1539724717 | sed 's/  */ /g' | cut -d\  -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2); do
        # Keep the path only when it is not already covered by oldVal.
        if [[ ! ${val} =~ ${oldVal} ]]; then
            oldVal=$val
            directory+=("${oldVal}")
        fi
    done
done

The directory array contains all the required directories.
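From there, a hedged sketch of the final check; comparing the first two array entries via dirname is my addition, not part of the original answer:

# Hypothetical follow-up: compare the parents of the first two
# collected directories to answer the original question.
if [ "$(dirname "${directory[0]}")" = "$(dirname "${directory[1]}")" ]; then
    log "info" "${directory[0]} and ${directory[1]} share the same parent"
fi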