Is there an HDFS command to check whether two directories in HDFS share a common parent directory?
For example:
$ hadoop fs -ls -R /user/username/data/
/user/username/data/LIST_1539724717/SUBLIST_1533057294,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/UI,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,
/user/username/data/ARRAY_1539724717/SUBLIST_1533057294,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/UI,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,
All of these directories share the same parent directories, /user/username/data/LIST_1539724717/SUBLIST_1533057294 and /user/username/data/ARRAY_1539724717/SUBLIST_1533057294. How can we check this in bash?
Answer 0: (score: 1)
You can write a shell script that takes the directory names as variables and checks whether they belong to the same parent.
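A minimal sketch of such a script, working purely on the path strings (for example, paths already printed by `hadoop fs -ls -R`). The function names `same_parent` and `common_ancestor` are hypothetical and not part of any Hadoop tooling; the sketch assumes absolute paths with no newlines in them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if both paths have the same immediate parent.
same_parent() {
    [[ "$(dirname "$1")" == "$(dirname "$2")" ]]
}

# Hypothetical helper: print the deepest directory shared by two absolute paths.
common_ancestor() {
    local a="${1%/}" b="${2%/}"      # drop a single trailing slash, if any
    local -a pa pb
    IFS='/' read -r -a pa <<< "$a"   # split into components; pa[0] is empty
    IFS='/' read -r -a pb <<< "$b"
    local common="" i
    for ((i = 1; i < ${#pa[@]} && i < ${#pb[@]}; i++)); do
        [[ "${pa[i]}" == "${pb[i]}" ]] || break
        common+="/${pa[i]}"
    done
    printf '%s\n' "${common:-/}"     # fall back to the root when nothing matches
}

common_ancestor /user/username/data/LIST_1539724717/SUBLIST_1533057294 \
                /user/username/data/ARRAY_1539724717/SUBLIST_1533057294
# prints /user/username/data
```

Because only strings are compared, nothing here confirms that the paths exist in HDFS. If that matters, each path could be checked first with `hadoop fs -test -d "$path"`.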
Answer 1: (score: 0)
# Assumes DIR is set and a log() helper is defined elsewhere in the script.
results=()
directory=()
for value in $(hadoop fs -ls "${DIR}" | awk '{print $NF}' | tr '\n' ' ')
do
    # Skip the "Found N items" header line, whose last field is "items"
    if [ "$value" != "items" ]; then
        # Add values into the "results" array
        log "info" "$value"
        results+=("$value")
    fi
done
# Loop through each value inside the array, i.e. each entry under $DIR
for i in "${results[@]}"
do
    # oldVal initially holds every directory path found under $i
    oldVal=$(hadoop fs -ls -R "${i}" | sed 's/  */ /g' | cut -d' ' -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2)
    log "info" "Checking sub-directories under $i ! "
    # This takes the directory name as its input and extracts only the directories for the provided run ID
    for val in $(hadoop fs -ls -R "$i" | grep 1539724717 | sed 's/  */ /g' | cut -d' ' -f 1,8 --output-delimiter=',' | grep ^d | cut -d, -f2)
    do
        # Keep val only when it does not match the previously kept path
        if [[ ! ${val} =~ ${oldVal} ]]; then
            oldVal=$val
            directory+=("${oldVal}")
        fi
    done
done
The directory array then contains all the required directories.
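The filtering idea in the inner loop (skip a path when it lies under one that was already kept) can also be sketched without regex matching, using a literal prefix test on a sorted list. This is a hedged alternative, not the script above: `topmost_dirs` is a hypothetical name, and it assumes one directory path per line on standard input.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read directory paths from stdin and print only the
# top-most ones, dropping every path that lies beneath a path already kept.
topmost_dirs() {
    local last="" path out
    # Normalise every path to end in exactly one "/" so that, after a
    # byte-order sort, all descendants of a directory directly follow it.
    sed -e '/^$/d' -e 's:/*$:/:' | LC_ALL=C sort -u |
    while IFS= read -r path; do
        # Literal prefix test: "$last" is quoted, only the trailing * is a glob
        if [[ -n "$last" && "$path" == "$last"* ]]; then
            continue
        fi
        last="$path"
        out="${path%/}"
        printf '%s\n' "${out:-/}"
    done
}

printf '%s\n' \
    /user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A \
    /user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A/N \
    /user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L |
    topmost_dirs
# prints the .../NEWDATA/A and .../NEWDATA/M/K/L paths; .../NEWDATA/A/N is dropped
```

Appending the slash before sorting matters: without it, a sibling such as /a/b-x would sort between /a/b and /a/b/c and break the "compare with the last kept path" check.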