Question

I have a string that contains a directory structure.

dirs='Rootdir/ 
    Secondrootdir/
    Rootdir/Subdir/
    Secondrootdir/Anothersubdir/
    Secondrootdir/Thirdsubdir/
    Secondrootdir/Anothersubdir/Subsubdir/'

I want to filter it and get the following:

dirs='Rootdir/Subdir/ Secondrootdir/Thirdsubdir/ 
      Secondrootdir/Anothersubdir/Subsubdir/'

Please help.

Answer 1

Maybe something like this:

dirs="Rootdir/ Secondrootdir/ Rootdir/Subdir/ Secondrootdir/Anothersubdir/ Secondrootdir/Thirdsubdir/ Secondrootdir/Anothersubdir/Subsubdir/"

echo $dirs \
    | tr ' ' '\n' \
    | sed -e 's#\([^/]\)$#\1/#' \
    | sort -r \
    | gawk '!index(prev,$0){print;} {prev=$0;}'

This produces

Secondrootdir/Thirdsubdir/
Secondrootdir/Anothersubdir/Subsubdir/
Rootdir/Subdir/

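Here, tr first splits the space-separated input into separate lines, and sed makes sure that every path ends with a slash. Combined with sort -r, this means that if a path p is a subpath of a path q, then q appears before p in the sorted output. Finally, gawk prints only those paths that do not occur inside the preceding line. Because of this particular sort order, that effectively selects only the leaves of the directory structure.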
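The pipeline above can be wrapped in a small function, sketched here under a few assumptions. The name leaf_dirs is hypothetical, plain awk replaces gawk (index is a standard awk function), and LC_ALL=C is set so that the sort order does not depend on the locale. One deliberate change: index(prev, $0) is non-zero for a match anywhere in the previous line, so this sketch compares the result against 1 to accept only a match at the start of the line.

```shell
#!/bin/sh
# leaf_dirs is a hypothetical helper, not part of the original answer.
# $1: whitespace-separated list of directory paths (no spaces inside a path).
leaf_dirs() {
    # $1 is left unquoted on purpose so the shell splits it into one path per line.
    printf '%s\n' $1 \
        | sed -e 's#\([^/]\)$#\1/#' \
        | LC_ALL=C sort -r \
        | awk 'index(prev, $0) != 1 { print } { prev = $0 }'
}

dirs='Rootdir/ Secondrootdir/ Rootdir/Subdir/ Secondrootdir/Anothersubdir/ Secondrootdir/Thirdsubdir/ Secondrootdir/Anothersubdir/Subsubdir/'
leaf_dirs "$dirs"
```

With the prefix check, a path such as b/ is kept even when the preceding line is zb/, which the substring test would have dropped.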

Answer 2

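In addition to @ewcz's excellent answer, here is an alternative: an explicit version that does not call any external executables and respects the format given in the original question: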

dirs='Rootdir/ 
    Secondrootdir/
    Rootdir/Subdir/
    Secondrootdir/Anothersubdir/
    Secondrootdir/Thirdsubdir/
    Secondrootdir/Anothersubdir/Subsubdir/'
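out=()
# ${dirs} is left unquoted on purpose: word splitting yields one path per iteration
for d in ${dirs}; do
  found=0
  for db in ${dirs}; do
      # d is a proper prefix (subpath) of db, so d is not a leaf
      [[ "${db}" == "${d}"* && ${#db} -gt ${#d} ]] && found=1 && break
  done
  [[ $found == 0 ]] && out+=("$d")
done

echo "${out[*]}"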

Answer 3

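First, list the lines you want to remove. You want to remove every path for which the same path followed by another folder also exists: when path/more/ is found, path/ has to go. The sed command below therefore prints the parent of every path that has at least two components. I use "${dirs// }" to fix the first line, which ends with a space. The solution will fail for directories with spaces in their names, but the input format is missing quotes for that case as well.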

sed -n '/\/.*\// s# *\(.*/\)\([^/]*\)/$#\1#p' <<< "${dirs// }"  | sort -u

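Now you can use process substitution to tell grep to skip all lines found in the "file" produced by the command above.
You need several grep options: -F ignores the special meaning of pattern characters, -x matches whole lines only, -v inverts the match, and -f reads the strings to match from a file.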

grep -Fxvf <(
   sed -n '/\/.*\// s# *\(.*/\)\([^/]*\)/$#\1#p' <<< "${dirs// }"  | sort -u
   ) <<< "${dirs// }"
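To see why -x matters here, the following sketch runs the same kind of grep call on toy data that is not taken from the question. The helper name drop_exact and the two variables are hypothetical. It assumes bash, because it relies on process substitution and here-strings.

```shell
#!/usr/bin/env bash
# Toy data (an assumption for illustration): one parent and one of its children.
paths=$'Rootdir/\nRootdir/Subdir/'
parents='Rootdir/'

# drop_exact is a hypothetical helper.
# $1: newline-separated lines to drop; $2: newline-separated input.
drop_exact() {
    grep -Fxvf <(printf '%s\n' "$1") <<< "$2"
}

# With -x only the exact line "Rootdir/" is dropped; prints: Rootdir/Subdir/
drop_exact "$parents" "$paths"

# Without -x, "Rootdir/" also matches inside "Rootdir/Subdir/", so both
# lines are dropped, nothing is printed, and grep exits with status 1.
grep -Fvf <(printf '%s\n' "$parents") <<< "$paths" || true
```

Because the patterns are read with -F, characters such as a dot in a directory name are compared literally instead of being treated as regular expression syntax.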

How do I remove the shortest subpaths from a list of paths?
