How to detect and remove the indentation of piped text

Date: 2019-01-03 18:10:54

Tags: bash unix pipe

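I'm looking for a way to remove the indentation of piped text. Below is a solution using cut -c 9-, which assumes the indentation is 8 characters wide.

I'm looking for a solution that detects how many spaces to remove. That means reading the whole (piped) file to find the minimum number of spaces (tabs?) used for indentation, then removing that many from every line.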

run.sh

help() {
    awk '
    /esac/{b=0}
    b
    /case "\$arg" in/{b=1}' \
    "$me" \
    | cut -c 9-
}

while [[ $# -ge 1 ]]
do
    arg="$1"
    shift
    case "$arg" in
        help|h|?|--help|-h|'-?')
            # Show this help
            help;;
    esac
done

$ ./run.sh --help

help|h|?|--help|-h|'-?')
    # Show this help
    help;;

Note: echo $'    4\n  2\n   3' | python3 -c 'import sys; import textwrap as tw; print(tw.dedent(sys.stdin.read()), end="")' works, but I expect there is a better way (I mean, one that relies only on software more common than python). Maybe awk? I wouldn't mind seeing a perl solution either.

Note 2: echo $'    4\n  2\n   3' | python -c 'import sys; import textwrap as tw; print tw.dedent(sys.stdin.read()),' also works (Python 2.7.15rc1).

5 answers:

Answer 0: (score: 3)

The following is pure bash, with no external tools or command substitutions:

#!/usr/bin/env bash
all_lines=( )
min_spaces=9999 # start with something arbitrarily high
while IFS= read -r line; do
  all_lines+=( "$line" )
  if [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]]; then
    continue  # this line has at least as much whitespace as those preceding it
  fi
  # this line has *less* whitespace than those preceding it; we need to know how much.
  [[ $line =~ ^([[:space:]]*) ]]
  line_whitespace=${BASH_REMATCH[1]}
  min_spaces=${#line_whitespace}
done

for line in "${all_lines[@]}"; do
  printf '%s\n' "${line:$min_spaces}"
done

Its output is:

  4
2
 3
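
As a usage sketch, the same loop can be wrapped in a function so that it can sit at the end of any pipeline. The name dedent is an assumption for illustration and not part of the answer; the logic is unchanged, only the variables are made local. It needs bash, like the original.

```shell
#!/usr/bin/env bash
# dedent: hypothetical wrapper name; the body is the loop from this answer.
dedent() {
  local line line_whitespace min_spaces=9999
  local -a all_lines=()
  while IFS= read -r line; do
    all_lines+=( "$line" )
    # skip lines whose first $min_spaces characters are all whitespace
    [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]] && continue
    # otherwise this line is indented less; record its leading whitespace length
    [[ $line =~ ^([[:space:]]*) ]]
    line_whitespace=${BASH_REMATCH[1]}
    min_spaces=${#line_whitespace}
  done
  for line in "${all_lines[@]}"; do
    printf '%s\n' "${line:$min_spaces}"
  done
}

printf '    4\n  2\n   3\n' | dedent
```

Note that [[:space:]] also matches tabs, and each tab counts as a single character here.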

Answer 1: (score: 3)

Suppose you have:

$ echo $'    4\n  2\n   3\n\ttab'
    4
  2
   3
    tab

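You can use the Unix expand utility to expand tabs into spaces, then pipe the result through awk to find the minimum number of leading spaces on any line: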

$ echo $'    4\n  2\n   3\n\ttab' | 
expand | 
awk 'BEGIN{min_indent=9999999}
     {lines[++cnt]=$0
      match($0, /^[ ]*/)
      if(RLENGTH<min_indent) min_indent=RLENGTH
     }
     END{for (i=1;i<=cnt;i++) 
               print substr(lines[i], min_indent+1)}'
  4
2
 3
      tab
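
By default expand assumes tab stops every 8 columns. If the input was written with a different tab width, the -t option sets it. A minimal sketch, assuming 4-column tabs; the function name dedent_tabs and its argument are made up for this example and are not part of the answer.

```shell
# dedent_tabs: hypothetical helper; $1 is the tab width (defaults to 8, like expand)
dedent_tabs() {
    expand -t "${1:-8}" |
    awk 'BEGIN { min_indent = 9999999 }
         { lines[++cnt] = $0
           match($0, /^[ ]*/)                 # RLENGTH = number of leading spaces
           if (RLENGTH < min_indent) min_indent = RLENGTH }
         END { for (i = 1; i <= cnt; i++) print substr(lines[i], min_indent + 1) }'
}

printf '    4\n  2\n   3\n\ttab\n' | dedent_tabs 4
```

With a width of 4 the tab becomes four spaces, so after removing the common two-space indent the last line keeps two spaces instead of six.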

Answer 2: (score: 1)

Here is the (semi-)obvious temporary file solution.

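#!/bin/sh

t=$(mktemp -t dedent.XXXXXXXXXX) || exit
# ERR is not a POSIX sh trap condition; EXIT alone covers the cleanup
trap 'rm -f "$t"' EXIT
# n is the 1-based column of the first non-space character, which is
# exactly the start position cut needs; lines without one (n == 0) are ignored
awk '{ n = match($0, /[^ ]/); if (n && (!min || n < min)) min = n } 1
    END { exit (min ? min : 1) }' >"$t"
cut -c "$?"- "$t"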

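This will obviously fail if every line has 255 or more leading spaces, because the result will not fit into Awk's exit code (exit statuses are limited to 0-255).

The advantage is that we are not limited by available memory. Instead, we are limited by available disk space. The drawback is that disk may be slower, but IMHO the benefit of not reading large files into memory outweighs that.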
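
The 255 limit comes only from passing the number through the exit status. A possible variation, sketched here under the assumption that reading the temporary file twice is acceptable, lets a single awk process make both passes over it (NR == FNR is true only while the first file argument is being read). The function name dedent_tmp is made up for this example, and lines without any non-space character are ignored when computing the minimum.

```shell
# dedent_tmp: hypothetical helper, not part of the answer above
dedent_tmp() {
    t=$(mktemp) || return
    cat >"$t"                      # spool stdin to disk instead of memory
    awk '
        # first pass: find the smallest 1-based column of a non-space character
        NR == FNR { n = match($0, /[^ ]/); if (n && (!min || n < min)) min = n; next }
        # second pass: print each line starting at that column
        { print substr($0, min ? min : 1) }
    ' "$t" "$t"
    rm -f "$t"
}

printf '    4\n  2\n   3\n' | dedent_tmp
```

The trade-off is a second read of the file in exchange for no upper bound on the indentation width.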

Answer 3: (score: 0)

echo $'    4\n  2\n   3\n  \n   more spaces in  the    line\n  ...' | \
(text="$(cat)"; echo "$text" \
| cut -c "$(echo "$text" | sed 's/[^ ].*$//' | awk 'NR == 1 {a = length} length < a {a = length} END {print a + 1}')-"\
)

Explanation:

echo $'    4\n  2\n   3\n  \n   more spaces in  the    line\n  ...' | \
(
    text="$(cat)" # Obtain the input in a variable
    echo "$text" | cut -c "$(
        # `cut` removes the n-1 first characters of each line of the input, where n is:
            echo "$text" | \
            sed 's/[^ ].*$//' | \
            awk 'NR == 1 || length < a {a = length} END {print a + 1}'
            # sed: keep only the initial spaces, remove the rest
            # awk:
            # At the first line `NR == 1`, get the length of the line `a = length`.
            # For any shorter line `length < a`, update the length `a = length`.
            # At the end of the piped input, print the shortest length + 1.
            # ... we add 1 because in `cut`, characters of the line are indexed at 1.
        )-"
)
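
To see what cut actually receives, the inner pipeline can be run on its own. A small sketch with the same sample input; the variable name text mirrors the answer, start is introduced here for illustration, and printf replaces echo $'...' so that it also runs in plain sh.

```shell
text=$(printf '    4\n  2\n   3\n  \n   more spaces in  the    line\n  ...\n')

# sed keeps only the leading spaces of each line; awk tracks the shortest run
start=$(printf '%s\n' "$text" |
    sed 's/[^ ].*$//' |
    awk 'NR == 1 || length($0) < a {a = length($0)} END {print a + 1}')

echo "$start"                             # start column handed to cut
printf '%s\n' "$text" | cut -c "$start"-  # the dedented text
```

For this input the shortest run of leading spaces is two characters, so start is 3 and cut prints each line from column 3 onward.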

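Update:

Spawning sed can be avoided. As per tripleee's comment, sed's s/// can be replaced by awk's sub(). Here is an even shorter option, using n = match() as in tripleee's answer.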

echo $'    4\n  2\n   3\n  \n   more spaces in  the    line\n  ...' | \
(
    text="$(cat)" # Obtain the input in a variable
    echo "$text" | cut -c "$(
        # `cut` removes the a-1 first characters of each line of the input, where a is:
            echo "$text" | \
            awk '
                {n = match($0, /[^ ]/)}
                n && (!a || n < a) {a = n}
                a == 1 {exit}
                END {print (a ? a : 1)}'
            # awk:
            # At every line, get the position `n` of the first non-space character
            # (0 if the line has none; such lines are skipped).
            # At the first such line (`!a`), copy that position to `a`.
            # For any line with fewer leading spaces (`n < a`), update `a` (`a = n`).
            # `a == 1` means some line is not indented at all, so stop early;
            # `exit` still runs the END block.
            # At the end of the input, print `a` (or 1 if no line had content).
            # `match()` already counts from 1, like `cut`, so no `+ 1` is needed here.
            # The early exit only ends this inner pipeline, whose output is captured
            # by the command substitution; nothing extra reaches the script stdout.
        )-"
)
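
The sub() route mentioned above is not shown in the answer. A minimal sketch of what it might look like, assuming the same sample input: sub() strips everything from the first non-space character onward, so length($0) is the number of leading spaces, just as after the sed stage. The function name first_col is hypothetical.

```shell
# first_col: hypothetical helper; prints the 1-based column where cut should start
first_col() {
    awk '{ sub(/[^ ].*$/, "") }                       # keep only the leading spaces
         NR == 1 || length($0) < a { a = length($0) } # remember the shortest run
         END { print a + 1 }'
}

text=$(printf '    4\n  2\n   3\n  \n   more spaces in  the    line\n  ...\n')
printf '%s\n' "$text" | cut -c "$(printf '%s\n' "$text" | first_col)"-
```

Because the count here is a number of spaces rather than a position, the + 1 is needed again to turn it into a cut column.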

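Apparently, in Perl 6, the function my &f = *.indent(*); can also be used.

Answer 4: (score: 0)

Another awk solution, based on dawg's answer. The main differences:

  • No need to set an arbitrarily large number for the indentation, which feels hacky.
  • Handles blank lines in the text by ignoring them when determining the smallest indentation.
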
awk '
  {
    lines[++count] = $0
    if (NF == 0) next
    match($0, /[^ ]/)
    if (length(min) == 0 || RSTART < min) min = RSTART
  }
  END {
    for (i = 1; i <= count; i++) print substr(lines[i], min)
  }
' <<< $'    4\n  2\n   3'

Or all on one line:

awk '{ lines[++count] = $0; if (NF == 0) next; match($0, /[^ ]/); if (length(min) == 0 || RSTART < min) min = RSTART; } END { for (i = 1; i <= count; i++) print substr(lines[i], min) }' <<< $'    4\n  2\n   3'

Explanation:

Add the current line to the array and increment the count variable.

{
  lines[++count] = $0

If the line is blank, skip to the next iteration.

  if (NF == 0) next

Set RSTART to the index of the first non-space character.

  match($0, /[^ ]/)

If min is unset or greater than RSTART, set the former to the latter.

  if (length(min) == 0 || RSTART < min) min = RSTART
}

This block runs after all input has been read.

END {

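Loop over the array and, for each line, print only the substring that starts at the index stored in min and runs to the end of the line.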

  for (i = 1; i <= count; i++) print substr(lines[i], min)
}
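
To see the blank-line handling in action, the program can be wrapped in a function and fed input that contains an empty line. A sketch only: the name dedent_awk is an assumption for illustration, and printf is used instead of a here-string so that it also runs in plain sh.

```shell
# dedent_awk: hypothetical wrapper around the awk program from this answer
dedent_awk() {
    awk '
      {
        lines[++count] = $0
        if (NF == 0) next            # blank lines do not affect the minimum
        match($0, /[^ ]/)
        if (length(min) == 0 || RSTART < min) min = RSTART
      }
      END {
        for (i = 1; i <= count; i++) print substr(lines[i], min)
      }
    '
}

printf '    4\n\n  2\n   3\n' | dedent_awk
```

The empty second line is stored and printed back unchanged, but it does not pull the detected indentation down to zero.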