Question

I have a file containing lines like these:

street "City Name" 5 7500    30.3.2016
"Street Name"    city 4 1000   15.01.2015
<street name> <city name> <num of room> <price> <date>

I need to go through the file and sort it by certain columns, such as name, price, date, etc.

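I'm stuck on the whitespace: there can be more than one space between fields in the middle of a line, the string fields can be one word or two or more, and there is whitespace at the start of the line (I can't use sed).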

Can anyone give me a way to squeeze out the multiple spaces, so that I'm left with lines like these:

street "City Name" 5 7500 3.30.2016
"Street Name" city 4 1000 01.15.2015

Answer 1

The following converts your file into tab-separated form, which sort and other standard tools can handle easily:

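# Read stdin line by line; xargs re-splits each line, honoring quotes,
# and printf emits every field followed by a tab.
while read -r line; do
  printf '%s\n' "$line" | xargs printf '%s\t'
  echo   # end each output record with a newline
done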

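This works because xargs parses quotes and whitespace, splitting each line into separate elements and passing them to printf '%s\t', which prints each element followed by a tab; echo then ends each output line with a newline.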
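To see the tokenizing step on its own, the sketch below feeds one line from the question to xargs and prints each element in brackets. The bracket format is just for display; it assumes a standard xargs (GNU or BSD) with its default quote handling.

```shell
#!/usr/bin/env bash
# Show how xargs tokenizes one line from the question: double quotes
# group words, and a run of spaces counts as a single separator.
line='"Street Name"    city 4 1000   15.01.2015'
printf '%s\n' "$line" | xargs printf '[%s]\n'
```

This should print [Street Name], [city], [4], [1000] and [15.01.2015] on separate lines. Note that xargs also treats single quotes and backslashes specially, so a field containing an apostrophe (for example O'Brien Street) makes it stop with an "unmatched single quote" error.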

The output can then be piped into something like:

sort -t $'\t' -k2,2 -k1,1

...which sorts on the tab-separated columns, first by the second key (city, in your example) and then by the first key (street name, in your example).
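The same key syntax covers the other columns from the question. As a sketch, assuming the loop above is wrapped in a function called to_tsv (a hypothetical name) and the price is the fourth field, a numeric price sort adds n to the key:

```shell
#!/usr/bin/env bash
# to_tsv is a hypothetical name for the loop from this answer.
to_tsv() {
  while read -r line; do
    printf '%s\n' "$line" | xargs printf '%s\t'
    echo
  done
}

# The two data lines from the question (the <...> template line omitted).
input='street "City Name" 5 7500    30.3.2016
"Street Name"    city 4 1000   15.01.2015'

# -k4,4n: compare only field 4 (the price), as a number.
# cut -f1,4 then shows just the street and the price.
printf '%s\n' "$input" | to_tsv | sort -t $'\t' -k4,4n | cut -f1,4
```

The n modifier matters once prices have different numbers of digits: a plain text comparison would put 1000 before 900, while -k4,4n compares them numerically.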

Let's use the following input file, which makes the behavior clearer than the example in the original question:

"Street A" "City A" 1
"Street B" "City B" 2
"A Street" "City A" 3
"B Street" "City B" 4
"Street A" "A City" 5
"Street B" "B City" 6
Street City 7

Running the above through LANG=C sort -s -t$'\t' -k2,2 -k1,1 | expand -t16 (sorting first by city, then by street, and printing with tab stops every 16 columns) gives the following output:

Street A        A City          5
Street B        B City          6
Street          City            7
A Street        City A          3
Street A        City A          1
B Street        City B          4
Street B        City B          2

By contrast, with LANG=C sort -s -t$'\t' -k1,1 -k2,2 | expand -t16, which sorts first by street and then by city (again printing with 16-column tab stops), you get the following:

A Street        City A          3
B Street        City B          4
Street          City            7
Street A        A City          5
Street A        City A          1
Street B        B City          6
Street B        City B          2

If you want to convert the tab-separated format back to the quoted format, that's also doable:

#!/bin/bash
#      ^^^^- Important, not /bin/sh

while IFS=$'\t' read -r -a cols; do
  for col in "${cols[@]}"; do
    if [[ $col = *[[:space:]]* ]]; then
      printf '"%s" ' "$col"
    else
      printf '%s ' "$col"
    fi
  done
  printf '\n'
done

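Taking the input above and running it through the first script (to convert it to tab-separated form), then sort -t$'\t' -k1,1 -k2,2 (to sort it in that form), then the second script (to convert it back to space-separated, quoted form) gives the following: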

"A Street" "City A" 3
"B Street" "City B" 4
Street City 7
"Street A" "A City" 5
"Street A" "City A" 1
"Street B" "B City" 6
"Street B" "City B" 2
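The whole round trip can be packaged as one script. This is a sketch, not the answer's exact code: to_tsv and from_tsv are hypothetical names for the two loops above, from_tsv is adjusted to drop the trailing space the original leaves on each line, and LC_ALL=C is used so the byte-order result matches the listing regardless of locale.

```shell
#!/usr/bin/env bash
# to_tsv / from_tsv are hypothetical names wrapping the two loops above.
to_tsv() {
  while read -r line; do
    printf '%s\n' "$line" | xargs printf '%s\t'
    echo
  done
}

from_tsv() {
  local -a cols
  local col out
  while IFS=$'\t' read -r -a cols; do
    out=''
    for col in "${cols[@]}"; do
      # Re-quote any column that contains whitespace.
      if [[ $col = *[[:space:]]* ]]; then
        out+="\"$col\" "
      else
        out+="$col "
      fi
    done
    printf '%s\n' "${out% }"   # drop the trailing separator
  done
}

sample='"Street A" "City A" 1
"Street B" "City B" 2
"A Street" "City A" 3
"B Street" "City B" 4
"Street A" "A City" 5
"Street B" "B City" 6
Street City 7'

# Sort by street, then city, in byte order, then convert back.
printf '%s\n' "$sample" | to_tsv | LC_ALL=C sort -s -t $'\t' -k1,1 -k2,2 | from_tsv
```

Because a tab is IFS whitespace, read collapses consecutive tabs, so empty columns would be lost; that is fine for this data, which has none.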

Answer 2

You can use tr with the -s flag (for "squeeze"):

echo "  a sentence       with lots of    spaces" | tr -s " "

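If you also want to remove the leading space, pipe the result through cut. After tr -s a single leading space remains, so the first space-delimited field is empty and cut -f2- drops it: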

echo "  a sentence       with lots of    spaces" | tr -s " " | cut -d ' ' -f2-

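Edit: As Charles Duffy suggested, you can use sed instead, which stays safe when there is no leading space (in that case cut would drop the first word):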

echo "  a sentence       with lots of    spaces" | tr -s " " | sed -re 's/^ +//'
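Since the question rules out sed, the trimming step can also be done with bash parameter expansion. A minimal sketch; squeeze_spaces is a hypothetical helper name. Like the tr one-liners above, it also squeezes spaces inside quoted strings:

```shell
#!/usr/bin/env bash
# squeeze_spaces (hypothetical name): squeeze runs of spaces with tr,
# then trim one leading/trailing space with parameter expansion, no sed.
squeeze_spaces() {
  local s
  s=$(printf '%s' "$1" | tr -s ' ')
  s=${s# }   # drop a single leading space, if present
  s=${s% }   # drop a single trailing space, if present
  printf '%s\n' "$s"
}

squeeze_spaces "  a sentence       with lots of    spaces"
```

Unlike cut -f2-, ${s# } removes nothing when the string has no leading space, so it is safe in that case.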

Answer 3

Try this:

awk -F \" -v OFS=\" '{for (i=1; i<=NF; i=i+2) gsub(/  */," ",$i) ; print}' afile

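The goal is to leave strings enclosed in a pair of double quotes untouched, and to replace runs of spaces outside the quotes with a single space.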

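-v OFS=\" defines " as the output field separator used by print. Because assigning to a field such as $i makes awk rebuild $0 by joining the fields with OFS, this puts the double quotes back where they were.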

-F \" defines " as the field separator for reading input lines, so each line is split on " into fields stored in $1, $2, and so on.

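So the odd-numbered fields ($1, $3, etc.) lie outside the quoted strings, and the even-numbered fields are the quoted strings themselves.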

NF is the number of fields found in the current line after splitting.

The for loop visits only the odd-numbered fields, and gsub replaces every run of spaces in those fields with a single space.
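To check the field layout described above, this sketch prints every field awk produces when splitting one of the question's lines on double quotes; the bracketed output format is only for display.

```shell
#!/usr/bin/env bash
# Dump each field awk sees when " is the field separator.
printf '%s\n' '"Street Name"    city 4 1000   15.01.2015' |
  awk -F '"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
```

Because the line starts with a quote, $1 is empty, $2 is the quoted Street Name, and $3 holds the rest of the line with its runs of spaces, which is exactly the kind of odd-numbered field the gsub loop cleans up.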

Test:

$ awk -F \" -v OFS=\" '{for (i=1; i<=NF; i=i+2) gsub(/  */," ",$i) ; print}' afile
street "City Name" 5 7500 30.3.2016
"Street Name" city 4 1000 15.01.2015
<street name> <city name> <num of room> <price> <date>

In Bash, how do I process a file with quoted strings and multiple spaces?
