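I'm trying to read a multi-line tab-separated file in bash. The format is such that empty fields are expected. Unfortunately, the shell is collapsing together field separators that are next to each other, like so: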
# IFS=$'\t'
# read one two three <<<$'one\t\tthree'
# printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <three> <>
...rather than the desired output of <one> <> <three>.
Can this be resolved without resorting to a separate language (such as awk)?
Answer 0 (score: 11)
IFS=,
echo $'one\t\tthree' | tr \\11 , | (
  read one two three
  printf '<%s> ' "$one" "$two" "$three"; printf '\n'
)
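I've rearranged your example slightly, but only to make it work in any Posix shell.
Update: Yes, it does seem that whitespace is special, at least in IFS. Look at the second half of this paragraph from bash(1):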
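A side note on the parentheses in the pipeline: in bash, each part of a pipeline normally runs in a subshell, so one, two and three only exist inside the ( ... ) group. A minimal bash-only sketch (process substitution is not POSIX) that keeps the variables in the current shell, assuming the same tr conversion to commas:

```shell
#!/bin/bash
# Same tab -> comma conversion with tr, but read is fed through
# process substitution, so it runs in the current shell and the
# assignments are still visible afterwards.
IFS=, read -r one two three < <(printf '%s\n' $'one\t\tthree' | tr '\t' ,)

printf '<%s> ' "$one" "$two" "$three"; printf '\n'
# prints: <one> <> <three>
```

The IFS=, prefix only applies to that one read command, so the script's global IFS is left untouched.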
The shell treats each character of IFS as a delimiter, and splits the
results of the other expansions into words on these characters. If IFS
is unset, or its value is exactly <space><tab><newline>, the default,
then any sequence of IFS characters serves to delimit words. If IFS
has a value other than the default, then sequences of the whitespace
characters space and tab are ignored at the beginning and end of the
word, as long as the whitespace character is in the value of IFS (an
IFS whitespace character). Any character in IFS that is not IFS
whitespace, along with any adjacent IFS whitespace characters, delimits a
field. A sequence of IFS whitespace characters is also treated as a
delimiter. If the value of IFS is null, no word splitting occurs.
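To see the quoted rules in action, here is a small sketch comparing a whitespace delimiter (tab) with a non-whitespace one (comma); the variable names are arbitrary:

```shell
#!/bin/bash
# Tab is "IFS whitespace": the leading tab is ignored and the
# run of two tabs acts as a single delimiter.
IFS=$'\t' read -r a b c <<< $'\tB\t\tC'
printf '<%s> ' "$a" "$b" "$c"; printf '\n'   # <B> <C> <>

# Comma is not IFS whitespace: every comma delimits a field,
# including a leading one, so the first field is empty.
IFS=, read -r x y z <<< ',B,C'
printf '<%s> ' "$x" "$y" "$z"; printf '\n'   # <> <B> <C>
```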
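Answer 1 (score: 4)
There's no need to use tr, but IFS must be a non-whitespace character (otherwise runs of delimiters collapse into one, as you've seen).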
$ IFS=, read -r one two three <<<'one,,three'
$ printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <> <three>
$ var=$'one\t\tthree'
$ var=${var//$'\t'/,}
$ IFS=, read -r one two three <<< "$var"
$ printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <> <three>
$ idel=$'\t' odel=','
$ var=$'one\t\tthree'
$ var=${var//$idel/$odel}
$ IFS=$odel read -r one two three <<< "$var"
$ printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <> <three>
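Since the question is about a multi-line file, the same substitution can be applied per line inside a while read loop. This is a sketch under one loud assumption: the data never contains $'\x1f' (the ASCII unit separator), which is used here as the replacement delimiter. The input is simulated with printf; a real script would redirect from the file instead (e.g. done < data.tsv, file name hypothetical).

```shell
#!/bin/bash
# Assumption: $'\x1f' never occurs in the data, so it can safely
# replace the tab and leave IFS without any whitespace character.
us=$'\x1f'

seconds=()   # collects field 2 of every row, to show empties survive
while IFS= read -r line; do
  line=${line//$'\t'/$us}                    # tab -> unit separator
  IFS=$us read -r one two three <<< "$line"
  printf '<%s> ' "$one" "$two" "$three"; printf '\n'
  seconds+=("$two")
done < <(printf '%s\n' $'one\t\tthree' $'\ttwo\tthree' $'a\tb\tc')
```

The outer IFS= read -r keeps each line intact (no trimming, no backslash processing); only the inner read does the splitting.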
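Answer 2 (score: 3)
I've written a function that works around this issue. This particular implementation is specific to tab-separated columns and newline-separated rows, but removing that limitation is left as a simple exercise: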
read_tdf_line() {
    local line element at_end
    # IFS is local, so it is restored automatically on every return
    # path, including the early "return 1" at end of input
    local IFS=$'\n'
    if ! read -r line ; then
        return 1
    fi
    at_end=0
    while read -r element; do
        if (( $# > 1 )); then
            printf -v "$1" '%s' "$element"
            shift
        else
            if (( at_end )) ; then
                # replicate read behavior of assigning all excess content
                # to the last variable given on the command line
                printf -v "$1" '%s\t%s' "${!1}" "$element"
            else
                printf -v "$1" '%s' "$element"
                at_end=1
            fi
        fi
    done < <(tr '\t' '\n' <<<"$line")
    # if other arguments exist on the end of the line after all
    # input has been eaten, they need to be blanked
    if ! (( at_end )) ; then
        while (( $# )) ; do
            printf -v "$1" '%s' ''
            shift
        done
    fi
}
Usage is as follows:
# read_tdf_line one two three rest <<<$'one\t\tthree\tfour\tfive'
# printf '<%s> ' "$one" "$two" "$three" "$rest"; printf '\n'
<one> <> <three> <four five>
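Answer 3 (score: 3)
Here's an approach with some niceties:
In the code, file_data and file_input exist only to generate input, standing in for an external command called from the script. The get and put accessors could be parameterized to take data and cols as arguments, but this script doesn't go that far.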
#!/bin/bash
file_data=( $'\t\t' $'\t\tbC' $'\tcB\t' $'\tdB\tdC' \
$'eA\t\t' $'fA\t\tfC' $'gA\tgB\t' $'hA\thB\thC' )
file_input () { printf '%s\n' "${file_data[@]}" ; } # simulated input file
delim=$'\t'
# the IFS=$'\n' has a side-effect of skipping blank lines; acceptable:
OIFS="$IFS" ; IFS=$'\n' ; oset="$-" ; set -f
lines=($(file_input)) # read the "file"
# cleanup the environment mods: turn noglob back off unless it was
# already on before, and restore IFS
[[ $oset == *f* ]] || set +f ; IFS="$OIFS" ; unset oset
# the read-in data has (rows * cols) fields, with cols as the stride:
data=()
cols=0
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; echo "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }
# convert the lines from input into the pseudo-2D data array:
i=0 ; row=0 ; col=0
for line in "${lines[@]}" ; do
    line="$line$delim"
    while [ -n "$line" ] ; do
        case "$line" in
            *${delim}*) data[$i]="${line%%${delim}*}" ; line="${line#*${delim}}" ;;
            *) data[$i]="${line}" ; line= ;;
        esac
        (( ++i ))
    done
    [ 0 = "$cols" ] && (( cols = i ))
done
rows=${#lines[@]}
# output the data array as a matrix, using the get accessor
for (( row=0 ; row < rows ; ++row )) ; do
    printf 'row %2d: ' $row
    for (( col=0 ; col < cols ; ++col )) ; do
        printf '%5s ' "$(get $row $col)"
    done
    printf '\n'
done
Output:
$ ./tabtest
row  0:
row  1:                bC
row  2:          cB
row  3:          dB    dC
row  4:    eA
row  5:    fA          fC
row  6:    gA    gB
row  7:    hA    hB    hC
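The script's comment notes that IFS=$'\n' combined with lines=($(file_input)) silently drops blank lines. If that is not acceptable, bash 4's mapfile -t stores one array element per line, empty lines included, and it needs neither the IFS change nor set -f. A minimal sketch, with file_input redefined here on made-up data so the block runs on its own:

```shell
#!/bin/bash
# Simulated input (made-up data); the second line is completely empty.
file_input () { printf '%s\n' $'aA\taB' '' $'cA\tcB' ; }

# mapfile (bash 4+) reads one element per line; -t strips the
# trailing newline. The empty line becomes an empty element.
mapfile -t lines < <(file_input)

printf '%d lines\n' "${#lines[@]}"   # 3 lines
printf '<%s>\n' "${lines[@]}"
```

Because the result lands in the same lines array, it could stand in for the read-in step of the script while the rest (the splitting loop and rows=${#lines[@]}) stays as it is.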
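Answer 4 (score: 3)
Here's a quick and simple function I use that avoids calling external programs and doesn't restrict the range of input characters. It only works in bash (I think). It splits on the characters in IFS, so set IFS for the call, e.g. IFS=$'\t' myread one two three.
If you want to allow for more variables than fields, it needs to be modified along the lines of Charles Duffy's answer.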
# Substitute for `read -r' that doesn't merge adjacent delimiters.
myread() {
    local input
    IFS= read -r input || return $?
    while [[ "$#" -gt 1 ]]; do
        IFS= read -r "$1" <<< "${input%%[$IFS]*}"
        input="${input#*[$IFS]}"
        shift
    done
    IFS= read -r "$1" <<< "$input"
}
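If you'd rather have the fields in an array than in named variables (which also sidesteps the more-variables-than-fields issue), the same parameter-expansion technique as myread can fill an array. A sketch only: the function name split_fields and the global array fields are made up for illustration, and unlike myread it takes a single explicit delimiter character instead of using IFS.

```shell
#!/bin/bash
# Hypothetical helper: split "$1" on the literal delimiter "$2" and
# store every field, empty ones included, in the global array
# "fields". Only parameter expansion is used, no external programs.
split_fields() {
  local rest=$1 delim=$2
  fields=()
  while [[ $rest == *"$delim"* ]]; do
    fields+=("${rest%%"$delim"*}")   # text before the first delimiter
    rest=${rest#*"$delim"}           # drop that text and the delimiter
  done
  fields+=("$rest")                  # last field (possibly empty)
}

split_fields $'one\t\tthree\t' $'\t'
printf '<%s> ' "${fields[@]}"; printf '\n'   # <one> <> <three> <>
```

Quoting "$delim" inside the patterns makes it match literally, so characters like * or ? are safe as delimiters.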
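Answer 5 (score: 0)
To prevent empty fields from collapsing, you can use any delimiter other than the IFS "whitespace" characters.
Examples of how different delimiters behave: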
#!/bin/bash
for delimiter in $'\t' ',' '|' $'\377' $'\x1f' ;do
    line="one${delimiter}${delimiter}three"
    IFS=$delimiter read one two three <<<"$line"
    printf '<%s> ' "$one" "$two" "$three"; printf '\n'
done
<one> <three> <>
<one> <> <three>
<one> <> <three>
<one> <> <three>
<one> <> <three>
Or, using the OP's original example:
IFS='|' read -r one two three <<<"$(tr '\t' '|' <<<$'one\t\tthree')"
printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <> <three>