Question

I have logs in this format:

log1,john,time,etc
log2,peter,time,etc
log3,jack,time,etc
log4,peter,time,etc

I want to create a list for each person, in the format

"name"=("no.lines" "line" "line" ...)

For example:

peter=("2" "log2,peter,time,etc" "log4,peter,time,etc")

I already have this structure, and I know how to create variables like this:

declare "${FIELD[1]}"=1

But I don't know how to increase the record count, and when I try to create a list like this and append to it, I get errors.

#!/bin/bash

F=("log1,john,time,etc" "log2,peter,time,etc" "log3,jack,time,etc" "log4,peter,time,etc")
echo "${F[@]}"

declare -a CLIENTS
for LINE in "${F[@]}"
do
    echo "$LINE"
    IFS=',' read -ra  FIELD < <(echo "$LINE")

    if [ -z "${!FIELD[1]}" ] && [ -n "${FIELD[1]}" ] # check if there is already record for given line, if not create
    then 
            CLIENTS=("${CLIENTS[@]}" "${FIELD[1]}") # add person to list of variables records for later access
            declare -a "${FIELD[1]}"=("1" "LINE") # ERROR

    elif [ -n "${!FIELD[1]}" ] && [ -n "${FIELD[1]}" ] # if already record for client
    then 
            echo "Increase records number" # ???
            echo "Append record"
            "${FIELD[@]}"=("${FIELD[@]}" "$LINE") # ERROR

    else    
            echo "ELSE"
    fi

done

echo -e "CLIENTS: \n ${CLIENTS[@]}"
echo "Client ${CLIENTS[0]} has ${!CLIENTS[0]} records"
echo "Client ${CLIENTS[1]} has ${!CLIENTS[1]} records"
echo "Client ${CLIENTS[2]} has ${!CLIENTS[2]} records"
echo "Client ${CLIENTS[3]} has ${!CLIENTS[3]} records"

Answer 1

Caveat: what follows uses namerefs, a feature new in bash 4.3.
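As a minimal sketch of what a nameref does (the names `append_to`, `target_ref` and `demo_peter` are hypothetical and only for illustration), `declare -n` makes one variable an alias for another whose name is only known at run time, which is what the question's `declare "${FIELD[1]}"=...` attempts were reaching for:

```bash
# Sketch only: append_to, target_ref and demo_peter are hypothetical names.
# Requires bash 4.3 or newer for `declare -n`.
append_to() {
  declare -n target_ref="$1"   # target_ref now aliases the variable named by $1
  shift
  target_ref+=( "$@" )         # appends to the aliased array
}

demo_peter=()
append_to demo_peter "log2,peter,time,etc"
append_to demo_peter "log4,peter,time,etc"
printf '%s has %d records\n' demo_peter "${#demo_peter[@]}"
```

Because the record count is always available as `${#array[@]}`, there is no need to store it as a separate first element.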

First: I strongly suggest prefixing the names of your arrays to avoid collisions with unrelated variables. Thus, using content_ as the prefix:

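read_arrays() {
  # Read each line whole, then split a copy of it on commas into fields.
  while IFS= read -r line && IFS=, read -r -a fields <<<"$line"; do
    name=${fields[1]}
    declare -g -a "content_$name"          # make sure the global array exists
    declare -n cur_array="content_$name"   # nameref: alias for that array
    cur_array+=( "$line" )
    unset -n cur_array                     # drop the alias, not the array
  done
}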

Then:

lines_for() {
  declare -n cur_array="content_$1"
  printf '%s\n' "${#cur_array[@]}" ## emit length of array for given person
}

...or...

for_each_line() {
  declare -n cur_array="content_$1"; shift
  for line in "${cur_array[@]}"; do
    "$@" "$line"
  done
}

Putting it all together:

$ read_arrays <<'EOF'
log1,john,time,etc
log2,peter,time,etc
log3,jack,time,etc
log4,peter,time,etc
EOF
$ lines_for peter
2
$ for_each_line peter echo
log2,peter,time,etc
log4,peter,time,etc
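The question also keeps a separate CLIENTS list of names. With prefixed arrays that list may not be needed, since bash can enumerate variables by prefix. A small sketch, assuming the `content_` arrays already exist (they are filled by hand here, and `list_names` is a hypothetical helper):

```bash
# Sketch only: the arrays are filled by hand to keep the example self-contained.
content_john=( "log1,john,time,etc" )
content_peter=( "log2,peter,time,etc" "log4,peter,time,etc" )

# Hypothetical helper: print every name that has a content_ array.
list_names() {
  local var
  for var in "${!content_@}"; do   # expands to all variable names starting with content_
    printf '%s\n' "${var#content_}"
  done
}

list_names
```

Any other variable whose name happens to start with `content_` would show up too, so the prefix should be one that nothing else in the script uses.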

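...and, if you really want the format you asked for, with the number of lines stored as explicit data and variable names that are not safely namespaced, it's easy to convert from one to the other: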

# this should probably be run in a subshell to avoid namespace pollution
# thus, (generate_stupid_format) >output
generate_stupid_format() {
  for scoped_varname in "${!content_@}"; do
    unscoped_varname="${scoped_varname#content_}"
    declare -n unscoped_var=$unscoped_varname
    declare -n scoped_var=$scoped_varname
    unscoped_var=( "${#scoped_var[@]}" "${scoped_var[@]}" )
    declare -p "$unscoped_varname"
  done
}

Answer 2

Bash with Coreutils, grep and sed

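If I understand your code correctly, you are trying to use multi-dimensional arrays, which Bash does not support. If I were to solve this problem from scratch, I would use this mix of command-line tools (see the security concerns at the end of the answer!):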

#!/bin/bash

while read name; do
    printf "%s=(\"%d\" \"%s\")\n" \
        "$name" \
        "$(grep -c "$name" "$1")" \
        "$(grep "$name" "$1" | tr $'\n' ' ' | sed 's/ /" "/g;s/" "$//')"
done < <(cut -d ',' -f 2 "$1" | sort -u)

Example output:

$ ./SO.sh infile
jack=("1" "log3,jack,time,etc")
john=("1" "log1,john,time,etc")
peter=("2" "log2,peter,time,etc" "log4,peter,time,etc")

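This uses process substitution to prepare the log file so that we can loop over the unique names; the output of the substitution looks like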

$ cut -d ',' -f 2 "$1" | sort -u
jack
john
peter

i.e., a list of unique names.

For each name, we then print the summarized log line with

printf "%s=(\"%d\" \"%s\")\n"

where

The %s string is just the name ("$name").
The log line count is the output of the grep command
```
grep -c "$name" "$1"
```
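which counts the lines that contain "$name". If the name can also occur elsewhere in a log line, we can restrict the search to the second field of the log lines with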
```
grep -c "$name" <(cut -d ',' -f 2 "$1")
```
Finally, to get all log lines onto a single line with proper quoting, we use
```
grep "$name" "$1" | tr $'\n' ' ' | sed 's/ /" "/g;s/" "$//'
```
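This gets all lines containing "$name", replaces newlines with spaces, then surrounds the spaces with quotes and removes the extra quotes from the end of the line. Because every space is treated as a separator, this assumes the log lines themselves contain no spaces.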
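The substring concern mentioned above can be tightened a little further. A minimal sketch, assuming a `cut` and `grep` that accept `--` (GNU and POSIX-style implementations do); `count_for_name` and the sample data are hypothetical and not part of the original script. `-F` treats the name as a fixed string rather than a regular expression, and `-x` requires the whole second field to match, so `peter` does not also count `peterson`:

```bash
# Hypothetical helper: count log lines whose second field is exactly $1.
count_for_name() {
  cut -d ',' -f 2 -- "$2" | grep -c -x -F -- "$1"
}

# Throwaway sample data, including a name that merely contains "peter".
sample_log=$(mktemp)
printf '%s\n' \
  'log1,john,time,etc' \
  'log2,peter,time,etc' \
  'log3,peterson,peter,etc' \
  'log4,peter,time,etc' >"$sample_log"

exact_count=$(count_for_name peter "$sample_log")   # whole-field matches only
loose_count=$(grep -c peter "$sample_log")          # any line containing "peter"
rm -f -- "$sample_log"

printf 'exact=%s loose=%s\n' "$exact_count" "$loose_count"
```

The same `-x -F` idea would apply to the command that collects the lines, although there the match has to be made against the field while the whole line is printed, which is easier in awk or pure Bash than in grep.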

Pure Bash

After initially thinking that pure Bash would be too cumbersome, it turns out to be not all that complicated:

#!/bin/bash

declare -A count
declare -A lines

old_ifs=$IFS
IFS=,
while read -r -a line; do
    name="${line[1]}"
    (( ++count[$name] ))
    lines[$name]+="\"${line[*]}\" "
done < "$1"

for name in "${!count[@]}"; do
    printf "%s=(\"%d\" %s)\n" "$name" "${count[$name]}" "${lines[$name]% }"
done

IFS="$old_ifs"

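This updates two associative arrays while looping over the input file: count keeps track of how many times a name occurs, and lines appends each log line to the entry for its name.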

To split fields on commas, we set the input field separator IFS to a comma (saving the old value first so that it can be restored at the end).
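For comparison, here is a sketch of the other common way to scope IFS: as a prefix assignment that applies to a single read only. The variable names are hypothetical, and the joined result assumes IFS has its default value. It shows why the script above changes IFS globally instead: with the prefix form, `${parts[*]}` no longer joins with commas.

```bash
# Sketch only: saved_ifs, parts and joined are hypothetical names.
saved_ifs=$IFS

# IFS is a comma only for the duration of this one read.
IFS=, read -r -a parts <<<"log2,peter,time,etc"
printf 'name: %s\n' "${parts[1]}"

# "${parts[*]}" joins with the first character of the current (unchanged) IFS,
# which is a space by default, not a comma.
joined="${parts[*]}"
printf 'joined: %s\n' "$joined"
```

With this form the original line would have to be kept separately (as Answer 1 does with its `line` variable) instead of being rebuilt from the fields.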

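read -r -a reads each line into the array line, split on commas, so the name is now in ${line[1]}. We increment the count for that name in the arithmetic expression (( ... )) and append (+=) the log line on the next line.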

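${line[*]} expands to all fields of the array joined by the first character of IFS, which is exactly what we want. We also add a space here; the unwanted space at the end of the line (after the last element) is removed later.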

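The second loop iterates over all keys of the count array (the names) and prints the properly formatted line for each one. ${lines[$name]% } removes the space from the end of the line.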
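The accumulate-then-trim steps described above can be tried in isolation. A small sketch with hypothetical array names (`demo_count`, `demo_lines`) and hard-coded input in place of the log file:

```bash
# Sketch only: demo_count and demo_lines are hypothetical stand-ins
# for the count and lines arrays. Requires bash 4 for associative arrays.
declare -A demo_count
declare -A demo_lines

name=peter
for entry in "log2,peter,time,etc" "log4,peter,time,etc"; do
  (( ++demo_count[$name] ))          # an unset element counts as 0
  demo_lines[$name]+="\"$entry\" "   # quoted line plus a trailing space
done

trimmed=${demo_lines[$name]% }       # strip exactly one trailing space
printf '%s=("%d" %s)\n' "$name" "${demo_count[$name]}" "$trimmed"
```

The `% ` expansion removes the shortest match of a single space from the end, so only the separator added after the last element is dropped.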

Security concerns

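Since the output of these scripts appears to be meant for reuse by the shell, we may want to prevent malicious code execution if we cannot trust the contents of the log file.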

A way to do this for the Bash solution (hat tip: Charles Duffy) is to replace the for loop with

for name in "${!count[@]}"; do
    IFS=' ' read -r -a words <<< "${lines[$name]}"
    printf -v words_str '%q ' "${words[@]}"
    printf "%q=(\"%d\" %s)\n" "$name" "${count[$name]}" "${words_str% }"
done

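That is, we split the combined log lines into an array words, print them with the %q format specifier into a string words_str, and then use that string for the output, which results in escaped output like this: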

peter=("2" \"log2\,peter\,time\,etc\" \"log4\,peter\,time\,etc\")
jack=("1" \"log3\,jack\,time\,etc\")
john=("1" \"log1\,john\,time\,etc\")

Something similar could be done for the first solution.
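To illustrate what %q buys here, a minimal sketch with a made-up hostile value (the variable names and the payload are hypothetical). The `eval` is there only to demonstrate that the quoted form round-trips as data instead of being executed:

```bash
# Sketch only: a hypothetical hostile field that tries to break out of the
# array literal and run a command.
untrusted='peter"); echo pwned; x=("'

# %q escapes the value so that the shell reads it back as one literal word.
printf -v quoted '%q' "$untrusted"
printf 'quoted form: %s\n' "$quoted"

# Round trip: the assignment restores the exact string and runs nothing else.
eval "restored=$quoted"
[[ $restored == "$untrusted" ]] && echo "round trip ok"
```

Without %q, the same value pasted into generated shell code would end the array literal early and run the embedded `echo`.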

Answer 3

You can use awk. As a demonstration:

awk -F, '{a1[$2]=a1[$2]" \""$0"\""; sum[$2]++} END{for (e in sum){print e"=("  "\""sum[e]"\""a1[e]")"}}' file
john=("1" "log1,john,time,etc")
peter=("2" "log2,peter,time,etc" "log4,peter,time,etc")
jack=("1" "log3,jack,time,etc")
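For readability, the one-liner above can be spread over several lines. This is a sketch of the same approach wrapped in a hypothetical shell function (`summarize_logs`), with the awk arrays renamed and a `sort` added because the order of awk's `for (... in ...)` loop is unspecified:

```bash
# Hypothetical wrapper around the same awk program; reads files given as
# arguments, or standard input when there are none.
summarize_logs() {
  awk -F, '
    {
      lines[$2] = lines[$2] " \"" $0 "\""   # append the quoted full line
      count[$2]++                            # one more record for this name
    }
    END {
      for (name in count)
        print name "=(\"" count[name] "\"" lines[name] ")"
    }
  ' "$@" | sort
}

awk_summary=$(printf '%s\n' \
  'log1,john,time,etc' \
  'log2,peter,time,etc' \
  'log3,jack,time,etc' \
  'log4,peter,time,etc' | summarize_logs)
printf '%s\n' "$awk_summary"
```

As with the other solutions, the output is not escaped, so it should only be fed back to a shell if the log contents are trusted.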

Dynamic indirect Bash arrays
