Question

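How can I find the words in a string that have no duplicates in bash? I'd like to know whether there is a "native" bash way, or whether I need another command-line utility (such as awk, sed, grep, ...).

For example, given var1="thrice once twice twice thrice", I need something that picks out "once", because it occurs only one time (i.e. it has no duplicates).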

Answer 1

You can split the string on spaces and then use sort and uniq:

tr ' ' '\n' <<< "$var1" | sort | uniq -u

This produces once for your input.

(If the input contains punctuation, you may want to remove it first to avoid unexpected results.)
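
As a rough sketch of the punctuation note above: the function below strips punctuation before counting and also squeezes runs of whitespace between words. The name unique_words is hypothetical and not part of the original answer, and the sketch assumes punctuation is never part of a word, so something like "don't" would be mangled.

```shell
# unique_words is a hypothetical helper name, not part of the original answer.
# Assumption: punctuation never belongs to a word (this would mangle "don't").
unique_words() {
  printf '%s\n' "$1" |
    tr -d '[:punct:]' |        # drop punctuation characters
    tr -s '[:space:]' '\n' |   # one word per line, squeezing repeated whitespace
    sort | uniq -u             # keep only lines that occur exactly once
}

unique_words "thrice, once twice; twice thrice."
```

Without the first tr step, "thrice," and "thrice." would be counted as two different words and both would be reported as unique.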

Answer 2

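@devnull's answer is the better choice (for simplicity and probably performance), but if you're looking for a bash-only solution:

Caveats:

Uses an associative array, which is only available in bash 4 or higher.
A literal * in the input word list will not work (other glob-like strings are fine, though).
Correctly handles multi-line input and input with multiple whitespace characters between words.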

# Define the input word list.
# Bonus: multi-line input with multiple inter-word spaces.
var1=$'thrice   once twice twice thrice\ntwice again'

# Declare associative array.
declare -A wordCounts 

# Read all words and count the occurrence of each.
while read -r w; do
  [[ -n $w ]] && (( wordCounts[$w]+=1 ))
done <<<"${var1// /$'\n'}" # split input list into lines for easy parsing

# Output result.
# Note that the output list will NOT automatically be sorted, because the keys of an 
# associative array are not 'naturally sorted'; hence piping to `sort`.
echo "Words that only occur once in '$var1':"
echo "---"
for w in "${!wordCounts[@]}"; do
  (( wordCounts[$w] == 1 )) && echo "$w"
done | sort

# Expected output:
#   again
#   once
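
If the snippet needs to be reused, one option is to wrap it in a function that checks the bash version first, since associative arrays need bash 4 or later. This is only a sketch: the name words_once is hypothetical, and the counting logic is copied from the loop above rather than being a different technique.

```shell
# words_once is a hypothetical helper name, not part of the original answer.
words_once() {
  # Associative arrays require bash 4 or later, so bail out early otherwise.
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "words_once: bash 4 or later is required" >&2
    return 1
  fi

  local -A counts=()   # word -> number of occurrences
  local w

  # Same parsing as above: turn spaces into newlines, skip empty lines.
  while read -r w; do
    [[ -n $w ]] && (( counts[$w] += 1 ))
  done <<< "${1// /$'\n'}"

  # Print the words seen exactly once; sort because key order is unspecified.
  for w in "${!counts[@]}"; do
    (( counts[$w] == 1 )) && echo "$w"
  done | sort
}

words_once $'thrice   once twice twice thrice\ntwice again'
```

Keeping the array local means repeated calls start from an empty count table instead of accumulating across inputs.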

Answer 3

Just for fun, awk:

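awk '{
    # Count every whitespace-separated field on every input line.
    for (i=1; i<=NF; i++) c[$i]++
}
END {
    # Report only after all lines are read, so multi-line input is counted as a whole.
    for (word in c) if (c[word]==1) print word
}' <<< "$var1"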

once

Find words that occur only once in a string
