Question

I want to build an array of names using bash commands. I have a file with lines of this form:

<span class="username js-action-profile-name">@paulburt07</span>
<span class="username js-action-profile-name">@DavidWBrown7</span>
<span class="username js-action-profile-name">@MikeLarkan</span>
<span class="username js-action-profile-name">@WeathermanABC</span>
<span class="username js-action-profile-name">@JoshHoltTEN</span>
<span class="username js-action-profile-name">@TonyAuden</span>
<span class="username js-action-profile-name">@Magdalena_Roze</span>
<span class="username js-action-profile-name">@janesweather7</span>
<span class="username js-action-profile-name">@VanessaOHanlon</span>

I need an array like this:

array=( "paulburt07" "DavidWBrown7" "MikeLarkan" "WeathermanABC" "JoshHoltTEN" "TonyAuden" "Magdalena_Roze" "janesweather7" "VanessaOHanlon" )

Any ideas?

Answer 1

One of many possible solutions:

array=($(grep -oP '@\K(.*)(?=<)' file))

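EDIT: There is not much to explain. grep searches the file for the pattern defined by the regular expression (see man grep). -o prints only the matched part of each line, and -P tells grep to use Perl-compatible regular expressions.

@\K(.*)(?=<) means:

find and match @

\K discards what has been matched so far (but keeps the position)

match any string (.*)

up to the point where a < follows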
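
The steps above can be tried on single lines. The following is a minimal sketch, assuming GNU grep built with PCRE support (required for -P); the variable names and the second sample line with extra markup are made up for illustration. Because .* is greedy, the original pattern runs to the last < on a line, so a tighter variant with [^<]+ is shown next to it.

```shell
#!/bin/bash
# One line in the question's format, plus a hypothetical line with extra markup.
simple='<span class="username js-action-profile-name">@paulburt07</span>'
extra='<span class="username">@paulburt07</span><b>x</b>'

# Original pattern: match '@', drop it with \K, then take everything
# up to (but not including) a following '<'.
greedy_simple=$(printf '%s\n' "$simple" | grep -oP '@\K(.*)(?=<)')
greedy_extra=$(printf '%s\n' "$extra" | grep -oP '@\K(.*)(?=<)')

# Variant: [^<]+ cannot run past the first '<' after the '@'.
tight_extra=$(printf '%s\n' "$extra" | grep -oP '@\K[^<]+')

printf '%s\n' "$greedy_simple" "$greedy_extra" "$tight_extra"
```

On the simple line both patterns give the bare name; on the line with extra markup only the [^<]+ variant stops at the end of the name.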

$(command) is called command substitution, and array=(...) assigns the resulting words to the array.
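
One caveat with array=($(...)): the unquoted substitution is subject to word splitting and pathname expansion. That is harmless for the names shown here, but the following sketch reads one element per line with mapfile instead. It assumes bash 4 or later and GNU grep with -P; the temporary file created with mktemp is only a stand-in for the question's file.

```shell
#!/bin/bash
# Build a small sample file in the question's format (stand-in for the real file).
file=$(mktemp)
cat > "$file" <<'EOF'
<span class="username js-action-profile-name">@paulburt07</span>
<span class="username js-action-profile-name">@DavidWBrown7</span>
<span class="username js-action-profile-name">@MikeLarkan</span>
EOF

# mapfile -t stores each output line as one array element, without
# applying word splitting or glob expansion to the names.
mapfile -t names < <(grep -oP '@\K[^<]+' "$file")
rm -f "$file"

# Print the element count, then one name per line.
printf '%s\n' "${#names[@]}" "${names[@]}"
```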

EDIT2: Because the original input may contain more HTML markup, you could use an HTML parser instead, for example:

array=($(perl -Mojo -E 'say $_->text for x(b("filename.html")->slurp)->find(q{span[class~="username"]})->each'))

This prints the content of every <span class=username>...</span> in any HTML, regardless of how it is formatted. For the above you need to have Mojolicious installed.

Answer 2

With sed and a temp file this is quite simple:

#!/bin/bash

fname=${1:-htmlnames.txt}           # original html file
tmp=${2:-htmltmp.txt}               # temp file to use

sed -e 's/.*@//' "$fname" > "$tmp"  # remove up to '@' and place in temp
sed -i 's/[<].*$//' "$tmp"          # remove remainder in place in temp
namearray=( $(<"$tmp") )            # read temp file into array
rm "$tmp"                           # remove temp file

for i in "${namearray[@]}"; do        # print out to verify
    printf " %s\n" "$i"
done

exit 0

Output:

alchemy:~/scr/tmp/stack/tmp> bash htmlnames.sh
 paulburt07
 DavidWBrown7
 MikeLarkan
 WeathermanABC
 JoshHoltTEN
 TonyAuden
 Magdalena_Roze
 janesweather7
 VanessaOHanlon
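
The same two trimming steps can also be done without sed or a temp file, using bash parameter expansion. This is only a sketch under assumptions: the sample file is generated with mktemp as a stand-in for htmlnames.txt, and the line without an '@' is a made-up example to show that such lines are skipped.

```shell
#!/bin/bash
# Stand-in for htmlnames.txt, including one hypothetical line without a name.
fname=$(mktemp)
cat > "$fname" <<'EOF'
<span class="username js-action-profile-name">@paulburt07</span>
<div>no handle here</div>
<span class="username js-action-profile-name">@VanessaOHanlon</span>
EOF

namearray=()
while IFS= read -r line; do
    [[ $line == *@* ]] || continue   # skip lines that contain no '@'
    name=${line##*@}                 # like sed 's/.*@//': drop everything through the last '@'
    name=${name%%<*}                 # like sed 's/[<].*$//': drop from the first '<' onward
    namearray+=("$name")
done < "$fname"
rm -f "$fname"

for i in "${namearray[@]}"; do       # print out to verify
    printf " %s\n" "$i"
done
```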

Create an array of names from a file
