I am currently parsing some websites to improve my Unix Bash skills. I have extracted a file with names in the following format:
la-que-no-podia-capitulo-1
la-que-no-podia-capitulo-25
la-que-no-podia-capitulo-30
and I would like to get to this:
la-que-no-podia-capitulo-001
la-que-no-podia-capitulo-025
la-que-no-podia-capitulo-030
Can someone help me? I have tried different approaches:
Bash RegExp
x='a-que-no-me-dejas-capitulo-10'
re='((([[:alpha:]]+(-))+)[[:digit:]]+)'
if [[ $x =~ $re ]]
then
    echo The regex matches!
    echo ${BASH_REMATCH[*]}
fi
(adapted from https://stackoverflow.com/a/63551084/10906045)
Unfortunately, it does not split off the final number.
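A likely reason, sketched under the assumption that every name is a run of letter-and-dash words followed by one number: in the pattern above every parenthesis opens a numbered group, and the digits sit only inside the outermost group, so no element of BASH_REMATCH holds the number on its own. The sketch below gives the digits their own group. The name split_name is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# split_name: hypothetical helper, not part of the original question.
# Prints the dash-terminated prefix and the trailing number of its argument.
split_name() {
  # Group 1: all "word-" repetitions, group 2: the last "word-", group 3: the digits
  local re='^(([[:alpha:]]+-)+)([[:digit:]]+)$'
  if [[ $1 =~ $re ]]; then
    printf 'prefix: %s\n' "${BASH_REMATCH[1]}"
    printf 'number: %s\n' "${BASH_REMATCH[3]}"
  else
    # Signal "no match" to the caller
    return 1
  fi
}

split_name 'a-que-no-me-dejas-capitulo-10'
```

With the sample value this prints the prefix a-que-no-me-dejas-capitulo- and the number 10 on separate lines.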
AWK
awk -F'-' '{ printf "%04d: \n", $NF }' output_downloads >output_downloads2
head output_downloads2
0001:
0002:
0003:
0004:
0050:
I cannot extract the first part.
Answer 0 (score: 4)
Using awk (the three-argument form of match() requires GNU awk):
awk '{ match($0, /(.*-)([[:digit:]]+)$/, m); printf("%s%03d\n", m[1], m[2])}' inputfile
Here is the actual awk script:
{
    # Regex match whole line with 2 capture groups
    match($0, /(.*-)([[:digit:]]+)$/, m)
    # Format print both captured groups
    printf("%s%03d\n", m[1], m[2])
}
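If GNU awk is not available, the same result can probably be reached by treating the dash as the field separator, which stays close to the attempt in the question. This is a sketch, not part of the original answer. The name pad_last_field is hypothetical, and the code assumes the last dash-separated field is always the number.

```shell
#!/usr/bin/env bash
# pad_last_field: hypothetical helper, not part of the original answer.
# Reads names on stdin and zero-pads the last dash-separated field to 3 digits.
pad_last_field() {
  # Assigning to $NF makes awk rebuild the line with OFS between the fields;
  # the trailing 1 prints every line, and empty lines are left untouched.
  awk -F'-' -v OFS='-' 'NF { $NF = sprintf("%03d", $NF) } 1'
}

printf '%s\n' la-que-no-podia-capitulo-1 la-que-no-podia-capitulo-25 | pad_last_field
```

Because this version only relies on field splitting and sprintf, it should also work in awk implementations that lack the array argument to match().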
Using Bash ERE:
while IFS= read -r || [[ $REPLY ]]; do
    # Regex match whole line with 2 capture groups
    [[ $REPLY =~ (.*-)([[:digit:]]+)$ ]] || :
    # Format print both captured groups
    printf '%s%03d\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
done <inputfile
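One caveat with the loop above: Bash's printf reads an argument with a leading zero as octal, so an input such as capitulo-08 is rejected as an invalid octal number. A hedged variant that forces base 10 might look like the sketch below. Passing non-matching lines through unchanged is a design choice of this sketch, not of the original answer, and pad_numbers is a hypothetical name.

```shell
#!/usr/bin/env bash
# pad_numbers: hypothetical helper, not part of the original answer.
# Reads names on stdin and zero-pads a trailing dash-separated number to 3 digits.
pad_numbers() {
  local line
  local re='^(.*-)([[:digit:]]+)$'
  # The || test keeps a final line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      # 10# forces base 10, so 08 and 09 are not treated as octal
      printf '%s%03d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
    else
      # Assumption: lines without a trailing number are passed through as-is
      printf '%s\n' "$line"
    fi
  done
}

printf '%s\n' la-que-no-podia-capitulo-1 la-que-no-podia-capitulo-08 | pad_numbers
```

To process a file instead of the sample input, redirect it into the function, for example pad_numbers <inputfile.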
Or using POSIX shell:
#!/usr/bin/env sh
while IFS= read -r line || [ "$line" ]; do
    IFS=-
    # Split the line on dashes to fill the positional parameters
    # shellcheck disable=SC2086 # Intended word splitting
    set -- $line
    # Print each argument followed by a dash, except the last one
    while [ $# -gt 1 ]; do
        printf '%s-' "$1"
        shift
    done
    # Print the last argument as a zero-padded, 3-digit integer and a newline
    printf '%03d\n' "$1"
done <inputfile