I have a large file that I need to search in several ways using a bash shell script.
The file looks like this:
TITLE and AUTHOR ETEXT NO.
Aspects of plant life; with special reference to the British flora, 56900
by Robert Lloyd Praeger
The Vicar of Morwenstow, by Sabine Baring-Gould 56899
[Subtitle: Being a Life of Robert Stephen Hawker, M.A.]
Raamatun tutkisteluja IV, mennessä Charles T. Russell 56898
[Subtitle: Harmagedonin taistelu]
[Language: Finnish]
Raamatun tutkisteluja III, mennessä Charles T. Russell 56897
[Subtitle: Tulkoon valtakuntasi]
[Language: Finnish]
Tom Thatcher's Fortune, by Horatio Alger, Jr. 56896
A Yankee Flier in the Far East, by Al Avery 56895
and George Rutherford Montgomery
[Illustrator: Paul Laune]
Nancy Brandon's Mystery, by Lillian Garis 56894
Nervous Ills, by Boris Sidis 56893
[Subtitle: Their Cause and Cure]
Pensées sans langage, par Francis Picabia 56892
[Language: French]
Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss 56891
[Subtitle: A picture of Judaism, in the century
which preceded the advent of our Savior]
Fra Tommaso Campanella, Vol. 1, di Luigi Amabile 56890
[Subtitle: la sua congiura, i suoi processi e la sua pazzia]
[Language: Italian]
The Blue Star, by Fletcher Pratt 56889
Importanza e risultati degli incrociamenti in avicoltura, 56888
di Teodoro Pascal
[Language: Italian]
The Junior Classics, Volume 3: Tales from Greece and Rome, by Various 56887
~ ~ ~ ~ Posting Dates for the below eBooks: 1 Mar 2018 to 31 Mar 2018 ~ ~ ~ ~
TITLE and AUTHOR ETEXT NO.
The American Missionary, Volume 41, No. 1, January, 1887, by Various 56886
Morganin miljoonat, mennessä Sven Elvestad 56885
[Author a.k.a. Stein Riverton]
[Subtitle: Salapoliisiromaani]
[Language: Finnish]
"Trip to the Sunny South" in March, 1885, by L. S. D 56884
Balaam and His Master, by Joel Chandler Harris 56883
[Subtitle: and Other Sketches and Stories]
Susien saaliina, mennessä Jack London 56882
[Language: Finnish]
Forged Egyptian Antiquities, by T. G. Wakeling 56881
The Secret Doctrine, Vol. 3 of 4, by Helena Petrovna Blavatsky 56880
[Subtitle: Third Edition]
No Posting 56879
First love and other stories, by Iván Turgénieff 56878
Now I need to search it by etext number, author name, and title.
For example, if I search by etext number, say etext 56900, it should return:
Aspects of plant life; with special reference to the British flora, 56900
I am new to shell scripting. So far I can only read the file, with this:
#!/bin/bash
read -r -p 'string to search ' searchstring
grep --color -- "$searchstring" GUTINDEX.ALL   # | condition
I don't know what kind of condition I should use to search by author name or etext number.
Answer 0 (score: 1)
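As others have already pointed out, grep alone is not really the right way to approach this problem. Using Awk instead of grep would be a considerable improvement, but for a real production system you would parse the fields into a relational database and search it with SQL. With a database index, lookups are much faster than sequentially scanning the entire index file for every search.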
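As a rough illustration of the Awk suggestion, here is a minimal sketch that treats the file as multi-line entries instead of single lines. It assumes that every entry starts on a line whose last field is the etext number and that the lines after it (author continuation, subtitle, language) belong to that entry. The function names `entry_by_number` and `entries_matching`, and the small sample file, are invented for this example; the real catalog is GUTINDEX.ALL.

```shell
#!/bin/bash
# Hypothetical sample data, a stand-in for the real GUTINDEX.ALL.
catalog=$(mktemp)
cat > "$catalog" <<'EOF'
Aspects of plant life; with special reference to the British flora,  56900
 by Robert Lloyd Praeger
The Vicar of Morwenstow, by Sabine Baring-Gould                      56899
 [Subtitle: Being a Life of Robert Stephen Hawker, M.A.]
Tom Thatcher's Fortune, by Horatio Alger, Jr.                        56896
EOF

# Print the whole entry (title line plus continuation lines) for one etext number.
entry_by_number () {
    awk -v no="$1" '
        /^~ ~|^TITLE and AUTHOR/ { printing = 0; next }          # section headers end an entry
        $NF ~ /^[0-9]+$/         { printing = ($NF == no) }      # a new entry starts here
        printing
    ' "$catalog"
}

# Print every entry in which any line matches the given regular expression.
entries_matching () {
    awk -v pat="$1" '
        function emit() { if (entry != "" && entry ~ pat) printf "%s", entry; entry = "" }
        /^~ ~|^TITLE and AUTHOR/ { emit(); next }
        $NF ~ /^[0-9]+$/         { emit() }                      # previous entry is complete
        { entry = entry $0 "\n" }                                # collect the current entry
        END { emit() }
    ' "$catalog"
}

entry_by_number 56899
entries_matching 'Praeger'
```

Because the author search looks at the whole entry, it also finds authors that appear on a continuation line, which a single-line grep misses. The heuristic is still fragile: a continuation line that happens to end in a number would be mistaken for the start of a new entry.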
But if you are restricted to grep, here is a quick and dirty attempt.
# "$*" joins all arguments into one string, so multi-word searches need no quoting.
author () { grep -E "(by|par|di|mennessä) $*" GUTINDEX.ALL; }
index () { grep " $*\$" GUTINDEX.ALL; }
title () { grep "^$*" GUTINDEX.ALL; }
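This declares three shell functions that search different parts of the file by supplying an anchored expression (^ matches the beginning of a line, $ matches the end of a line) or suitable context.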
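To see why the anchors matter, here is a small sketch run against two sample lines written to a temporary file (the variable name `sample` is just for this example). The number 1887 appears inside a title, so only the anchored pattern tells a title year apart from an etext number.

```shell
#!/bin/bash
# Two sample lines in a temporary file, a stand-in for GUTINDEX.ALL.
sample=$(mktemp)
cat > "$sample" <<'EOF'
The American Missionary, Volume 41, No. 1, January, 1887, by Various 56886
Morganin miljoonat, mennessä Sven Elvestad                           56885
EOF

# Unanchored: matches the first line because 1887 occurs in its title.
grep '1887' "$sample"

# Anchored to the end of the line: no entry has 1887 as its etext number.
grep ' 1887$' "$sample" || echo "no entry numbered 1887"

# Anchored to the end of the line: finds the entry numbered 56886.
grep ' 56886$' "$sample"

# Anchored to the start of the line: a title search.
grep '^Morganin' "$sample"
```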
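Taking the search expression as a command-line argument rather than requiring interactive input is usually a big usability improvement. You can now use the shell's history mechanism to recall and possibly edit earlier searches, and build new scripts on top of these simple building blocks.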
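As one possible way to build on such functions, the sketch below wraps them in a single dispatcher. The name `gutsearch`, the `GUTINDEX` variable, and the sample file are assumptions made for this example so that it runs on its own; in real use `GUTINDEX` would point at GUTINDEX.ALL.

```shell
#!/bin/bash
# Hypothetical sample data; point GUTINDEX at the real GUTINDEX.ALL instead.
GUTINDEX=$(mktemp)
cat > "$GUTINDEX" <<'EOF'
Tom Thatcher's Fortune, by Horatio Alger, Jr.                        56896
Susien saaliina, mennessä Jack London                                56882
 [Language: Finnish]
Forged Egyptian Antiquities, by T. G. Wakeling                       56881
EOF

# The three building blocks, reading the file named by GUTINDEX.
author () { grep -E "(by|par|di|mennessä) $*" "$GUTINDEX"; }
index ()  { grep " $*\$" "$GUTINDEX"; }
title ()  { grep "^$*" "$GUTINDEX"; }

# gutsearch KIND TERM...: call the building block named by KIND.
gutsearch () {
    local kind=$1
    shift
    case $kind in
        author|index|title) "$kind" "$@" ;;
        *) echo "usage: gutsearch author|index|title TERM..." >&2; return 2 ;;
    esac
}

gutsearch author Jack London
gutsearch index 56881
gutsearch title Tom
```

Since every search is an ordinary command line, it can be recalled from history, edited, or piped into other tools.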
(By the way, "mennessä" is not a correct Finnish localization here at all. I have reported a bug to Project Gutenberg.)
Answer 1 (score: 0)
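You could start with something like this, but as @tom-fenech pointed out, without structured input it is rather unreliable.
For example, the author name is prefixed inconsistently, sometimes appears under the "Subtitle" tag, and only rarely under an "Author" tag.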
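#!/bin/bash
# Path to the catalog file (a local copy of the index).
CATALOG=/tmp/s

usage() {
    echo "Usage:"
    echo "$0 [etext <key>] [author <id>]"
    exit 1
}

# Print the text that precedes the etext number on each matching line.
process_etext() {
    local searchKey=$1
    grep -E -- "${searchKey}" "${CATALOG}" | awk -F"${searchKey}" '{print $1}'
}

# Print lines at or next to an author match that end in at least five digits.
# One line of context is a heuristic: the author may sit on a continuation line.
process_author() {
    local searchKey=$1
    grep -E -C1 -- "${searchKey}" "${CATALOG}" | grep -E "[[:digit:]]{5}[[:space:]]*$"
}

# Consume the arguments two at a time: a keyword followed by its search key.
while [ $# -gt 0 ]; do
    key=$1
    case $key in
        etext|author)
            [ $# -ge 2 ] || usage
            "process_${key}" "$2"
            shift 2
            ;;
        *)
            usage
            ;;
    esac
done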