Print the first word in a file that starts with each letter of the alphabet

Date: 2017-11-20 20:09:33

Tags: python bash awk grep

I have a file containing a list of adjectives, A-Z.

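How can I print the first word that starts with A, then the first word that starts with B, and so on all the way to Z?
I think grep might be the way to go, but I'm open to anything else: awk, python, whatever works.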

Some sample output:

$ cat adjectives.txt | head
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive

$ cat adjectives.txt | grep '^[ABCDE]' | head
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive

So my desired output would be:

Adamant: unyielding; a very hard substance
Baleful: deadly, foreboding
...
Irksome: annoying
Jejune: dull, puerile
...
Wheedling: flattering
Zealous: eager, devoted

The complete file:
$ cat adjectives.txt
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive
Calamitous: disastrous
Caustic: corrosive, sarcastic; a corrosive substance
Cerulean: sky blue
Comely: attractive
Concomitant: accompanying
Contumacious: rebellious
Corpulent: obese
Crapulous: immoderate in appetite
Defamatory: maliciously misrepresenting
Didactic: conveying information or moral instruction
Dilatory: causing delay, tardy
Dowdy: shabby, old-fashioned; an unkempt woman
Efficacious: producing a desired effect
Effulgent: brilliantly radiant
Egregious: conspicuous, flagrant
Endemic: prevalent, native, peculiar to an area
Equanimous: even, balanced
Execrable: wretched, detestable
Fastidious: meticulous, overly delicate
Feckless: weak, irresponsible
Fecund: prolific, inventive
Friable: brittle
Fulsome: abundant, overdone, effusive
Garrulous: wordy, talkative
Guileless: naive
Gustatory: having to do with taste or eating
Heuristic: learning through trial-and-error or problem solving
Histrionic: affected, theatrical
Hubristic: proud, excessively self-confident
Incendiary: inflammatory, spontaneously combustible, hot
Insidious: subtle, seductive, treacherous
Insolent: impudent, contemptuous
Intransigent: uncompromising
Inveterate: habitual, persistent
Invidious: resentful, envious, obnoxious
Irksome: annoying
Jejune: dull, puerile
Jocular: jesting, playful
Judicious: discreet
Lachrymose: tearful
Limpid: simple, transparent, serene
Loquacious: talkative
Luminous: clear, shining
Mannered: artificial, stilted
Mendacious: deceptive
Meretricious: whorish, superficially appealing, pretentious
Minatory: menacing
Mordant: biting, incisive, pungent
Munificent: lavish, generous
Nefarious: wicked
Noxious: harmful, corrupting
Obtuse: blunt, stupid
Parsimonious: frugal, restrained
Pendulous: suspended, indecisive
Pernicious: injurious, deadly
Pervasive: widespread
Petulant: rude, ill humored
Platitudinous: resembling or full of dull or banal comments
Precipitate: steep, speedy
Propitious: auspicious, advantageous, benevolent
Puckish: impish
Querulous: cranky, whining
Quiescent: inactive, untroublesome
Rebarbative: irritating, repellent
Recalcitrant: resistant, obstinate
Redolent: aromatic, evocative
Rhadamanthine: harshly strict
Risible: laughable
Ruminative: contemplative
Sagacious: wise, discerning
Salubrious: healthful
Sartorial: relating to attire, especially tailored fashions
Sclerotic: hardening
Serpentine: snake-like, winding, tempting or wily
Spasmodic: having to do with or resembling a spasm, excitable,
intermittent
Strident: harsh, discordant; obtrusively loud
Taciturn: closemouthed, reticent
Tenacious: persistent, cohesive,
Tremulous: nervous, trembling, timid, sensitive
Trenchant: sharp, penetrating, distinct
Turbulent: restless, tempestuous
Turgid: swollen, pompous
Ubiquitous: pervasive, widespread
Uxorious: inordinately affectionate or compliant with a wife
Verdant: green, unripe
Voluble: glib, given to speaking
Voracious: ravenous, insatiable
Wheedling: flattering
Withering: devastating
Zealous: eager, devoted

4 answers:

Answer 0 (score: 8)

awk to the rescue!

$ awk '!a[tolower(substr($0,1,1))]++' file

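This keeps a counter for each initial character and prints a line only when that counter is still zero, i.e. the first time the character is seen. tolower() makes the match case-insensitive; remove it if you don't need that. substr($0,1,1) extracts the first character of the line. awk's implicit loop repeats this for every line of the input file.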
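For readers who find the awk one-liner terse, the same single-pass idea can be sketched in Python. This is only an illustration of the logic, not part of the original answer; the function name first_per_letter and the sample lines are made up for the example, and skipping blank lines is an assumption (awk would key a blank line on the empty string).

```python
def first_per_letter(lines):
    """Yield the first line seen for each initial character, ignoring case."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are skipped
        key = line[0].lower()  # plays the role of tolower(substr($0,1,1))
        if key not in seen:    # plays the role of !a[key]++
            seen.add(key)
            yield line


# Hypothetical sample data, shortened from the question's file
sample = [
    "Adamant: unyielding",
    "Adroit: clever",
    "Baleful: deadly",
    "bilious: unpleasant",
    "Calamitous: disastrous",
]
for line in first_per_letter(sample):
    print(line)
```

Because the set only records which first characters have appeared, the input does not need to be sorted, and the file is read exactly once.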

With a slight change to the script

$ awk '++a[substr($0,1,1)]==2' file  

you can get the second record for each letter (if it exists), or use <3 instead of ==2 to get the first two records.
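The counting variant can be sketched in Python as well. Again, this is an illustrative sketch rather than anything from the original answer: nth_per_letter and the sample list are hypothetical names, and the comparison is case-sensitive, matching the awk version without tolower().

```python
from collections import Counter


def nth_per_letter(lines, n):
    """Yield the n-th line (1-based) for each initial character."""
    counts = Counter()
    for line in lines:
        if not line:
            continue  # assumption: empty strings are skipped
        counts[line[0]] += 1      # plays the role of ++a[substr($0,1,1)]
        if counts[line[0]] == n:  # use "<= n" to keep the first n lines instead
            yield line


# Hypothetical sample data
sample = ["Adamant", "Adroit", "Amatory", "Baleful", "Bellicose", "Calamitous"]
for line in nth_per_letter(sample, 2):
    print(line)
```

Letters with fewer than n entries simply produce no output, which corresponds to the "if it exists" caveat.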

If your file is already sorted and the case is consistent, you can opt for an even simpler command

$ uniq -w1 file

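The uniq command keeps the first instance of each run of equal comparison values, here limited to the first character, so it extracts the first line for every letter in one go. If the case is inconsistent, add the -i flag to ignore case. Note that -w is a GNU uniq option.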
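The "already sorted" condition matters because uniq only compares adjacent lines. A small Python sketch, assuming itertools.groupby as a rough stand-in for uniq -i -w1 (the helper name first_of_each_run and the sample data are made up for illustration), shows the difference:

```python
from itertools import groupby


def first_of_each_run(lines):
    """Keep the first line of each run of adjacent lines sharing a first character."""
    return [next(group) for _, group in groupby(lines, key=lambda s: s[:1].lower())]


# Hypothetical unsorted input with inconsistent case
unsorted_lines = ["Baleful", "Adamant", "Bilious", "adroit"]

# Unsorted: every line starts a new run, so nothing is collapsed
print(first_of_each_run(unsorted_lines))

# Sorted case-insensitively first: one line per letter survives
print(first_of_each_run(sorted(unsorted_lines, key=str.lower)))
```

On unsorted input nothing is filtered out, whereas the counter-based awk script above does not depend on line order.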

Scanning the file once is enough; there is no need for multiple passes...

Answer 1 (score: 3)

A Python version:

import itertools

with open('adjectives.txt') as fp:
    # Group lines by first letter. If the lines weren't already sorted, 
    # you could replace fp with sorted(fp).
    groups = itertools.groupby(fp, key=lambda line: line[0])

    for first_letter, group in groups:
        print(next(group), end='')

Answer 2 (score: 1)

Perhaps, with bash:

for i in {A..Z}; do grep -m1 "^$i" adjectives.txt; done

Answer 3 (score: 0)

with open("adjectives.txt") as f:
    lines = f.readlines()

# get rid of the trailing \n and skip blank lines (s[0] would fail on them)
lines = [x.strip() for x in lines if x.strip()]

# stable sort by first character, so file order is kept within each letter
lines.sort(key=lambda s: s[0])

d = {}
for line in lines:
    key = line[0]
    # only the first occurrence
    if key not in d:
        d[key] = line

for key in sorted(d):
    print(d[key])