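I am processing a large amount of text from Wikipedia and want to remove the various pronunciation guides that the entries contain. For example, given the following entries: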
Sigmund Freud (/ˈfrɔɪd/ FROYD; German: [ˈziːkmʊnt ˈfʁɔʏt]; born Sigismund Schlomo Freud; 6 May 1856 – 23 September 1939) was an…
Plato (/ˈpleɪtoʊ/; Greek: Πλάτων Plátōn, pronounced [plá.tɔːn] in Classical Attic; 428/427 or 424/423 – 348/347 BC) was a…
Napoleon Bonaparte (/nəˈpoʊliən ˈboʊnəpɑːrt/; French: [napɔleɔ̃ bɔnapaʁt]; 15 August 1769 – 5 May 1821) was a…
Michael Faraday FRS (/ˈfæ.rəˌdeɪ/; 22 September 1791 – 25 August 1867) was an…
Martin Luther (/ˈluːθər/; German: [ˈmaɐ̯tiːn ˈlʊtɐ]; 10 November 1483 – 18 February 1546), O.S.A., was a…
Louis Pasteur (/ˈluːi pæˈstɜːr/, French: [lwi pastœʁ]; December 27, 1822 – September 28, 1895) was a…
Ideally, I would like to end up with the following:
Sigmund Freud (born Sigismund Schlomo Freud; 6 May 1856 – 23 September 1939) was an…
Plato (428/427 or 424/423 – 348/347 BC) was a…
Napoleon Bonaparte (15 August 1769 – 5 May 1821) was a…
Michael Faraday FRS (22 September 1791 – 25 August 1867) was an…
Martin Luther (10 November 1483 – 18 February 1546), O.S.A., was a…
Louis Pasteur (December 27, 1822 – September 28, 1895) was a…
Is there a programmatic way to do this?
Answer 0 (score: 2)
A sed solution:
sed 's|/[^/]*/[^,;]*[,;]\(.*\[[^][]*\][^;]*;\)* *||g' file
Output:
Sigmund Freud (born Sigismund Schlomo Freud; 6 May 1856 – 23 September 1939) was an…
Plato (428/427 or 424/423 – 348/347 BC) was a…
Napoleon Bonaparte (15 August 1769 – 5 May 1821) was a…
Michael Faraday FRS (22 September 1791 – 25 August 1867) was an…
Martin Luther (10 November 1483 – 18 February 1546), O.S.A., was a…
Louis Pasteur (December 27, 1822 – September 28, 1895) was a…
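/[^/]*/[^,;]*[,;]
- matches the pronunciation section /.../ together with any optional following words ([^,;]*), ending with , or ;
\(.*\[[^][]*\][^;]*;\)*
- matches the pronunciation section [...] together with optional surrounding words (covered by .* and [^;]*), ending with ;. The trailing * makes this whole group optional, so it can match zero or more times.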
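To make the two parts easier to see, here is a minimal sketch that builds the same pattern from two named pieces and wraps it in a shell function. The names slash_part, bracket_part and strip_pronunciation are made up for illustration; the regular expression itself is unchanged from the command above.

```shell
#!/bin/sh
# Part 1: a /.../ pronunciation, any trailing words, then a ',' or ';'
slash_part='/[^/]*/[^,;]*[,;]'
# Part 2: zero or more "...[...]...;" segments (bracketed IPA plus nearby words)
bracket_part='\(.*\[[^][]*\][^;]*;\)*'

# Hypothetical helper: reads text on stdin, writes the cleaned text to stdout.
strip_pronunciation() {
  sed "s|${slash_part}${bracket_part} *||g"
}

# Quick check on one of the sample entries from the question.
printf '%s\n' 'Louis Pasteur (/ˈluːi pæˈstɜːr/, French: [lwi pastœʁ]; December 27, 1822 – September 28, 1895) was a…' \
  | strip_pronunciation
```

Two caveats are worth keeping in mind on full article text. The .* inside the group is greedy, so a later [...] followed by a ; on the same line (a citation marker, for instance) can cause more text to be removed than intended. Also, with the g flag, slash-separated dates such as 428/427 or 424/423 can themselves match the first part when a , or ; follows later in the line; dropping g limits the substitution to the first match on each line, which may be a safer default for this kind of input.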