I have a CSV file like this:
"","LESCHELLES","","LESCHELLES"
"","SAINTE CROIX DE VERDON","","SAINTE CROIX DE VERDON"
"","SERRE CHEVALIER","","SERRE CHEVALIER"
"","SAINT JUST D'ARDECHE","","SAINT JUST D'ARDECHE"
"","NEUVILLE SUR VANNES","","NEUVILLE SUR VANNES"
"","ESCUEILLENS ET SAINT JUST","","ESCUEILLENS ET SAINT JUST"
"","PAS DES LANCIERS","","PAS DES LANCIERS"
"","PLAN DE CAMPAGNE","","PLAN DE CAMPAGNE"
I want to convert it like this:
"","Leschelles","","LESCHELLES"
"","Sainte Croix De Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just D'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville Sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens Et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas Des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan De Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
That would be good. Even better: lowercase all the standalone linking words such as de, d', et, sur and des. This would give:
"","Leschelles","","LESCHELLES"
"","Sainte Croix de Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just d'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan de Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
Answer 0 (score: 3)
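Python has title():
Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase.
The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions: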
import re

def titlecase(s):
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0)[0].upper() +
                             mo.group(0)[1:].lower(),
                  s)

>>> titlecase("they're bill's friends.")
"They're Bill's Friends."
Update: here is a solution for the French problem:
import re, sys

def titlecase(s):
    # Capitalize the first letter of each word; an apostrophe stays inside the word.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0)[0].upper() +
                             mo.group(0)[1:].lower(),
                  s)

def french_parse(s):
    # Lowercase linking words surrounded by spaces, and the elided d' / l' prefixes.
    p = re.compile(
        r"( de la | sur | sous | la | de | les | du | le | au | aux | en | des | et )|(( d'| l')([a-z]+))",
        re.IGNORECASE)
    return p.sub(
        lambda mo: mo.group().find("'") > 0
                   and mo.group()[:mo.group().find("'") + 1].lower() +
                       titlecase(mo.group()[mo.group().find("'") + 1:])
                   or (mo.group(0)[0].upper() + mo.group(0)[1:].lower()),
        s)

for line in sys.stdin:
    # Drop the first 20 characters and the trailing newline. The hard-coded
    # offset presumably matches the answerer's input file; adjust it for your data.
    s = line[20:len(line)-1]
    p = s.find('"')
    t = s[:p]
    # Just output to show which names have been modified:
    if french_parse(titlecase(t)) != titlecase(t):
        print('"' + french_parse(titlecase(t)) + '"')
Run it like this:
python thepythonscript.py < file.csv
The output will then be:
"Grenand les Sombernon"
"Touville sur Montfort"
"Fontenay en Vexin"
"Durfort Saint Martin de Sossenac"
"Monclar d'Armagnac"
"Ports sur Vienne"
"Saint Barthelemy de Beaurepaire"
"Saint Bernard du Touvet"
"Rosoy le Vieil"
Answer 1 (score: 1)
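While you could probably solve this with some vim regex magic, I think it would be easier to solve it in your favourite scripting language and pipe the selected text through it using vim's ! command. Here is an (untested) example in PHP:
#!/usr/bin/env php
<?php
$specialWords = array('de', 'd\'', 'et', 'du', /* etc. */ );
foreach (file('php://stdin') as $line) {
    // Lowercase first, because ucwords() only changes the first letter of each word;
    // ' and " are added as delimiters so that quoted fields and d'Xxx are capitalized.
    $line = ucwords(strtolower($line), " \t\r\n'\"");
    foreach ($specialWords as $w) {
        $line = preg_replace("/\\b$w\\b/i", $w, $line);
    }
    echo $line;
}
Make that script executable and store it somewhere on your PATH; then select some text in vim and use :'<,'>! yourscript.php to convert it (or use the % range instead of '<,'> to filter the whole buffer).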
Answer 2 (score: 0)
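The csv.vim ftplugin helps with handling CSV files. Although it does not directly offer a "substitute in column N" feature, it may get you close. At the very least you can arrange the columns into neat blocks and then apply a simple regex or a visual-block selection to them.
But I think a different toolchain, one better suited to manipulating CSV files, may be preferable to doing this entirely in Vim. It also depends on whether this is a one-off task or something you do often.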
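As one possible illustration of such a toolchain, here is a minimal sketch using Python's standard csv module. It assumes the name to convert is always the second field and that every field should be written back quoted; the names convert_name, convert_rows, PARTICLES and ELIDED are hypothetical, and the word list is only a guess at what the question needs.

```python
import csv
import io

# Assumption: linking words that stay lowercase; extend the set as needed.
PARTICLES = {"de", "des", "du", "la", "le", "les", "et",
             "sur", "sous", "en", "au", "aux"}
# Assumption: elided prefixes that stay lowercase before a capitalized word.
ELIDED = ("d'", "l'")


def convert_name(name):
    """Title-case a space-separated name, keeping linking words lowercase."""
    words = []
    for i, word in enumerate(name.lower().split(" ")):
        if word.startswith(ELIDED):
            # Keep the prefix lowercase unless it opens the name.
            head = word[:2] if i > 0 else word[:2].capitalize()
            words.append(head + word[2:].capitalize())
        elif i > 0 and word in PARTICLES:
            words.append(word)
        else:
            words.append(word.capitalize())
    return " ".join(words)


def convert_rows(src, dst, column=1):
    """Copy CSV rows from src to dst, converting only the given column."""
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(src):
        if len(row) > column:
            row[column] = convert_name(row[column])
        writer.writerow(row)


# Small in-memory demonstration with one row from the question.
demo_in = io.StringIO('"","PLAN DE CAMPAGNE","","PLAN DE CAMPAGNE"\n')
demo_out = io.StringIO()
convert_rows(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

To use it as a filter, convert_rows(sys.stdin, sys.stdout) should work after importing sys. Hyphenated names are not handled by this sketch, since the sample data contains none.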
Answer 3 (score: 0)
Here is a one-liner vim command.
%s/"[^"]*",\zs\("[^"]*"\)/\=substitute(substitute(submatch(0), '\<\(\a\)\(\a*\)\>', '\u\1\L\2', 'g'), '\c\<\(de\|d\|l\|sur\|le\|la\|en\|et\)\>', '\L&', 'g')
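I am assuming there are no double quotes inside the first two fields.
The idea behind this solution is to rely on :h :s\= to apply a series of functions to the second field. The series is: first change every word to TitleCase, then put all the liants (linking words) back into lowercase.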
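The same two-step idea can be sketched outside vim. The following Python version is only an illustration, not a translation the answerer provided: it reuses the word list from the vim command, and the names WORD, LIANTS and title_then_lower_liants are hypothetical.

```python
import re

# A "word" here is a run of ASCII letters, roughly matching \a in the vim pattern.
WORD = re.compile(r"[A-Za-z]+")
# Same linking words as in the vim command; case-insensitive like \c.
LIANTS = re.compile(r"\b(de|d|l|sur|le|la|en|et)\b", re.IGNORECASE)


def title_then_lower_liants(field):
    # Step 1: TitleCase every word (first letter upper, the rest lower).
    titled = WORD.sub(lambda m: m.group(0).capitalize(), field)
    # Step 2: put the linking words back into lowercase.
    return LIANTS.sub(lambda m: m.group(0).lower(), titled)


print(title_then_lower_liants("SAINT JUST D'ARDECHE"))
```

As with the vim command, a linking word at the very start of a name is lowercased too, and words missing from the list (des and du, for example) stay in TitleCase until they are added.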