Find, copy, and replace: regex or something else?

Asked: 2012-05-12 06:03:57

Tags: python regex bash nlp

I am a linguist (trying to do some data mining on Latin), but I am new to programming.

I have a file like this:

cerycium:cerycia
cessatio:cessatio
    cessatione
cessicius:cessicia
cessio:cessio
    cessione
    cessionem
    cessioni

I need it organized like this:

cerycium:cerycia
cessatio:cessatio
cessatio:cessatione
cessicius:cessicia
cessio:cessio
cessio:cessione
cessio:cessionem
cessio:cessioni

Could someone provide a script (bash, regexp, python, etc.) that would do this for me? Thanks!

4 Answers:

Answer 0 (score: 1)

awk 'BEGIN {FS = OFS = ":"} NF == 1 {gsub(/[[:space:]]/, ""); $2 = $1; $1 = root} {root = $1; print}' inputfile

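This assumes that the first line contains two fields, so that a root is already known when the first indented line is reached.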

Answer 1 (score: 1)

A simplified version of Dennis's script:

awk -F: 'NF==2 {root=$1; print $1":"$2;} NF==1 {gsub(/[[:space:]]+/,""); print root":"$1;}' a.txt

Or, matching patterns instead of counting fields:

awk -F: '/:/ {root=$1; print $1":"$2;} /^[[:space:]]+/ {gsub(/[[:space:]]+/,"");print root":"$1;}' a.txt
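
For readers who would rather stay in Python, the pattern-matching idea can be sketched with the `re` module. This is only an illustration under assumptions: the names `HEADWORD`, `CONTINUATION` and `expand` are made up here, and the two patterns are guessed from the sample data in the question (a `root:form` line, or an indented bare form). Lines that match neither pattern are skipped.

```python
import re

# Hypothetical patterns for the two line shapes in the sample data.
HEADWORD = re.compile(r"^(\S+):(\S+)\s*$")   # e.g. "cessio:cessio"
CONTINUATION = re.compile(r"^\s+(\S+)\s*$")  # e.g. "    cessione"


def expand(lines):
    """Yield 'root:form' for headword and indented lines; skip anything else."""
    root = None
    for line in lines:
        head = HEADWORD.match(line)
        if head:
            # Remember the root so later indented lines can reuse it.
            root = head.group(1)
            yield f"{root}:{head.group(2)}"
            continue
        cont = CONTINUATION.match(line)
        if cont:
            if root is None:
                # Same assumption as the awk versions: a root must come first.
                raise ValueError("indented form appears before any root")
            yield f"{root}:{cont.group(1)}"


sample = [
    "cessatio:cessatio\n",
    "    cessatione\n",
    "cessio:cessio\n",
    "    cessione\n",
]
print("\n".join(expand(sample)))
```

Unlike the awk one-liners, this sketch raises an error when an indented line comes before any root, instead of printing a line with an empty root.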

Answer 2 (score: 0)

Python, assuming the first line has two fields:

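with open('in.txt') as f:
    lines = f.readlines()

for i, x in enumerate(lines):
    if ':' in x:
        # Headword line: keep it, minus surrounding whitespace.
        lines[i] = x.strip()
    else:
        # Indented line: reuse the root of the previous, already rewritten line.
        lines[i] = lines[i-1].split(':')[0] + ':' + x.strip()

print("\n".join(lines))

Output: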

    cerycium:cerycia
    cessatio:cessatio
    cessatio:cessatione
    cessicius:cessicia
    cessio:cessio
    cessio:cessione
    cessio:cessionem
    cessio:cessioni
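
The snippet above reads the whole file into memory and would turn a blank line into a bare `root:` entry. A streaming variant, sketched below, keeps the current root in a variable, skips blank lines, and writes the result to a second file. The function name `expand_file` and both path parameters are hypothetical, and UTF-8 input is assumed.

```python
def expand_file(src_path, dst_path):
    """Copy src_path to dst_path, prefixing indented lines with the current root."""
    root = None
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            form = raw.strip()
            if not form:
                continue  # skip blank lines instead of emitting "root:"
            if ":" in form:
                # Headword line: remember the part before the first colon.
                root = form.split(":", 1)[0]
                dst.write(form + "\n")
            elif root is None:
                raise ValueError(f"no root seen before {form!r}")
            else:
                dst.write(f"{root}:{form}\n")
```

With the sample data saved as in.txt, calling `expand_file("in.txt", "out.txt")` should leave the reorganized list in out.txt.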

Answer 3 (score: 0)

Try this in Perl. File name: process.pl

#!/bin/perl

use strict;
use warnings;

# Read the whole input file, stopping with a message if it cannot be opened.
open(my $in, '<', 'infile') or die "Cannot open infile: $!";
my @fcontent = <$in>;
close($in);

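my $prefix = "";
foreach (@fcontent) {
    if (/:/) {
        # Headword line: remember the root before the colon.
        my @tokens = split(":", $_);
        $prefix = $tokens[0];
    } else {
        # Indented line: drop the leading whitespace, then prepend the root.
        s/^\s+//;
        $_ = "$prefix:$_";
    }
    print $_;
}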

# Write the rewritten lines to the output file.
open(my $out, '>', 'outfile') or die "Cannot open outfile: $!";
foreach (@fcontent) {
    print $out $_;
}
close($out);

At the command prompt:

perl process.pl 

Then open outfile to see the result. I have kept the program simple, mainly for readability; you can edit it later as needed.