I am a linguist (trying to do some data mining on Latin), but I am new to programming.
I have a file like this:
cerycium:cerycia
cessatio:cessatio
    cessatione
cessicius:cessicia
cessio:cessio
    cessione
    cessionem
    cessioni
I need to reorganize it like this:
cerycium:cerycia
cessatio:cessatio
cessatio:cessatione
cessicius:cessicia
cessio:cessio
cessio:cessione
cessio:cessionem
cessio:cessioni
Could someone provide a script (bash, regex, Python, etc.) that could do this for me? Thanks!
Answer 0 (score: 1)
awk 'BEGIN {FS = OFS = ":"} NF == 1 {gsub(/[[:space:]]/, ""); $2 = $1; $1 = root} {root = $1; print}' inputfile
This assumes the first line contains two fields.
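The awk one-liner works by carrying the last lemma forward in `root`. Below is the same idea as a Python generator, sketched under two assumptions that are mine, not the answer's: the name `expand_entries` is hypothetical, and instead of assuming the first line has two fields, it raises an error when a bare form appears before any lemma.

```python
def expand_entries(lines):
    """Yield 'lemma:form' strings, giving each bare form the most recent lemma."""
    root = None  # lemma taken from the last line that contained a colon
    for line in lines:
        text = line.strip()
        if not text:
            continue  # unlike the awk version, blank lines are skipped
        if ":" in text:
            root = text.split(":", 1)[0]
            yield text
        elif root is None:
            # The awk one-liner simply assumes this cannot happen.
            raise ValueError(f"form {text!r} appears before any lemma")
        else:
            yield f"{root}:{text}"


sample = ["cessatio:cessatio\n", "    cessatione\n", "cessio:cessio\n", "    cessione\n"]
for entry in expand_entries(sample):
    print(entry)
```

Failing loudly on an orphaned form is a design choice: with real lexicon data it is usually better to stop than to silently attach a form to the wrong (or an empty) lemma.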
Answer 1 (score: 1)
A simplified version of Dennis's script:
awk -F: 'NF==2 {root=$1; print $1":"$2;} NF==1 {gsub(/[[:space:]]+/,""); print root":"$1;}' a.txt
Or match patterns instead of counting fields:
awk -F: '/:/ {root=$1; print $1":"$2;} /^[[:space:]]+/ {gsub(/[[:space:]]+/,""); print root":"$1;}' a.txt
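The two variants decide what counts as a bare form differently: the first counts `:`-separated fields, the second requires leading whitespace. They agree on the sample in the question, but a form that has lost its indentation is kept by the first and silently dropped by the second. Here is a simplified Python model of the two rules (the function names `by_count` and `by_match` are hypothetical, and edge cases such as lines with more than one colon are not modeled exactly):

```python
import re


def by_count(lines):
    """First variant: two fields set the lemma; a single field is a bare form."""
    root, out = None, []
    for line in lines:
        text = line.strip()
        fields = text.split(":")
        if len(fields) == 2:
            root = fields[0]
            out.append(text)
        elif len(fields) == 1 and text:
            out.append(f"{root}:{text}")
    return out


def by_match(lines):
    """Second variant: only a line that starts with whitespace is a bare form."""
    root, out = None, []
    for line in lines:
        line = line.rstrip("\n")
        if ":" in line:
            root = line.strip().split(":")[0]
            out.append(line.strip())
        elif re.match(r"\s+", line):
            out.append(f"{root}:{line.strip()}")
    return out


# The last form has lost its indentation.
sample = ["cessio:cessio\n", "    cessione\n", "cessionem\n"]
print(by_count(sample))  # ['cessio:cessio', 'cessio:cessione', 'cessio:cessionem']
print(by_match(sample))  # ['cessio:cessio', 'cessio:cessione']
```

If the input file's indentation is not fully reliable, counting fields is the safer rule.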
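Answer 2 (score: 0)
Python, assuming the first line has two fields:
with open('in.txt') as f:
    lines = f.readlines()
for i, x in enumerate(lines):
    if ':' in x:
        # "lemma:form" line: keep it, minus surrounding whitespace
        lines[i] = x.strip()
    else:
        # bare form: take the lemma from the previous line (already rewritten)
        lines[i] = lines[i-1].split(':')[0] + ':' + x.strip()
print("\n".join(lines))
Output: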
cerycium:cerycia
cessatio:cessatio
cessatio:cessatione
cessicius:cessicia
cessio:cessio
cessio:cessione
cessio:cessionem
cessio:cessioni
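Answer 3 (score: 0)
Try this in Perl (file name: process.pl):
#!/usr/bin/perl
use strict;
use warnings;
# Read the whole input file into memory.
open(my $in, '<', 'infile') or die "Cannot open infile: $!";
my @fcontent = <$in>;
close($in);
my $prefix = "";
foreach (@fcontent) {    # $_ aliases each element, so changes to $_ update @fcontent
    if (/:/) {
        # "lemma:form" line: remember the lemma
        my @tokens = split(/:/, $_);
        $prefix = $tokens[0];
    } else {
        # bare form: strip the leading indentation and prepend the lemma
        s/^\s+//;
        $_ = "$prefix:$_";
    }
    print $_;
}
# Write the rewritten lines to outfile.
open(my $out, '>', 'outfile') or die "Cannot open outfile: $!";
print {$out} @fcontent;
close($out);
At the command prompt, run:
perl process.pl
Then open outfile to see the result. I kept the program simple, mainly for readability; you can adapt it to your needs later.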
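For comparison, here is the same read-transform-write flow in Python, processing one line at a time so the whole file never has to be held in memory (the Perl script reads it all into `@fcontent` first). This is only a sketch: the function name `process_file` is hypothetical, the default file names `infile` and `outfile` simply mirror the Perl answer, and UTF-8 input is an assumption.

```python
def process_file(in_path="infile", out_path="outfile"):
    """Stream in_path to out_path, prefixing each bare form with its lemma."""
    prefix = ""  # lemma from the most recent 'lemma:form' line
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:  # one line at a time; the file is never loaded whole
            if not line.strip():
                dst.write(line)  # pass blank lines through unchanged
            elif ":" in line:
                prefix = line.split(":", 1)[0]
                dst.write(line)
            else:
                # drop the indentation but keep the trailing newline
                dst.write(f"{prefix}:{line.lstrip()}")
```

Calling `process_file()` reads `infile` and writes `outfile` like the Perl script does, except that it does not also echo each line to the terminal.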