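I want to use sed to convert data.txt into Scheme lists, in the following format:
- All lines that start with the same number are parsed and combined, as shown below:
data.txt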
1,{},344.233
1,{2},344.197
2,{16},290.281
2,{18},289.093
3,{1},220.896
foo.scm
(define v1 '(1 (() 344.233) ((2) 344.197))) ;; this is for first two lines starting with 1
(define v2 '(2 ((16) 290.281) ((18) 289.093))) ;; ... 2
(define v3 '(3 ((1) 220.896))) ;; ... 3
Answer 0 (score: 3)
I know nothing about Scheme, so I would probably do this with awk rather than sed.
[ghoti@pc ~]$ cat data.txt
1,{},344.233
1,{2},344.197
2,{16},290.281
2,{18},289.093
3,{1},220.896
[ghoti@pc ~]$ cat doit.awk
#!/usr/bin/awk -f
BEGIN {
  FS=",";
  last1=1;
}
$1 != last1 {  # key changed: flush the finished group (nothing to flush on line 1)
  if (NR > 1) printf("(define v%.0f '(%.0f %s))\n", last1, last1, substr(sect,2));
  last1=$1; sect="";
}
{
  gsub(/[^0-9]/,"",$2);
  sect=sprintf("%s ((%s) %s)", sect, $2, $3);
}
END {
  printf("(define v%.0f '(%.0f %s))\n", last1, last1, substr(sect,2));
}
[ghoti@pc ~]$ ./doit.awk data.txt
(define v1 '(1 (() 344.233) ((2) 344.197)))
(define v2 '(2 ((16) 290.281) ((18) 289.093)))
(define v3 '(3 ((1) 220.896)))
[ghoti@pc ~]$
It could certainly be written more compactly, but this gets the job done.
UPDATE: (per the comments)
[ghoti@pc ~]$ tail -1 data.txt
3,{1,3,4},220.896
[ghoti@pc ~]$ diff -u doit.awk doitnew.awk
--- doit.awk 2012-05-30 00:38:34.549680376 -0400
+++ doitnew.awk 2012-05-30 00:38:52.893810815 -0400
@@ -10,8 +10,15 @@
   last1=$1; sect="";
 }
 
+$2 !~ /}$/ {
+  while ($2 !~ /}$/) {
+    pos=match($0, /,[0-9,]+}/);
+    $0=substr($0, 1, pos-1) " " substr($0, pos+1);
+  }
+}
+
 {
-  gsub(/[^0-9]/,"",$2);
+  gsub(/[^0-9 ]/,"",$2);
   sect=sprintf("%s ((%s) %s)", sect, $2, $3);
 }
 
[ghoti@pc ~]$ ./doitnew.awk data.txt
(define v1 '(1 (() 344.233) ((2) 344.197)))
(define v2 '(2 ((16) 290.281) ((18) 289.093)))
(define v3 '(3 ((1 3 4) 220.896)))
[ghoti@pc ~]$
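What's going on here?
With FS set to a comma, a list such as {1,3,4} is split across several fields, so $2 is just {1. In the new block we added, we test whether the second field ends with }. If it doesn't, we loop until it does. On each pass through the loop, we take one comma that sits before the } and replace it with a space.
Sometimes brute force works. :-P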
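A different route to the same result, sketched here as an alternative rather than as the script above: split each line on the braces instead of on the commas, so that a list such as {1,3,4} arrives as one field and no loop is needed. The function name group_by_key is made up for this example, and the sketch assumes every input line contains exactly one {...} pair.

```shell
# Hypothetical helper (not from the answer): group consecutive lines by key,
# splitting on { and } so the brace contents arrive as a single field.
group_by_key() {
  awk -F'[{}]' '
    {
      key  = $1; sub(/,$/, "", key)     # "3,"       -> "3"
      nums = $2; gsub(/,/, " ", nums)   # "1,3,4"    -> "1 3 4"
      val  = $3; sub(/^,/, "", val)     # ",220.896" -> "220.896"
      if (NR > 1 && key != last) {      # key changed: flush the finished group
        printf("(define v%s \047(%s%s))\n", last, last, sect)
        sect = ""
      }
      last = key
      sect = sect " ((" nums ") " val ")"
    }
    END { if (NR > 0) printf("(define v%s \047(%s%s))\n", last, last, sect) }
  '
}

printf '%s\n' '1,{},344.233' '1,{2},344.197' '3,{1,3,4},220.896' | group_by_key
# (define v1 '(1 (() 344.233) ((2) 344.197)))
# (define v3 '(3 ((1 3 4) 220.896)))
```

Because the key is kept as a string, this version also does not depend on the first key being 1.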
Answer 1 (score: 3)
Racket (a.k.a. Scheme):
#lang racket
;; parse a line (we will join them later)
(define (line-parse l)
  (match (regexp-match #px"([0-9]+),\\{([0-9,]*)\\},([0-9.]+)" l)
    [(list dc first-num bracket-nums rest)
     (list (string->number first-num)
           (match bracket-nums
             ["" empty]
             [else (map string->number
                        (regexp-split #px"," bracket-nums))])
           (string->number rest))]
    [else
     (error "unexpected line format in line: ~s\n" l)]))
;; join together lines that start with the same number
(define (join-lines lines)
  (cond [(empty? lines) empty]
        [else (join-lines-of-n (first (first lines))
                               lines
                               empty)]))
;; gather together lines starting with 'n':
(define (join-lines-of-n n lines accum)
  (cond [(empty? lines)
         (list (cons n (reverse accum)))]
        [(equal? (first (first lines)) n)
         (join-lines-of-n n (rest lines) (cons (rest (first lines))
                                               accum))]
        [else
         (cons (cons n (reverse accum))
               (join-lines lines))]))
(define (dress-up line)
  (format "~a\n" `(define ,(format "v~s" (first line))
                    ',line)))
(display
 (apply
  string-append
  (map dress-up
       (join-lines
        (map line-parse
             (sequence->list (in-port read-line)))))))
Save this as rewrite.rkt and run it like this:
oiseau:/tmp clements> racket ./rewrite.rkt < foo.txt
(define v1 (quote (1 (() 344.233) ((2) 344.197))))
(define v2 (quote (2 ((16) 290.281) ((18) 289.093))))
(define v3 (quote (3 ((1) 220.896) ((4 5) 2387.278))))
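...note that I added a {4,5} line to the input example to test your extension.
Also, note that the output uses (quote ...) rather than '(...). This "should work fine"; that is, the Scheme reader produces the same result for both forms, and the generated file should work as Scheme input.
If this were my code, I don't think I would do the (define v1 ...) dance. I would just write the whole thing out as one big piece of data that a Scheme/Racket program can slurp in with a single "read", but that's your choice, not mine.
There is also an ambiguity in your specification: the uniqueness of the initial index. That is, the input might "go back" to an earlier line number. For example, what should the output be for this input file: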
3,{1},1.0
4,{1},1.0
3,{1},1.0
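To make the ambiguity concrete: the awk script in the other answer and the join-lines code above only group adjacent lines, so this input would produce two separate v3 definitions. If the intent is instead to merge every line that shares a key, one possible sketch collects the groups in an array. It is written in awk, the function name group_all_by_key is made up, and it assumes first-seen key order is the order wanted.

```shell
# Hypothetical helper: merge ALL lines with the same key, even when they are
# not adjacent, and emit the groups in the order each key first appeared.
group_all_by_key() {
  awk -F'[{}]' '
    {
      key  = $1; sub(/,$/, "", key)     # leading key without its comma
      nums = $2; gsub(/,/, " ", nums)   # brace contents, space separated
      val  = $3; sub(/^,/, "", val)     # trailing value without its comma
      if (!(key in sect)) order[++n] = key   # remember first-seen order
      sect[key] = sect[key] " ((" nums ") " val ")"
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf("(define v%s \047(%s%s))\n", k, k, sect[k])
      }
    }
  '
}

printf '%s\n' '3,{1},1.0' '4,{1},1.0' '3,{1},1.0' | group_all_by_key
# (define v3 '(3 ((1) 1.0) ((1) 1.0)))
# (define v4 '(4 ((1) 1.0)))
```

Which of the two behaviours is right depends on what the data means, so treat this as one reading of the spec rather than the answer to it.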
Also, note that I removed all my test cases to make this look shorter/prettier :).
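For the single-datum idea mentioned earlier, a rough sketch in awk follows. The name as_single_datum is hypothetical and adjacent-line grouping is assumed. It writes one nested list instead of a series of define forms, which a Scheme or Racket program could then consume with a single call to read.

```shell
# Hypothetical helper: emit every group inside ONE outer list.
as_single_datum() {
  awk -F'[{}]' '
    BEGIN { printf("(") }                     # open the outer list
    {
      key  = $1; sub(/,$/, "", key)
      nums = $2; gsub(/,/, " ", nums)
      val  = $3; sub(/^,/, "", val)
      if (NR == 1) {
        printf("(%s", key)                    # open the first group
      } else if (key != last) {
        printf(")\n (%s", key)                # close the old group, open a new one
      }
      printf(" ((%s) %s)", nums, val)
      last = key
    }
    END { if (NR > 0) printf(")"); print ")" }   # close last group and outer list
  '
}

printf '%s\n' '1,{},344.233' '1,{2},344.197' '3,{1},220.896' | as_single_datum
# ((1 (() 344.233) ((2) 344.197))
#  (3 ((1) 220.896)))
```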
EDIT: Oh! Collect the lines this way instead. It's actually somewhat slower, but it reads better:
#lang racket
;; parse a line (we will join them later)
(define (line-parse l)
  (match (regexp-match #px"([0-9]+),\\{([0-9,]*)\\},([0-9.]+)" l)
    [(list dc first-num bracket-nums rest)
     (list (string->number first-num)
           (match bracket-nums
             ["" empty]
             [else (map string->number
                        (regexp-split #px"," bracket-nums))])
           (string->number rest))]
    [else
     (error "unexpected line format in line: ~s\n" l)]))
;; does the line start with the number k?
(define ((starts-with k) l) (equal? (first l) k))
;; join together lines starting with the same thing:
(define (join-lines lines)
  (for/list ([k (remove-duplicates (map first lines))])
    (cons k (map rest (filter (starts-with k) lines)))))
(define (dress-up line)
  (format "~a\n" `(define ,(format "v~s" (first line))
                    ',line)))
(display
 (apply
  string-append
  (map dress-up
       (join-lines
        (map line-parse
             (sequence->list (in-port read-line)))))))
Answer 2 (score: 0)
This might work for you (GNU sed):
sed ':a;$!N;s/^\(\([^,])*\).*\)\n\2/\1/;ta;h;x;s/\n.*//;s/,{\([^}]*\)},\([^,]\+\)/ ((\1) \2)/g;s/,/ /g;s/^\([^ ]*\).*/(define v\1 '\''(&)) ;;...\1/p;x;D' file
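Explanation:
- Join consecutive lines that share a key into a single line (as written, only the first character of each line is compared, which is enough for the single-digit keys in the sample): :a;$!N;s/^\(\([^,])*\).*\)\n\2/\1/;ta
- Copy the pattern space to the hold space: h
- Swap the pattern space and the hold space (their contents are identical at this point): x
- Remove the trailing line that belongs to the next key: s/\n.*//
- Rewrite each ,{...},value pair as ((...) value): s/,{\([^}]*\)},\([^,]\+\)/ ((\1) \2)/g
- Replace any remaining , with a space: s/,/ /g
- Wrap the line in (define vN '(...)), append the ;;...N comment, and print it ('\'' is shell quoting for a literal single quote): s/^\([^ ]*\).*/(define v\1 '\''(&)) ;;...\1/p
- Swap the saved copy back into the pattern space: x
- Delete up to and including the first newline and restart the cycle with what is left, without reading new input: D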
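To see the first step in isolation, here is a sketch that runs only the joining loop and prints the merged lines. It is a variation on the expression above, not a copy of it: the key group is written as \([^,]*,\) so that the whole key plus its comma is compared, and P is used to print each joined line. The function name join_same_key is made up for the example.

```shell
# Hypothetical helper: run only the "join lines with the same key" stage.
join_same_key() {
  # :a / $!N   append the next input line (unless already on the last line)
  # s///       if the appended line starts with the same "key,", splice it on
  # ta         after a successful splice, loop and try the following line too
  # P          print the finished (joined) first line of the pattern space
  # D          drop that line and restart with the leftover, reading no input
  sed -e ':a' -e '$!N' \
      -e 's/^\(\([^,]*,\).*\)\n\2/\1,/' \
      -e 'ta' -e 'P' -e 'D'
}

printf '%s\n' '1,{},344.233' '1,{2},344.197' '2,{16},290.281' | join_same_key
# 1,{},344.233,{2},344.197
# 2,{16},290.281
```

Including the comma in the compared group means a key such as 1 is not confused with 12, which seems a safer default once the keys grow past one digit.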