I have the following Lisp file from the UCI machine learning database. I would like to convert it to a flat text file using Python. A typical line looks like this:
(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))
I would like to parse it into a text file such as:
time pitch duration keysig timesig fermata
8 67 4 1 12 0
12 67 8 1 12 0
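Is there a Python module that can parse this intelligently? This is the first time I have seen Lisp.
Answer 0: (score: 21)
As shown in this answer, pyparsing looks like the right tool: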
inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'
from pyparsing import OneOrMore, nestedExpr
data = OneOrMore(nestedExpr()).parseString(inputdata)
print(data)
# [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]
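For completeness, here is how to format the result (using texttable):
from texttable import Texttable
tab = Texttable()
# data.asList()[0] is one chorale; [1:] skips the leading chorale number.
for row in data.asList()[0][1:]:
    row = dict(row)  # [['st', '8'], ...] -> {'st': '8', ...}
    # Column order follows dict order; on Python 3.7+ that is input order (st, pitch, dur, ...).
    tab.header(list(row.keys()))
    tab.add_row(list(row.values()))
print(tab.draw())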
+---------+--------+----+-------+-----+---------+
| timesig | keysig | st | pitch | dur | fermata |
+=========+========+====+=======+=====+=========+
| 12      | 1      | 8  | 67    | 4   | 0       |
+---------+--------+----+-------+-----+---------+
| 12      | 1      | 12 | 67    | 8   | 0       |
+---------+--------+----+-------+-----+---------+
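The question asked for plain space-separated lines rather than a drawn table. The following is a minimal sketch of that step, assuming the nested-list shape shown in the pyparsing output above; the names `FIELDS`, `HEADER` and `to_flat_lines` are made up for illustration and are not part of pyparsing or texttable.

```python
# Nested list in the shape pyparsing returned above (all values are strings).
parsed = [['1',
           [['st', '8'], ['pitch', '67'], ['dur', '4'],
            ['keysig', '1'], ['timesig', '12'], ['fermata', '0']],
           [['st', '12'], ['pitch', '67'], ['dur', '8'],
            ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]

# Column order and header names taken from the question's desired output.
FIELDS = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
HEADER = 'time pitch duration keysig timesig fermata'


def to_flat_lines(chorale):
    """Return the header plus one space-separated line per event.

    `chorale` is one top-level entry: [number, event, event, ...].
    """
    lines = [HEADER]
    for event in chorale[1:]:      # skip the chorale number
        fields = dict(event)       # [['st', '8'], ...] -> {'st': '8', ...}
        lines.append(' '.join(fields[name] for name in FIELDS))
    return lines


flat_lines = to_flat_lines(parsed[0])
print('\n'.join(flat_lines))
```

Looking fields up by name rather than by position keeps the output columns fixed even if the events list their fields in a different order.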
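To convert that data back to Lisp notation:
def lisp(x):
    # Lists become parenthesised, space-separated groups; atoms are returned as-is.
    return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x
d = lisp(data.asList()[0])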
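If installing pyparsing is not an option, the same kind of nested list can probably be built with a small hand-written reader. This is a standard-library-only sketch, not how pyparsing works internally; `tokenize` and `read_sexpr` are hypothetical names, and it assumes the input contains only parentheses and whitespace-separated atoms (no strings, quotes or comments).

```python
import re


def tokenize(text):
    """Split text into '(' and ')' tokens and bare atoms."""
    return re.findall(r'[()]|[^\s()]+', text)


def read_sexpr(tokens, pos=0):
    """Parse one expression starting at tokens[pos].

    Returns (value, next_pos); lists become Python lists, atoms stay strings.
    Unbalanced input surfaces as IndexError or ValueError.
    """
    token = tokens[pos]
    if token == '(':
        items = []
        pos += 1
        while tokens[pos] != ')':
            item, pos = read_sexpr(tokens, pos)
            items.append(item)
        return items, pos + 1      # step past the closing ')'
    if token == ')':
        raise ValueError('unexpected )')
    return token, pos + 1


line = ('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
        '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')
tree, _ = read_sexpr(tokenize(line))
print(tree)
```

The result is a plain nested list, so the `lisp` function above can turn it back into Lisp notation unchanged.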
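Answer 1: (score: 2)
If you know the data is correct and uniformly formatted (at first glance it appears to be), and you only need this data rather than a solution to the general problem... then why not just replace every run of non-numeric characters with a space and then split?
import re

data = open("chorales.lisp").read().split("\n")
# Keep only digits and minus signs; everything else becomes a space.
data = [re.sub("[^-0-9]+", " ", x) for x in data]

for L in data:
    L = list(map(int, L.split()))  # list() so len() and slicing work on Python 3
    i = 1  # first element is chorale number
    while i < len(L):
        st, pitch, dur, keysig, timesig, fermata = L[i:i+6]
        i += 6
        # ... your processing goes here ...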
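As a self-contained illustration of the same strip-and-split idea, here is a sketch that works on a single line held in a string instead of reading `chorales.lisp`. The function name `chorale_events` is invented for this example, and it assumes every event has exactly six numeric fields in the order st, pitch, dur, keysig, timesig, fermata.

```python
import re


def chorale_events(line):
    """Return (chorale_number, events) for one line of the file.

    Each event is a tuple (st, pitch, dur, keysig, timesig, fermata) of ints.
    A blank line gives (None, []).
    """
    # Replace every run of characters that are not digits or '-' with a space.
    numbers = [int(tok) for tok in re.sub(r'[^-0-9]+', ' ', line).split()]
    if not numbers:
        return None, []
    chorale, rest = numbers[0], numbers[1:]
    # Group the remaining numbers six at a time, one group per event.
    events = [tuple(rest[i:i + 6]) for i in range(0, len(rest), 6)]
    return chorale, events


sample = ('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
          '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')
chorale, events = chorale_events(sample)
rows = [' '.join(str(n) for n in event) for event in events]
print('\n'.join(rows))
```

Because the field names are discarded, this only stays correct while the file keeps that fixed field order; a stray `-` that is not part of a number would make `int` raise.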
Answer 2: (score: 1)
Split it into pairs with a regular expression:
In [1]: import re
In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'
In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]
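Then turn it into a dictionary:
# data is the list of [name, value] pairs produced above.
data = [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
dct = {}
for p in data:
    if p[0] not in dct:
        dct[p[0]] = [p[1]]
    else:
        dct[p[0]].append(p[1])
Result: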
In [10]: dct
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}
Printing:
print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    print(dct['st'][t], dct['pitch'][t], dct['dur'][t],
          dct['keysig'][t], dct['timesig'][t], dct['fermata'][t])
Proper formatting is left as an exercise for the reader...
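One possible take on that exercise is to pad each column to the width of its widest entry. The sketch below assumes `dct` has the shape shown in the result above; `COLUMNS` and `format_table` are illustrative names, and the header titles are the ones from the question.

```python
dct = {'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'],
       'keysig': ['1', '1'], 'timesig': ['12', '12'], 'fermata': ['0', '0']}

# (header title, dictionary key) pairs in the order the question asked for.
COLUMNS = [('time', 'st'), ('pitch', 'pitch'), ('duration', 'dur'),
           ('keysig', 'keysig'), ('timesig', 'timesig'), ('fermata', 'fermata')]


def format_table(dct):
    """Return the header and data lines with right-aligned values."""
    # Each column is as wide as its title or its longest value.
    widths = [max([len(title)] + [len(v) for v in dct[key]])
              for title, key in COLUMNS]
    header = ' '.join(title.ljust(w)
                      for (title, _), w in zip(COLUMNS, widths)).rstrip()
    lines = [header]
    # zip(*...) turns the per-column lists into per-row tuples.
    for values in zip(*(dct[key] for _, key in COLUMNS)):
        lines.append(' '.join(v.rjust(w) for v, w in zip(values, widths)))
    return lines


table_lines = format_table(dct)
print('\n'.join(table_lines))
```

Right-aligning the values keeps multi-digit numbers lined up under their headers; swap `rjust` for `ljust` if left alignment is preferred.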
Answer 3: (score: 0)
Since the data is already in Lisp, use Lisp itself:
(let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
               ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))
  (let ((row-headers (mapcar 'car (second input)))
        (row-data (mapcar (lambda (row) (mapcar 'second row)) (cdr input))))
    (format t "~{~A~^ ~}~%" row-headers)
    (format t "~{~{~A~^ ~}~^ ~%~}" row-data)))