I have a nested list of this form:
[[u' (SBAR - TMP (WHADVP-1 (WRB When)) (S (NP-SBJ (PRP it)))']
[u'(NP-SBJ (DT the) (NNS traders))']
[u'(NP (NNS orders) (S (-NONE- *ICH*-2)))']
[u'(PP-MNR (IN via) (NP (NNS computers)))']
[u'(S-2\n (NP-SBJ (-NONE- *))\n (VP\n (TO to)']]
I want to remove the tags and get this output:
((when it)(the traders)(orders)(via computers))
Can anyone tell me how to do this in Python?
Answer 0: (score: 0)
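You can extract everything that is not written in all uppercase. I don't know exactly what you count as a tag, so you could start with something along these lines: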
import re
arr = [[u' (SBAR - TMP (WHADVP-1 (WRB When)) (S (NP-SBJ (PRP it)))'],
[u'(NP-SBJ (DT the) (NNS traders))'],
[u'(NP (NNS orders) (S (-NONE- *ICH*-2)))'],
[u'(PP-MNR (IN via) (NP (NNS computers)))'],
[u'(S-2\n (NP-SBJ (-NONE- *))\n (VP\n (TO to)']]
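# Match words that start with any letter and continue with lowercase letters
# or spaces; multi-letter uppercase tags such as NP or WRB do not match.
res = [' '.join(re.findall(r'\b[A-Za-z][a-z ]+\b', s[0])) for s in arr]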
print(res)
# [u'When it', u'the traders', u'orders', u'via computers', u'to']