How do I extract a substring from each line of a text file?

Asked: 2017-03-30 15:58:38

Tags: python parsing split substring text-files

I have a text file in the following format (one line shown):

[NN] ||| transplant ||| transplantation ||| PPDB2.0Score=5.24981 PPDB1.0Score=3.295900 -logp(LHS|e1)=0.18597 -logp(LHS|e2)=0.14031 -logp(e1|LHS)=11.83583 -logp(e1|e2)=1.80507 -logp(e1|e2,LHS)=1.46728 -logp(e2|LHS)=11.47593 -logp(e2|e1)=1.49083 -logp(e2|e1,LHS)=1.10738 AGigaSim=0.63439 Abstract=0 Adjacent=0 CharCountDiff=5 CharLogCR=0.40547 ContainsX=0 Equivalence=0.371472 Exclusion=0.000344 GlueRule=0 GoogleNgramSim=0.03067 Identity=0 Independent=0.078161 Lex(e1|e2)=9.64663 Lex(e2|e1)=59.48919 Lexical=1 LogCount=4.67283 MVLSASim=NA Monotonic=1 OtherRelated=0.372735 PhrasePenalty=1 RarityPenalty=0 ForwardEntailment=0.177287 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.98821 TargetFormality=0.98464 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=5.00000 WordLogCR=0 ||| 0-0 ||| OtherRelated

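What I want is to extract transplant and transplantation. How would you do that? The values between the ||| separators have a different length on every line of the text file. To illustrate, here is a second example: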

[VBZ] ||| reflects ||| understand ||| PPDB2.0Score=3.50769 PPDB1.0Score=21.844910 -logp(LHS|e1)=0.01251 -logp(LHS|e2)=10.87470 -logp(e1|LHS)=6.91653 -logp(e1|e2)=11.53225 -logp(e1|e2,LHS)=4.29729 -logp(e2|LHS)=16.55913 -logp(e2|e1)=10.31266 -logp(e2|e1,LHS)=13.93988 AGigaSim=0.54532 Abstract=0 Adjacent=0 CharCountDiff=2 CharLogCR=0.22314 ContainsX=0 Equivalence=0.006535 Exclusion=0.022332 GlueRule=0 GoogleNgramSim=0 Identity=0 Independent=0.456621 Lex(e1|e2)=62.90141 Lex(e2|e1)=62.90141 Lexical=1 LogCount=0 MVLSASim=NA Monotonic=1 OtherRelated=0.404562 PhrasePenalty=1 RarityPenalty=0.36788 ForwardEntailment=0.109950 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.99354 TargetFormality=1.00000 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=2.00000 WordLogCR=0 ||| 0-0 ||| Independent

Here the target words are reflects and understand.

1 answer:

Answer 0 (score: 2)

Split on ' ||| '?

your_text.split(' ||| ') gives you a list of the elements that were separated by ' ||| '.

So

your_text.split(' ||| ')[1:3] returns ['reflects', 'understand'] for the second example line.
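To apply the same split to every line of a file, something like the following sketch may help. The function name extract_pairs and the shortened sample lines are made up for illustration (the long feature field is cut down to one entry), and an in-memory io.StringIO stands in for the real file. It assumes every valid line has at least three ' ||| '-separated fields and skips lines that do not.

```python
import io

SEP = " ||| "

def extract_pairs(lines):
    """Yield (second_field, third_field) for each line with enough fields.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    for line in lines:
        # maxsplit=3 stops after the fields we need, so the long
        # feature list is not split any further.
        fields = line.rstrip("\n").split(SEP, 3)
        if len(fields) < 3:
            continue  # blank or malformed line: skip it
        yield fields[1], fields[2]

# Shortened stand-ins for the two example lines from the question.
sample = io.StringIO(
    "[NN] ||| transplant ||| transplantation ||| PPDB2.0Score=5.24981 ||| 0-0 ||| OtherRelated\n"
    "[VBZ] ||| reflects ||| understand ||| PPDB2.0Score=3.50769 ||| 0-0 ||| Independent\n"
)

pairs = list(extract_pairs(sample))
print(pairs)
```

With a real file, the same function could be fed an open file handle, for example `with open(path, encoding="utf-8") as f: pairs = list(extract_pairs(f))`, where `path` is whatever your file is called. The single '|' characters inside the feature names (such as -logp(LHS|e1)) are not affected, because the split only matches the full ' ||| ' separator including its surrounding spaces.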