I have a file with the following content:
( (CODE <begin_A_defense_of_Michael_Moore>))

( (NP (NP (NP (DT A) (NN defense))
(PP (IN of)
(NP (NP (NNP Michael) (NNP Moore))
(CC and)
(" ")
(S-NOM-TTL (NP-SBJ (-NONE- *PRO*))
(VP (VBG Bowling)
(PP-PRP (IN for)
(NP (NNP Columbine))))))))
(" ")
(CODE -LRB-)
(PRN (NP (NN Op-Ed)))
(CODE -RRB-)
(PP (IN By)
(NP (NNP Eloquence)))))

( (FRAG (NP (NNP Wed))
(NP (NML (NNP Aug))
(JJ 13th)
(, ,)
(NN 2003))
(PP-TMP (IN at)
(NP (CD 09:00:09)
(FW AM) (FW EST)))))

( (S (NP-SBJ (DT This))
(VP (VBZ is)
(NP-PRD (NP (DT an) (JJ open) (NN letter))
(PP (IN to)
(NP (NP (NNP David) (NNP Hardy))
(, ,)
(NP (NP (NN author))
(PP (IN of)
(NP (NP-TTL (S-NOM-TTL (NP-SBJ (-NONE- *PRO*))
(VP (VB Bowling)
(PP-PRP (IN for)
(NP (NNP Columbine)))))
(: :)
(NP (NN Documentary) (CC or) (NN Fiction)))
(, ?)
(, ,)
(RRC (ADVP (RB probably))
(NP-PRD (NP (DT the)
(ADJP (RBS most) (JJ comprehensive)))
(PP (IN among)
(NP (NP (JJ many) (NNS rebuttals))
(PP (IN of)
(NP (DT the)
(ADJP (NNP Oscar) (HYPH -) (VBG winning))
(NN documentary))))))))))))))
(. .)))

( (S (NP-SBJ (NNS Critics))
(VP (VBP have)
(ADVP-TMP (RB now))
(VP (VBN gone)
(ADVP (ADVP (RB so) (RB far))
(SBAR (IN as)
(S (NP-SBJ (-NONE- *PRO*))
(VP (TO to)
(VP (VB call)
(PP-CLR (IN for)
(NP (NP (DT the) (NN revocation))
(PP (IN of)
(NP (DT the) (NN award))))))))))))
(. .)))

( (S (NP-SBJ (PRP$ Their) (NNS chances))
(VP (VBP are)
(ADJP-PRD (JJ small))
(, ,)
(ADVP (RB however))
(, ,)
(SBAR-PRP (IN as)
(S (NP-SBJ (PRP$ their) (NNS arguments))
(VP (VP (VBP rely)
(PP-CLR=1 (IN on)
(NP (NN polemic) (, ,) (NN exaggeration) (CC and) (NN misrepresentation))))
(: --)
(VP (PP (IN in)
(NP (JJ other) (NNS words)))
(, ,)
(PP-CLR=1 (IN on)
(NP (NP (DT the) (JJ same) (NNS techniques))
(SBAR (WHNP-2 (WP which))
(S (NP-SBJ (PRP they))
(VP (VBP accuse)
(NP (NNP Moore))
(PP-CLR (IN of)
(S-NOM (NP-SBJ (-NONE- *PRO*))
(VP (VBG using)
(NP (-NONE- *T*-2)))))))))))))))
(. .)))
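I need to process each parse separately. I think the best way is to split the file on the blank lines between the parses (or is there another way?). Does anyone know how to do this? I'm using PHP. The file comes from the MASC corpus.
Thanks.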
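Another way, which does not depend on the blank lines at all, is to use the bracket structure itself: each parse is one balanced top-level ( ... ) expression, so a parse ends exactly when the parenthesis depth drops back to zero. This is a minimal sketch, assuming that literal brackets in the text are escaped as -LRB- / -RRB- (as in the sample above), so every ( and ) is structural. The function name splitByBrackets is only illustrative.

```php
<?php
// Sketch (hypothetical helper): split Penn Treebank-style text into parses
// by tracking parenthesis depth instead of looking for blank lines.
// Assumption: brackets inside tokens are escaped (-LRB-/-RRB-), so every
// '(' and ')' in the text is part of the tree structure.
function splitByBrackets(string $text): array
{
    $parses = [];
    $depth = 0;
    $start = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($ch === '(') {
            if ($depth === 0) {
                $start = $i; // a new top-level tree begins here
            }
            $depth++;
        } elseif ($ch === ')' && $depth > 0) { // stray ')' at depth 0 is ignored
            $depth--;
            if ($depth === 0) {
                // The top-level tree is closed: cut it out and collapse
                // line breaks and indentation into single spaces.
                $tree = substr($text, $start, $i - $start + 1);
                $parses[] = preg_replace('/\s+/', ' ', $tree);
            }
        }
    }
    return $parses;
}
```

Usage: $parses = splitByBrackets(file_get_contents('textfile.txt')); gives one complete tree per array element, each on a single line. Because only brackets are counted, this still works when blank lines are missing or inconsistent.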
Answer 0 (score: 0)
I actually ended up doing it this way:
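<?php
// Read the file as an array of lines (each still ends with its newline).
$newfile = file("textfile.txt");
$temp_str = '';
$parses = array();
foreach ($newfile as $line) {
    $temp = trim($line);
    if (strlen($temp) > 0) {
        // Non-blank line: append it to the current parse, separated by a space.
        $temp_str .= ($temp_str === '' ? '' : ' ') . $temp;
    } elseif ($temp_str !== '') {
        // Blank line: the current parse is complete (runs of blank lines are skipped).
        array_push($parses, $temp_str);
        $temp_str = '';
    }
}
// Keep the last parse even if the file does not end with a blank line.
if ($temp_str !== '') {
    array_push($parses, $temp_str);
}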
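If the blank lines are reliable, a shorter variant reads the whole file at once and splits it with a regular expression instead of looping over lines. This is a minimal sketch, assuming a "blank line" may contain spaces or tabs and that line endings may be \n, \r\n or \r (they are normalized to \n first). The name splitOnBlankLines is only illustrative.

```php
<?php
// Sketch (hypothetical helper): split treebank text into parses on blank lines.
function splitOnBlankLines(string $text): array
{
    // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // A blank line = a newline, optional spaces/tabs, then another newline.
    $chunks = preg_split('/\n[ \t]*\n/', $text);
    $parses = [];
    foreach ($chunks as $chunk) {
        $chunk = trim($chunk);
        // Several blank lines in a row leave empty chunks; skip them.
        if ($chunk !== '') {
            $parses[] = $chunk;
        }
    }
    return $parses;
}
```

Usage: $parses = splitOnBlankLines(file_get_contents('textfile.txt')); (file_get_contents returns false if the file cannot be read, so check for that in real code). Unlike the loop above, each parse keeps its internal line breaks; collapse them with preg_replace('/\s+/', ' ', $parse) if you want one line per parse.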