I'm trying to create a new file based on information I extract from an XDocument that I have.
XDocument doc = XDocument.Load(@"path.to.x.document");
StreamWriter sw = new StreamWriter(WriteFile);
var variabila = (from x in doc.Descendants("sentence").Elements("word")
select new
{
lemma = x.Attribute("lemma")?.Value,
postag = x.Attribute("postag")?.Value
}).ToSOMETHING; // Here I need to store it in something so that later I can use that something as shown below
A lemma can have the same value for different postags, and a postag can have the same value for different lemmas, like this:
lemma="somf" postag="S321"
lemma="areq" postag="O213"
lemma="somf" postag="O213"
lemma="werid" postag="S321"
So I need to write the file like this. Basically, if it is the end of a sentence, it starts a new line.
if(SOMETHING.lemma == "." || SOMETHING.lemma == "!")
{
sw.WriteLine(SOMETHING.lemma);
}
else
{
sw.Write(SOMETHING.lemma + " " + SOMETHING.postag);
}
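I have tried using Lookup and Dictionary. With Dictionary I get an exception, because a Dictionary cannot store the same key twice. Lookup avoids that exception, but I need the entries in the order in which they appear, so that I can form the sentences in the new file from the lemma and postag alone.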
Answer 0 (score: 0)
If I understand you correctly, you would have XML like the following as an example:
<root>
<sentence>
<word lemma="somf" postag="S321" />
<word lemma="areq" postag="O213" />
<word lemma="somf" postag="O213" />
<word lemma="werid" postag="S321" />
<word lemma="." postag="" />
</sentence>
<sentence>
<word lemma="areq" postag="O213" />
<word lemma="somf" postag="S321" />
<word lemma="werid" postag="S321" />
<word lemma="somf" postag="O213" />
<word lemma="." postag="" />
</sentence>
</root>
And you want the file to be written as:
somf S321 areq O213 somf O213 werid S321.
areq O213 somf S321 werid S321 somf O213.
Note that I am assuming you have . or ! as the last element of each sentence, but you can adapt this as needed.
Then you just iterate over each sentence and its word elements (see the fiddle):
// Requires: using System.IO; using System.Linq; using System.Xml.Linq;
using (StreamWriter stream = new StreamWriter("result.txt"))
{
    XDocument doc = XDocument.Load(@"path.to.x.document");
    foreach (var sentence in doc.Descendants("sentence"))
    {
        // Materialize once so the word elements are not re-enumerated.
        var words = sentence.Elements("word").ToList();
        if (words.Count == 0)
        {
            continue; // nothing to write for an empty sentence
        }

        // Every word except the last contributes "lemma postag".
        var tokens = words
            .Take(words.Count - 1)
            .Select(w => w.Attribute("lemma")?.Value + " " + w.Attribute("postag")?.Value);

        // The last word is assumed to be the terminator (. or !), appended without a space.
        var line = string.Join(" ", tokens) + words[words.Count - 1].Attribute("lemma")?.Value;
        stream.WriteLine(line);
    }
}
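If you would rather keep the shape of the query from the question, a List is probably the simplest "something" to store the results in: unlike Dictionary it accepts repeated values, and unlike Lookup it keeps the elements in the order in which they appear. Below is a minimal sketch under that assumption. The LemmaWriter class, its Write method and the inline sample XML are hypothetical names made up for this illustration, and the sketch writes to any TextWriter so it can be tried without touching the file system.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class LemmaWriter
{
    // Hypothetical helper: writes "lemma postag" pairs in document order and
    // ends the current line whenever the lemma is "." or "!".
    public static void Write(XDocument doc, TextWriter writer)
    {
        // ToList() preserves document order and allows repeated lemma/postag values.
        var words = doc.Descendants("sentence").Elements("word")
            .Select(x => new
            {
                Lemma = x.Attribute("lemma")?.Value,
                Postag = x.Attribute("postag")?.Value
            })
            .ToList();

        var startOfLine = true;
        foreach (var word in words)
        {
            if (word.Lemma == "." || word.Lemma == "!")
            {
                // Sentence terminator: write it and start a new line.
                writer.WriteLine(word.Lemma);
                startOfLine = true;
            }
            else
            {
                // Separate consecutive pairs with a single space.
                if (!startOfLine) writer.Write(" ");
                writer.Write(word.Lemma + " " + word.Postag);
                startOfLine = false;
            }
        }
    }

    public static void Main()
    {
        // Sample input made up for this sketch.
        var doc = XDocument.Parse(
            "<root><sentence>" +
            "<word lemma=\"somf\" postag=\"S321\" />" +
            "<word lemma=\"areq\" postag=\"O213\" />" +
            "<word lemma=\".\" postag=\"\" />" +
            "</sentence></root>");

        var output = new StringWriter();
        Write(doc, output);
        Console.Write(output.ToString());
    }
}
```

To write to a file instead, pass a StreamWriter (which is also a TextWriter) wrapped in a using block, as in the code above.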