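I have the following text in a Word document:
This is a paragraph:
1) This is the first bullet
2) This is the second bullet
I am trying to get the text "1)" and "2)", but without success: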
foreach (var items in para)
{
    int id = items.ParagraphProperties.NumberingProperties.NumberingId.Val;
    int refval = items.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val;
    var runs = items.Descendants<Run>();
    foreach (var run in runs)
    {
        var txts = run.Descendants<Text>();
        foreach (var txt in txts)
        {
        }
    }
}
Accessing these values gives the following for both bullets:
claims.ParagraphProperties.NumberingProperties.NumberingId.Val
-> 2
claims.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val
-> 0
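Answer 0: (score: 2)
I think I just got nerd-sniped by Dirk Vollmar, so now I had to try to come up with a way to compute the "text" of an item in an ordered list.
This assumes that the English version of Word behaves roughly the same as my Danish version. After some testing, I found that there are 3 different indentation levels.
The first level uses numbers, the second level uses letters, and the third level uses roman numerals. After that the levels repeat, so the fourth level is a number again, and so on.
This means that, to compute what the text in the list should be, we only need to know the paragraph's position within its indentation level.
With that in mind, I wrote an extension method for Paragraph. It has no error handling, and it assumes that the paragraph you pass in is actually part of a list.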
// Extension method: must be declared inside a static class, alongside ToRoman.
public static string GetIndentionTextFromParagraph(this Paragraph paragraph)
{
    int numberingId = paragraph.ParagraphProperties.NumberingProperties.NumberingId.Val;
    int numberingLevel = paragraph.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val;

    // Isolate paragraphs with the same numbering id and indentation level
    var paragraphsInList = paragraph.Parent.Descendants<Paragraph>().Where(p =>
        p.ParagraphProperties != null &&
        p.ParagraphProperties.NumberingProperties != null &&
        p.ParagraphProperties.NumberingProperties.NumberingId.Val == numberingId &&
        p.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val == numberingLevel
    ).ToList();

    // Find the zero-based position of the paragraph within that level
    int paragraphPositionInLevelOfList = paragraphsInList.IndexOf(paragraph);

    // Reduce the level to 0, 1 or 2, because the formats repeat every three levels
    numberingLevel = numberingLevel % 3;

    if (numberingLevel == 0)
    {
        // Return a number
        return (paragraphPositionInLevelOfList + 1).ToString();
    }
    else if (numberingLevel == 1)
    {
        // Return a letter (note: throws for more than 26 items on this level)
        return "abcdefghijklmnopqrstuvwxyz"[paragraphPositionInLevelOfList].ToString();
    }
    else if (numberingLevel == 2)
    {
        // Return a roman numeral
        return ToRoman(paragraphPositionInLevelOfList + 1);
    }
    else return "unknown list configuration";
}
Now all that is left is to test whether it works. How you isolate the paragraphs is up to you. To test it, I simply located them by some unique text.
using (var wordDoc = WordprocessingDocument.Open(@"C:\test\qtest\test.docx", true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    var document = mainPart.Document;
    Paragraph firstIndention = document.Descendants<Paragraph>().Where(i => i.InnerText.Contains("my number bullet 1")).First();
    Paragraph secondIndention = document.Descendants<Paragraph>().Where(i => i.InnerText.Contains("letter bullet 2")).First();
    Paragraph thirdIndention = document.Descendants<Paragraph>().Where(i => i.InnerText.Contains("third indention 2")).First();
    Paragraph fourthIndention = document.Descendants<Paragraph>().Where(i => i.InnerText.Contains("And we are back to numbering, so we know the rules now")).First();
    Console.WriteLine(firstIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(secondIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(thirdIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(fourthIndention.GetIndentionTextFromParagraph());
}
This outputs: 1, b, II and 1.
Hope this helps.
I copied the "ToRoman" function from "Converting integers to roman numerals":
static string ToRoman(int number)
{
    // Valid input is 0..3999; 0 yields an empty string
    if ((number < 0) || (number > 3999)) throw new ArgumentOutOfRangeException(nameof(number), "insert value between 0 and 3999");
    if (number < 1) return string.Empty;
    if (number >= 1000) return "M" + ToRoman(number - 1000);
    if (number >= 900) return "CM" + ToRoman(number - 900);
    if (number >= 500) return "D" + ToRoman(number - 500);
    if (number >= 400) return "CD" + ToRoman(number - 400);
    if (number >= 100) return "C" + ToRoman(number - 100);
    if (number >= 90) return "XC" + ToRoman(number - 90);
    if (number >= 50) return "L" + ToRoman(number - 50);
    if (number >= 40) return "XL" + ToRoman(number - 40);
    if (number >= 10) return "X" + ToRoman(number - 10);
    if (number >= 9) return "IX" + ToRoman(number - 9);
    if (number >= 5) return "V" + ToRoman(number - 5);
    if (number >= 4) return "IV" + ToRoman(number - 4);
    if (number >= 1) return "I" + ToRoman(number - 1);
    // Not reachable for valid input; kept as a safety net
    throw new ArgumentOutOfRangeException(nameof(number), "something bad happened");
}
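One limitation of the extension method above is that the letter lookup indexes a 26-character string, so it throws for the 27th item on a letter level. Below is a minimal sketch of a pure label formatter that separates the formatting from the document traversal. `ListLabelSketch`, `Format`, `ToLetters` and `ToRomanIterative` are hypothetical names, and the behavior after "z" (repeating the letter: "aa", "bb", ...) is an assumption about how Word continues a letter list, not something confirmed in this thread.

```csharp
using System;
using System.Text;

public static class ListLabelSketch
{
    // Hypothetical helper: label for the item at zero-based `position` on list `level`,
    // following the number / letter / roman cycle described in the answer.
    public static string Format(int position, int level)
    {
        if (position < 0) throw new ArgumentOutOfRangeException(nameof(position));
        if (level < 0) throw new ArgumentOutOfRangeException(nameof(level));

        int ordinal = position + 1;
        switch (level % 3)
        {
            case 0: return ordinal.ToString();
            case 1: return ToLetters(position);
            default: return ToRomanIterative(ordinal);
        }
    }

    // Assumption: after "z" the letter is repeated ("aa", "bb", ...).
    private static string ToLetters(int position)
    {
        char letter = (char)('a' + position % 26);
        return new string(letter, position / 26 + 1);
    }

    // Iterative roman numeral conversion for 1..3999.
    private static string ToRomanIterative(int number)
    {
        if (number < 1 || number > 3999) throw new ArgumentOutOfRangeException(nameof(number));

        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };

        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (number >= values[i])
            {
                sb.Append(symbols[i]);
                number -= values[i];
            }
        }
        return sb.ToString();
    }
}
```

With this split, the extension method would only need to work out the position and level, and could then call `ListLabelSketch.Format(position, level)`. Like the original, this ignores the formats actually defined in the document's numbering part.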
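Answer 1: (score: 1)
From your code I assume you are trying to get the list item text using the Open XML SDK (rather than Word interop).
If you unzip the document package and look at document.xml, you will see that the list item text is not stored in the document. It is a value that the application computes when the document is opened. Unfortunately, there is no simple way to get that value with the Open XML SDK.
If you want to know the list item text, you basically have two options: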
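To illustrate the point about document.xml, here is a minimal sketch that reads a numbered paragraph with `System.Xml.Linq` only. The XML fragment is hand-written in the general shape of a numbered paragraph (an assumption for illustration, not taken from the asker's file), and `NumberingMarkupDemo` and `Read` are hypothetical names. The paragraph carries a numbering id and a level, but the label "1)" appears nowhere in its text.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NumberingMarkupDemo
{
    // Hand-written fragment for illustration; a real document.xml contains much more markup.
    public const string Fragment =
@"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
  <w:pPr>
    <w:numPr>
      <w:ilvl w:val=""0""/>
      <w:numId w:val=""2""/>
    </w:numPr>
  </w:pPr>
  <w:r><w:t>This is the first bullet</w:t></w:r>
</w:p>";

    // Returns the numbering id, the level, and the concatenated run text of one paragraph.
    public static (int NumId, int Level, string Text) Read(string xml)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XElement p = XElement.Parse(xml);
        XElement numPr = p.Element(w + "pPr").Element(w + "numPr");

        int numId = (int)numPr.Element(w + "numId").Attribute(w + "val");
        int level = (int)numPr.Element(w + "ilvl").Attribute(w + "val");
        string text = string.Concat(p.Descendants(w + "t").Select(t => t.Value));

        return (numId, level, text);
    }
}
```

Calling `NumberingMarkupDemo.Read(NumberingMarkupDemo.Fragment)` yields the id and level that the question already reads through the SDK, plus the paragraph text without any label.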