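For my current project I am reading 1000-1700 XML files and extracting information from them. The problem I'm facing is that not all of the XML is clean. Some files are missing key elements that the IEnumerable is looking for, or the data is simply null. My problem is that I can't account for these missing or NULL values. I have tried string.Empty and string.IsNullOrEmpty, but IntelliSense doesn't like it. My thought process is that if an element is missing or returns null, the value should be set to "NA". Am I off base with my way of thinking?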
private static IEnumerable<object[]> GetDocumentsData(string folderPath = @"filepath")
{
    return Directory.GetFiles(folderPath, "*.xml")
        .Select(XDocument.Load)
        .SelectMany(file => file.Descendants().Where(e => e.Name.LocalName == "FilingLeadDocument")
            .Concat(file.Descendants().Where(e => e.Name.LocalName == "FilingConnectedDocument")))
        .Select(documentNode =>
        {
            try
            {
                var receivedDateNode = documentNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentReceivedDate");
                var SequenceNode = documentNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentSequenceID");
                var descriptionNode = documentNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentDescriptionText");
                var metadataNode = documentNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentMetadata");
                var registerActionNode = metadataNode.Elements().FirstOrDefault(e => e.Name.LocalName == "RegisterActionDescriptionText");
                return new object[]
                {
                    (string)documentNode.Parent.Parent.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentIdentification"),
                    SequenceNode != null ? SequenceNode.Value.Trim() : string.Empty,
                    (DateTime?)receivedDateNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DateTime"),
                    descriptionNode != null ? descriptionNode.Value.Trim() : string.Empty,
                    registerActionNode != null ? registerActionNode.Value.Trim() : string.Empty
                };
            }
            catch (Exception e)
            {
                //Log.error("");
                return new object[] { };
            }
        }).ToArray();
}
Sample XML (the RegisterActionDescriptionText element is missing from this file):
<?xml version="1.0"?>
<RecordFilingRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:oasis:names:tc:legalxml-courtfiling:wsdl:WebServiceMessagingProfile-Definitions-4.0">
<RecordFilingRequestMessage xmlns:fsrsp="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:FilingStatusResponseMessage-4.0" xmlns:ecf="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CommonTypes-4.0" xmlns:j="http://niem.gov/niem/domains/jxdm/4.0" xmlns:juvenile="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:JuvenileCase-4.0" xmlns:niem-xsd="http://niem.gov/niem/proxy/xsd/2.0" xmlns:domestic="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:DomesticCase-4.0" xmlns:s="http://niem.gov/niem/structures/2.0" xmlns:criminal="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CriminalCase-4.0" xmlns:amcadext="http://www.amcad.com/NiemEcf/extensions/1.0" xmlns:i="http://niem.gov/niem/appinfo/2.0" xmlns:appellate="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:AppellateCase-4.0" xmlns:nc="http://niem.gov/niem/niem-core/2.0" xmlns:citation="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CitationCase-4.0" xmlns:reviewcb="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:ReviewFilingCallbackMessage-4.0" xmlns:civil="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CivilCase-4.0">
<nc:DocumentDescriptionText s:id="ReviewWorkQueueId">484</nc:DocumentDescriptionText>
<nc:DocumentDescriptionText s:id="ReviewWorkQueue">Criminal Traffic Existing Cases</nc:DocumentDescriptionText>
<nc:DocumentIdentification>
<nc:IdentificationID>14115049</nc:IdentificationID>
</nc:DocumentIdentification>
<nc:DocumentPostDate>
<nc:DateTime>2014-05-28T10:17:05.229345-04:00</nc:DateTime>
</nc:DocumentPostDate>
<nc:DocumentSubmitter>
<ecf:EntityPerson s:id="REVIEWER">
<nc:PersonName>
<nc:PersonGivenName>re</nc:PersonGivenName>
<nc:PersonSurName>re</nc:PersonSurName>
<nc:PersonFullName>re</nc:PersonFullName>
</nc:PersonName>
<nc:PersonOtherIdentification>
<nc:IdentificationID>51201</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL</nc:IdentificationCategoryText>
</nc:PersonOtherIdentification>
<nc:PersonOtherIdentification>
<nc:IdentificationID>re</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL_LOGONNAME</nc:IdentificationCategoryText>
</nc:PersonOtherIdentification>
<ecf:PersonAugmentation>
<nc:ContactInformation>
<nc:ContactEmailID>re</nc:ContactEmailID>
<nc:ContactMailingAddress>
<nc:StructuredAddress>
<nc:AddressDeliveryPointText>re</nc:AddressDeliveryPointText>
<nc:LocationCityName>re</nc:LocationCityName>
<nc:LocationStateUSPostalServiceCode>FL</nc:LocationStateUSPostalServiceCode>
<nc:LocationStateName>FL</nc:LocationStateName>
</nc:StructuredAddress>
<nc:AddressFullText>re</nc:AddressFullText>
</nc:ContactMailingAddress>
</nc:ContactInformation>
</ecf:PersonAugmentation>
</ecf:EntityPerson>
</nc:DocumentSubmitter>
<ecf:SendingMDELocationID>
<nc:IdentificationID>Filing Review MDE</nc:IdentificationID>
</ecf:SendingMDELocationID>
<ecf:SendingMDEProfileCode>urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:WebServicesMessaging-2.0</ecf:SendingMDEProfileCode>
<CoreFilingMessage xmlns="urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CoreFilingMessage-4.0">
<nc:DocumentEffectiveDate>
<nc:DateTime>2014-05-28T08:00:00-04:00</nc:DateTime>
</nc:DocumentEffectiveDate>
<nc:DocumentIdentification>
<nc:IdentificationID>14115049</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL_FILING_ID</nc:IdentificationCategoryText>
</nc:DocumentIdentification>
<nc:DocumentInformationCutOffDate>
<nc:DateTime>2014-05-27T17:50:51.297-04:00</nc:DateTime>
</nc:DocumentInformationCutOffDate>
<nc:DocumentPostDate>
<nc:DateTime>2014-05-27T18:45:13.8464904-04:00</nc:DateTime>
</nc:DocumentPostDate>
<nc:DocumentReceivedDate>
<nc:DateTime>2014-05-27T17:50:51.297-04:00</nc:DateTime>
</nc:DocumentReceivedDate>
<ecf:SendingMDELocationID>
<nc:IdentificationID>URL/UNIQUE IDENTIFIER OF APPLICATION SENDING THIS REQUEST</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL</nc:IdentificationCategoryText>
</ecf:SendingMDELocationID>
<ecf:SendingMDEProfileCode>urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:WebServicesMessaging-2.0</ecf:SendingMDEProfileCode>
<criminal:Case>
<nc:ActivityDescriptionText s:id="Criminal Traffic">re</nc:ActivityDescriptionText>
<nc:CaseTitleText>re</nc:CaseTitleText>
<nc:CaseCategoryText s:id="40781916535">831</nc:CaseCategoryText>
<nc:CaseTrackingID>052014CT</nc:CaseTrackingID>
<nc:CaseTrackingID s:id="ucn">052014CT</nc:CaseTrackingID>
<j:CaseAugmentation>
<j:CaseCourt>
<nc:OrganizationIdentification>
<nc:IdentificationID>05</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL_ORGANIZATION</nc:IdentificationCategoryText>
</nc:OrganizationIdentification>
<nc:OrganizationIdentification>
<nc:IdentificationID>28</nc:IdentificationID>
<nc:IdentificationCategoryText>FLEPORTAL_ORGANIZATION_UNIT</nc:IdentificationCategoryText>
</nc:OrganizationIdentification>
<nc:OrganizationIdentification>
<nc:IdentificationID>Trial</nc:IdentificationID>
<nc:IdentificationCategoryText>COURT_TYPE</nc:IdentificationCategoryText>
</nc:OrganizationIdentification>
<nc:OrganizationIdentification>
<nc:IdentificationID>Eighteenth Circuit</nc:IdentificationID>
<nc:IdentificationCategoryText>JUDICIAL_CIRCUIT_ID</nc:IdentificationCategoryText>
</nc:OrganizationIdentification>
<nc:OrganizationName>re</nc:OrganizationName>
<nc:OrganizationUnitName>Criminal Traffic</nc:OrganizationUnitName>
<j:CourtName>Criminal Traffic</j:CourtName>
</j:CaseCourt>
</j:CaseAugmentation>
</criminal:Case>
<FilingLeadDocument s:id="DOC00001">
<nc:DocumentApplicationName>application/pdf</nc:DocumentApplicationName>
<nc:DocumentDescriptionText>CLASS EMPTY-CS-AAADH6K-CE- 1AAADH6K</nc:DocumentDescriptionText>
<nc:DocumentDescriptionText s:id="DocumentGroup">MOTIONS</nc:DocumentDescriptionText>
<nc:DocumentDescriptionText s:id="DocumentType">MOTION TO SUPRESS </nc:DocumentDescriptionText>
<nc:DocumentFileControlID s:id="FileInputId">101</nc:DocumentFileControlID>
<nc:DocumentFileControlID s:id="Rule6PublicAnswer">-1</nc:DocumentFileControlID>
<nc:DocumentFileControlID s:id="Rule6ConfidentialAnswer">-1</nc:DocumentFileControlID>
<nc:DocumentFileControlID s:id="TypeOfConfidentialDocument">-1</nc:DocumentFileControlID>
<nc:DocumentPostDate>
<nc:DateTime>2014-05-27T18:45:13.8464904-04:00</nc:DateTime>
</nc:DocumentPostDate>
<nc:DocumentReceivedDate>
<nc:DateTime>2014-05-27T17:50:51.297-04:00</nc:DateTime>
</nc:DocumentReceivedDate>
<nc:DocumentSequenceID>1</nc:DocumentSequenceID>
<ecf:DocumentRendition>
<ecf:DocumentRenditionMetadata>
<nc:DocumentApplicationName>application/pdf</nc:DocumentApplicationName>
<nc:DocumentFileControlID>Class EMPTY-CS-AAADH6K-CE- 1AAADH6K.PDF</nc:DocumentFileControlID>
<ecf:DocumentAttachment s:id="ATT00001">
<nc:BinaryBase64Object>removed by RB </nc:BinaryBase64Object>
<nc:BinarySizeValue>101864</nc:BinarySizeValue>
<ecf:AttachmentSequenceID>1</ecf:AttachmentSequenceID>
</ecf:DocumentAttachment>
</ecf:DocumentRenditionMetadata>
</ecf:DocumentRendition>
</FilingLeadDocument>
</CoreFilingMessage>
</RecordFilingRequestMessage>
</RecordFilingRequest>
Answer 0 (score: 1)
Assuming this is what you are asking for (I may have misunderstood):
return new object[]
{
    (string)documentNode.Parent.Parent.Elements().FirstOrDefault(e => e.Name.LocalName == "DocumentIdentification"),
    SequenceNode != null ? SequenceNode.Value.Trim() : "NA",
    (DateTime?)receivedDateNode.Elements().FirstOrDefault(e => e.Name.LocalName == "DateTime"),
    descriptionNode != null ? descriptionNode.Value.Trim() : "NA",
    registerActionNode != null ? registerActionNode.Value.Trim() : "NA"
};
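One thing to watch for: in the sample XML the DocumentMetadata element itself is absent, so in the question's code metadataNode is null and metadataNode.Elements() throws before the array is ever built. Below is a minimal sketch of a null-safe lookup, assuming C# 6 or later for the ?. operator. NaFallback and ChildValueOrNa are hypothetical names, and the small fragment in Main is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NaFallback
{
    // Hypothetical helper: returns the trimmed value of the first child whose
    // local name matches, or "NA" when the parent or the child is missing or blank.
    public static string ChildValueOrNa(XElement parent, string localName)
    {
        var child = parent?.Elements()
            .FirstOrDefault(e => e.Name.LocalName == localName);
        var value = child?.Value.Trim();
        return string.IsNullOrEmpty(value) ? "NA" : value;
    }

    public static void Main()
    {
        // Made-up fragment, not taken from the real files.
        var doc = XElement.Parse("<Doc><Seq> 1 </Seq><Blank/></Doc>");
        XElement metadata = doc.Element("DocumentMetadata"); // null: not present

        Console.WriteLine(ChildValueOrNa(doc, "Seq"));   // 1
        Console.WriteLine(ChildValueOrNa(doc, "Blank")); // NA
        Console.WriteLine(ChildValueOrNa(metadata, "RegisterActionDescriptionText")); // NA
    }
}
```

Passing a possibly-null parent into the helper means the missing DocumentMetadata case and the missing RegisterActionDescriptionText case both end up as "NA" without relying on the catch block.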
Answer 1 (score: 1)
To parse this XML you have to deal with namespaces when searching for elements by name. So the first thing to do is get the namespaces you need:
var xdoc = XDocument.Load(fileName);
var ns = xdoc.Root.GetDefaultNamespace();
XNamespace ecf = "urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CommonTypes-4.0";
XNamespace nc = "http://niem.gov/niem/niem-core/2.0";
XNamespace s = "http://niem.gov/niem/structures/2.0";
XNamespace cfm = "urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CoreFilingMessage-4.0";
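The next step is to pass the full element name to the LINQ to XML methods instead of searching by local name. Remember that these methods take an XName, not a plain string; a bare string converts to an XName with no namespace, so it will not match namespaced elements. So, if you need FilingLeadDocument and its contents: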
var fld = xdoc.Root.Element(ns + "RecordFilingRequestMessage")
                   .Element(cfm + "CoreFilingMessage")
                   .Element(cfm + "FilingLeadDocument");
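To illustrate why the qualified name matters, here is a small sketch using a made-up two-level document. The urn:example:* namespaces, the element names, and the NamespaceDemo class are assumptions for illustration only and do not come from the real files.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Made-up document: the root declares one default namespace
    // and the child element overrides it with another.
    public static XDocument BuildSample()
    {
        return XDocument.Parse(
            "<Root xmlns='urn:example:outer'>" +
            "<Inner xmlns='urn:example:inner'><Leaf>x</Leaf></Inner>" +
            "</Root>");
    }

    public static void Main()
    {
        var xdoc = BuildSample();
        XNamespace outer = xdoc.Root.GetDefaultNamespace();
        XNamespace inner = "urn:example:inner";

        // A bare string becomes an XName with no namespace, so nothing matches.
        Console.WriteLine(xdoc.Root.Element("Inner") == null);         // True

        // The namespace-qualified name finds the element.
        Console.WriteLine(xdoc.Root.Element(inner + "Inner") != null); // True

        // Leaf inherits the default namespace declared on Inner.
        Console.WriteLine((string)xdoc.Root.Element(inner + "Inner")
                                           .Element(inner + "Leaf"));  // x

        Console.WriteLine(outer.NamespaceName);                        // urn:example:outer
    }
}
```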
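Next comes reading the values of elements that may be missing. If you use the Value property on a missing element, you will get a NullReferenceException. So you should cast the element instead. For example, casting to string returns the element's value, or null if the element was not found.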
var fillingLeadDoc = new FillingLeadDoc {
    ReceivedDate = (DateTime?)fld.Elements(nc + "DocumentReceivedDate")
                                 .Elements(nc + "DateTime").FirstOrDefault(),
    SequenceId = (int?)fld.Element(nc + "DocumentSequenceID"),
    DescriptionText = (string)fld.Element(nc + "DocumentDescriptionText")
};
where FillingLeadDoc is:
public class FillingLeadDoc
{
    public int? SequenceId { get; set; }
    public DateTime? ReceivedDate { get; set; }
    public string DescriptionText { get; set; }
}
For your sample, the following instance will be created:
{
    SequenceId: 1,
    ReceivedDate: "2014-05-28T00:50:51.297+03:00",
    DescriptionText: "CLASS EMPTY-CS-AAADH6K-CE- 1AAADH6K"
}
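To tie this back to the "NA" default from the question: because the casts return null for a missing element, a fallback can be applied with the ?? operator. Below is a minimal sketch on a made-up, namespace-free fragment. The element names mirror the sample, but the fragment, the CastDemo class, and the Describe helper are illustrative assumptions.

```csharp
using System;
using System.Xml.Linq;

public static class CastDemo
{
    // Hypothetical helper: formats three fields of a fragment,
    // substituting "NA" wherever the element is missing.
    public static string Describe(XElement fld)
    {
        // Casting a null XElement to string or to a nullable type yields null
        // instead of throwing, unlike reading .Value on a missing element.
        string description = (string)fld.Element("DocumentDescriptionText") ?? "NA";
        int? sequenceId = (int?)fld.Element("DocumentSequenceID");
        string action = (string)fld.Element("RegisterActionDescriptionText") ?? "NA";

        return description + "|" +
               (sequenceId.HasValue ? sequenceId.Value.ToString() : "NA") + "|" +
               action;
    }

    public static void Main()
    {
        var fld = XElement.Parse(
            "<FilingLeadDocument>" +
            "<DocumentDescriptionText>MOTIONS</DocumentDescriptionText>" +
            "<DocumentSequenceID>1</DocumentSequenceID>" +
            "</FilingLeadDocument>");

        Console.WriteLine(Describe(fld)); // MOTIONS|1|NA
    }
}
```

Note that the cast only returns null for an element that is absent. An element that is present but empty casts to an empty string, so string.IsNullOrEmpty would still be needed to cover that case, and a cast to a numeric type on an empty element fails with a format error rather than returning null.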
Move this code into a GetFillingLeadDoc method:
private static FillingLeadDoc GetFillingLeadDoc(string fileName)
{
    // code above
    return fillingLeadDoc;
}
And call this method for each file:
private static IEnumerable<FillingLeadDoc> GetDocumentsData(
    string folderPath = @"filepath")
{
    return Directory.GetFiles(folderPath, "*.xml")
                    .Select(GetFillingLeadDoc);
}
Then bind the grid to these documents.
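Since some files are not clean, the chained Element(...) calls used to locate FilingLeadDocument can themselves hit a null if an intermediate element is absent. One possible way around that, sketched here as an assumption and not part of the code above, is to chain the Elements(...) extension methods, which yield an empty sequence for a missing level, and take FirstOrDefault() at the end. SafeNavigation and FindLeadDocument are hypothetical names, and the urn:example:root namespace in the demo is made up.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SafeNavigation
{
    static readonly XNamespace Cfm =
        "urn:oasis:names:tc:legalxml-courtfiling:schema:xsd:CoreFilingMessage-4.0";

    // Returns the first FilingLeadDocument, or null if any level of the path is absent.
    public static XElement FindLeadDocument(XDocument xdoc)
    {
        if (xdoc.Root == null) return null;
        XNamespace ns = xdoc.Root.GetDefaultNamespace();

        // Elements(...) on a sequence yields nothing when a level is missing,
        // so no intermediate null is ever dereferenced.
        return xdoc.Root
            .Elements(ns + "RecordFilingRequestMessage")
            .Elements(Cfm + "CoreFilingMessage")
            .Elements(Cfm + "FilingLeadDocument")
            .FirstOrDefault();
    }

    public static void Main()
    {
        // Made-up, trimmed-down documents for illustration.
        var complete = XDocument.Parse(
            "<RecordFilingRequest xmlns='urn:example:root'>" +
            "<RecordFilingRequestMessage>" +
            "<CoreFilingMessage xmlns='" + Cfm.NamespaceName + "'>" +
            "<FilingLeadDocument/>" +
            "</CoreFilingMessage>" +
            "</RecordFilingRequestMessage>" +
            "</RecordFilingRequest>");
        var incomplete = XDocument.Parse("<RecordFilingRequest xmlns='urn:example:root'/>");

        Console.WriteLine(FindLeadDocument(complete) != null);   // True
        Console.WriteLine(FindLeadDocument(incomplete) == null); // True
    }
}
```

A caller can then check the result for null and skip or log the file before reading any fields from it.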