Efficiently deserializing dynamic JSON data into a DataTable

Posted: 2017-03-22 18:57:06

Tags: c# json vb.net serialization json.net

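We are developing a program to fetch slide image data from a set of servers that don't share a consistent schema (I suspect it's invalid, but I'm not well-versed enough to make that call). As independent, unaffiliated researchers, we have no influence over the servers.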

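The data is (mostly) entered by hand through a series of forms (n > 50) with inconsistent fields (the data goes back to the '90s). Here is a sample response: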

{
"form12873": [

    {
        "id": "9202075838",
        "timestamp": "2015-06-25 10:24:51",
        "user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3",
        "remote_addr": "[Re.dact.ed]",
        "processed": "1",
        "data": {
            "33885124": {
                "field": "33885124",
                "value": "CDat Lab",
                "flat_value": "CDat Lab",
                "label": "Completed by:",
                "type": "select"
            },

            ''**Several more fields as above**''...

            "33884660": {
                "field": "33884660",
                "value": {
                    "slideX": "2456123",
                    "slideY": "456632",
                    "label": "K-20150322148",
                    "approved": "1",
                    "score": "30144"
                },
                "flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144",
                "label": "Slide Stats:",
                "type": "slidestats"
            },

            ''**Some of the fields are as above...

            "31970564": {
                "field": "31970564",
                "value": [
                    "System",
                    "Crated",
                    "Mirax",
                    "NanoZoomer",
                    "ThinPrep",
                    "Aperio",
                    "Intellisite"

                ],
                "flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite",
                "label": "System Information",
                "type": "checkbox"
            },

            ''**Some of the values are Arrays...

            "33883781": {
                "field": "33883781",
                "selection": "Retain",
                "label": "4. Retain\/Remove\/Review",
                "type": "selectdrop"
            },

            ''**Some of the fields don't have the same children

            "52792890": {
                "field": "52792890",
                "image": "'A really large byte[], removed for ease of reading'",
                "type": "image"
            }

            ''**Somewhere near the end of each response is the actual image...
        }
    },

    {
        "id": "33884681",
            ''**Then it continues on as above until the end:
    }
], "total": 170, "pages": 5, "pretty_id": "478125624983" }

In the past, when I was able to create a model/class for the structure of the JSON, I knew how to handle it (create a data class with the fields, values, etc. defined).

I tried the following solution:

var result = JsonConvert.DeserializeObject<List<Dictionary<string, 
                            Dictionary<string, string>>>>(content);

It always results in array errors or conversion problems (even after adding a direct cast). I was able to get the actual first array using:

Imports System.Data
Imports System.Linq
Imports Newtonsoft.Json
Imports Newtonsoft.Json.Linq

Public Shared Function Tabulate(json As String) As DataTable
    Dim jsonLinq = JObject.Parse(json)

    ' Find the first array using LINQ
    Dim srcArray = jsonLinq.Descendants().Where(Function(d) TypeOf d Is JArray).First()
    Dim trgArray = New JArray()
    For Each row As JObject In srcArray.Children(Of JObject)()
        Dim cleanRow = New JObject()
        For Each column As JProperty In row.Properties()
            ' Only include JValue types (nested objects such as "data" are dropped)
            If TypeOf column.Value Is JValue Then
                cleanRow.Add(column.Name, column.Value)
            End If
        Next
        trgArray.Add(cleanRow)
    Next

    Return JsonConvert.DeserializeObject(Of DataTable)(trgArray.ToString())
End Function
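Two things in the samples work against a fixed type like `List<Dictionary<string, Dictionary<string, string>>>`: the root of the response is an object (`{"form12873": [...], "total": 170, ...}`), not an array, so it cannot be read as a `List`; and inside `data`, `value` is sometimes a string, sometimes an object and sometimes an array. One middle ground is to type only the envelope that looks stable and keep each field as a `JObject`. This is a sketch under assumptions: `FormEntry`, `EnvelopeSketch` and `ReadForms` are hypothetical names, and it assumes the array's name (`form12873`) changes from form to form, so every array-valued root property is read.

```vbnet
Imports System.Collections.Generic
Imports Newtonsoft.Json.Linq

' Hypothetical class for the parts of each entry that look stable.
Public Class FormEntry
    Public Property Id As String
    Public Property Timestamp As String
    Public Property Processed As String
    ' Field id -> field object, left as JObject because its children vary
    ' ("value" may be a string, an object or an array; some fields have no "value").
    Public Property Data As Dictionary(Of String, JObject)
End Class

Public Module EnvelopeSketch
    ' The root is an object, not an array. Every array-valued root property
    ' (assumed to be the per-form list, e.g. "form12873") is read as a list
    ' of entries; scalar siblings such as "total" and "pages" are skipped.
    Public Function ReadForms(json As String) As Dictionary(Of String, List(Of FormEntry))
        Dim root = JObject.Parse(json)
        Dim result As New Dictionary(Of String, List(Of FormEntry))
        For Each prop In root.Properties()
            If prop.Value.Type = JTokenType.Array Then
                result(prop.Name) = prop.Value.ToObject(Of List(Of FormEntry))()
            End If
        Next
        Return result
    End Function
End Module
```

Json.NET matches `id` to `Id` case-insensitively and ignores members it has no property for (such as `user_agent`), so the typed part stays small while the irregular part is still available through `Data`.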

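My end goal is also a DataTable, and the loops and the image bytes make me wary of stepping down into more and more children. I then tried deserializing from that first array, and that is when the problems appear.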
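If the large image field is the main worry, one option is to detach those fields from the parsed tree before flattening anything. A minimal sketch, assuming image fields are the `data` children whose `type` is `"image"` and whose payload is a string under `"image"` (as in the sample); `ImageSplitter` and `ExtractImages` are hypothetical names. If the payload turns out to be base64, `Convert.FromBase64String` would turn it into a `Byte()`.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports Newtonsoft.Json.Linq

Public Module ImageSplitter
    ' Removes every "data" field whose "type" is "image" from the tree and
    ' returns the image payloads keyed by field id. The remaining tree no
    ' longer carries the large strings, so it can be flattened as usual.
    Public Function ExtractImages(root As JObject) As Dictionary(Of String, String)
        Dim images As New Dictionary(Of String, String)
        ' JSONPath: every child of every "data" object.
        Dim fields = root.SelectTokens("$..data.*").OfType(Of JObject)()
        ' ToList() so the tree is not modified while it is being enumerated.
        Dim imageFields = fields.Where(Function(f) f.Value(Of String)("type") = "image").ToList()
        For Each field In imageFields
            ' The field object is the value of a JProperty named after the field id.
            Dim prop = DirectCast(field.Parent, JProperty)
            images(prop.Name) = field.Value(Of String)("image")
            prop.Remove()
        Next
        Return images
    End Function
End Module
```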

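If there is a quick way to handle this, I'd love that solution. If the problem is that I'm trying to process garbage JSON, I'd appreciate a reference to where it breaks the current standard (so I can at least try to get the other institutions to change their servers). That said, I'll probably have to deal with it either way, even if it means loops.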

*Note: the project was started in VB.NET, so we're keeping it that way, but I may decide to port it to C#. Code in either language is great.

Below is an unannotated sample of the JSON that should be usable for testing. My end goal is to flatten it into a DataTable:

{
"form12873": [
    {
        "id": "9202075838",
        "timestamp": "2015-06-25 10:24:51",
        "user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3",
        "remote_addr": "[Re.dact.ed]",
        "processed": "1",
        "data": {
            "33885124": {
                "field": "33885124",
                "value": "CDat Lab",
                "flat_value": "CDat Lab",
                "label": "Completed by:",
                "type": "select"
            },
            "33884660": {
                "field": "33884660",
                "value": {
                    "slideX": "2456123",
                    "slideY": "456632",
                    "label": "K-20150322148",
                    "approved": "1",
                    "score": "30144"
                },
                "flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144",
                "label": "Slide Stats:",
                "type": "slidestats"
            },
            "31970564": {
                "field": "31970564",
                "value": [
                    "System",
                    "Crated",
                    "Mirax",
                    "NanoZoomer",
                    "ThinPrep",
                    "Aperio",
                    "Intellisite"
                ],
                "flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite",
                "label": "System Information",
                "type": "checkbox"
            },



            "33883781": {
                "field": "33883781",
                "selection": "Retain",
                "label": "4. Retain\/Remove\/Review",
                "type": "select"
            }
        }
    }
], "total": 170, "pages": 5, "pretty_id": "478125624983" }
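Most fields in the samples carry a pre-flattened `flat_value` string (and the `Retain/Remove/Review` field carries a `selection`), which suggests a shortcut: use those strings as cell values and build one row per form entry with one column per field id. A minimal sketch under those assumptions; `FlatValueTable` and `Build` are hypothetical names, fields typed `image` are skipped, and a field without `flat_value` or `selection` falls back to its raw `value` written out as compact JSON.

```vbnet
Imports System.Data
Imports System.Linq
Imports Newtonsoft.Json.Linq

Public Module FlatValueTable
    ' One row per element of the first array in the response, with an "id"
    ' column plus one column per field id found under "data".
    Public Function Build(json As String) As DataTable
        Dim root = JObject.Parse(json)
        Dim entries = root.Descendants().OfType(Of JArray)().First()
        Dim dt As New DataTable()
        dt.Columns.Add("id", GetType(String))

        For Each entry In entries.Children(Of JObject)()
            Dim row = dt.NewRow()
            dt.Rows.Add(row)
            Dim id = entry.Value(Of String)("id")
            If id IsNot Nothing Then row("id") = id

            Dim data = TryCast(entry("data"), JObject)
            If data Is Nothing Then Continue For

            For Each prop In data.Properties()
                Dim field = TryCast(prop.Value, JObject)
                ' Skip non-object entries and the large image payloads.
                If field Is Nothing OrElse field.Value(Of String)("type") = "image" Then Continue For

                ' Prefer the server's own flattened text, then "selection".
                Dim text = If(field.Value(Of String)("flat_value"), field.Value(Of String)("selection"))
                If text Is Nothing AndAlso field("value") IsNot Nothing Then
                    ' Fallback: the raw value as compact JSON.
                    text = field("value").ToString(Newtonsoft.Json.Formatting.None)
                End If
                If text Is Nothing Then Continue For

                ' Columns are created the first time a field id is seen.
                If Not dt.Columns.Contains(prop.Name) Then dt.Columns.Add(prop.Name, GetType(String))
                row(prop.Name) = text
            Next
        Next
        Return dt
    End Function
End Module
```

Keying columns by field id rather than by `label` is a design choice in case labels repeat; the labels could be kept in a separate lookup if readable headers are needed.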

2 Answers:

Answer 0 (score: 1)

You can add DataColumns to a DataTable even when the DataTable already contains DataRows.
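This is plain `System.Data` behavior: adding a column to a table that already has rows is allowed, and the existing rows hold `DBNull` in the new column. A minimal sketch of growing a table as new keys appear; `GrowingTable`, `SetGrowing` and `Demo` are hypothetical names, and the field ids are taken from the question's sample.

```vbnet
Imports System
Imports System.Data

Public Module GrowingTable
    ' Hypothetical helper: writes value into row(name), first adding the
    ' column to the table if it does not exist yet. Rows already in the
    ' table get DBNull in the new column.
    Public Sub SetGrowing(row As DataRow, name As String, value As Object)
        Dim cols = row.Table.Columns
        If Not cols.Contains(name) Then
            cols.Add(name, GetType(Object))
        End If
        row(name) = If(value, DBNull.Value)
    End Sub

    ' Two rows with different keys, mirroring the "data" fields in the question.
    Public Function Demo() As DataTable
        Dim dt As New DataTable()

        Dim first = dt.NewRow()
        dt.Rows.Add(first)
        SetGrowing(first, "field", "33885124")
        SetGrowing(first, "label", "Completed by:")

        Dim second = dt.NewRow()
        dt.Rows.Add(second)
        SetGrowing(second, "field", "33884660")
        ' "slideX" is new, so the column is created while the table already has rows.
        SetGrowing(second, "slideX", "2456123")

        Return dt
    End Function
End Module
```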

I haven't done much JSON, but my general approach with dodgy XML is to break it down into a stream of key/value pairs, where the key is the XPath "address" and the value is the content of the node (excluding child nodes), and then walk that stream to build the result. Perhaps a similar approach could be used here with JSONPath.
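A rough sketch of that idea with Json.NET, using `JToken.Path` (which yields JSONPath-style addresses such as `form12873[0].timestamp`) as the key for every leaf value. It assumes the rows of interest are the elements of the first array in the response, as in the question's `Tabulate`; `PathValueStream`, `Leaves` and `ToTable` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq
Imports Newtonsoft.Json.Linq

Public Module PathValueStream
    ' Yields one (relative path, value) pair per leaf JValue under item.
    ' JToken.Path is absolute, so the item's own path prefix is stripped off.
    Public Iterator Function Leaves(item As JContainer) As IEnumerable(Of KeyValuePair(Of String, Object))
        For Each leaf In item.Descendants().OfType(Of JValue)()
            Dim relPath = leaf.Path.Substring(item.Path.Length).TrimStart("."c)
            Yield New KeyValuePair(Of String, Object)(relPath, leaf.Value)
        Next
    End Function

    ' One DataRow per element of the first array in the document;
    ' a column is added the first time a path is seen.
    Public Function ToTable(json As String) As DataTable
        Dim root = JObject.Parse(json)
        ' Throws if the document contains no array at all.
        Dim firstArray = root.Descendants().OfType(Of JArray)().First()
        Dim dt As New DataTable()
        For Each item In firstArray.Children(Of JObject)()
            Dim row = dt.NewRow()
            dt.Rows.Add(row)
            For Each kvp In Leaves(item)
                If Not dt.Columns.Contains(kvp.Key) Then
                    dt.Columns.Add(kvp.Key, GetType(Object))
                End If
                row(kvp.Key) = If(kvp.Value, DBNull.Value)
            Next
        Next
        Return dt
    End Function
End Module
```

Because the key keeps the full address (for example `data.31970564.value[2]`), nested fields and array items never collide, at the cost of long column names.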

Answer 1 (score: 1)

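The ugly contraption below can (roughly) do what you want. Pass the JSON source string to DeserializeToDataTable and collect the resulting DataTable. It works on your sample; I can't guarantee it will work on the rest of your data. The aim is to give you a working starter kit that you can study, understand, debug and adapt to your needs.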

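Imports System.Collections.Generic
Imports System.Data
Imports System.Linq
Imports Newtonsoft.Json.Linq

Private Function DeserializeToDataTable(ByVal jsource As String) As DataTable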
    Dim JRootObject = JObject.Parse(jsource)
    Dim Children = JRootObject.SelectTokens("$..data.*").ToArray
    Dim Records = Children.OfType(Of JObject).ToArray
    Dim dicList As New List(Of Dictionary(Of String, Object))
    For Each rec In Records
        dicList.Add(DeserializeToDictionary(rec))
    Next
    Dim fieldnames = dicList.SelectMany(Function(d) d.Keys).Distinct.ToArray
    Dim dt As New DataTable
    For Each fieldname In fieldnames
        dt.Columns.Add(fieldname, GetType(Object))
    Next
    Dim row As DataRow
    For Each dic In dicList
        row = dt.NewRow
        For Each kvp In dic
            row.SetField(kvp.Key, kvp.Value)
        Next
        dt.Rows.Add(row)
    Next
    Return dt
End Function

Private Function DeserializeToDictionary(ByVal json_object As JObject) As Dictionary(Of String, Object)
    Dim dic = New Dictionary(Of String, Object)
    For Each field In json_object.Properties
        Select Case field.Value.Type
            Case JTokenType.Array
                Dim subobject = New JObject
                Dim item = 0
                For Each token In field.Value
                    subobject("item" & item) = token
                    item += 1
                Next
                Dim subdic = DeserializeToDictionary(subobject)
                For Each kvp In subdic
                    dic(kvp.Key) = kvp.Value
                Next
            Case JTokenType.Boolean
                dic(field.Name) = field.Value.ToObject(Of Boolean)
            Case JTokenType.Bytes
                dic(field.Name) = field.Value.ToObject(Of Byte())
            Case JTokenType.Date
                dic(field.Name) = field.Value.ToObject(Of Date)
            Case JTokenType.Float
                dic(field.Name) = field.Value.ToObject(Of Double)
            Case JTokenType.Guid
                dic(field.Name) = field.Value.ToObject(Of Guid)
            Case JTokenType.Integer
                dic(field.Name) = field.Value.ToObject(Of Integer)
            Case JTokenType.Object
                Dim subdic = DeserializeToDictionary(DirectCast(field.Value, JObject))
                For Each kvp In subdic
                    dic(kvp.Key) = kvp.Value
                Next
            Case JTokenType.String
                dic(field.Name) = field.Value.ToObject(Of String)()
            Case JTokenType.TimeSpan
                dic(field.Name) = field.Value.ToObject(Of TimeSpan)
            Case Else
                dic(field.Name) = field.Value.ToString
        End Select
    Next
    Return dic
End Function

Some things to keep in mind when using the code above:

  1. It uses recursion to flatten nested, multi-branch structures. So,

    {
        "A":"aaaa",
        "B":"bbbb",
        "C":{
                "D":"dddd",
                "E":"eeee",
                "F":"ffff"
            }
    }
    

    becomes

    A   |B   |D   |E   |F
    ----+----+----+----+----
    aaaa|bbbb|dddd|eeee|ffff
    
  2. My approach assumes that no keys are repeated when flattening; if there are duplicates, the last one wins. So,

    {
        "A":"aaaa",
        "B":"bbbb",
        "C":{
                "D":"d1d1",
                "E":"e1e1",
                "F":"f1f1"
            },
        "G":{
                "D":"d2d2",
                "E":"e2e2",
                "F":"f2f2"
            }
    }
    

    becomes

    A   |B   |D   |E   |F
    ----+----+----+----+----
    aaaa|bbbb|d2d2|e2e2|f2f2
    

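    This is clearly flawed behavior; fixing it requires a more sophisticated approach, which I leave to you to build on top of my scratch code.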
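One way to avoid the overwrite is to keep the parent's name in the key, so the second example would produce columns `C.D`, `C.E`, `C.F`, `G.D`, `G.E`, `G.F` instead of a single `D`, `E`, `F`. A minimal sketch, assuming dotted keys are acceptable as DataTable column names; `PrefixedFlatten` is a hypothetical name, not part of the answer's code, and array items are keyed as `name[0]`, `name[1]`, ... rather than `item0`, `item1`.

```vbnet
Imports System.Collections.Generic
Imports Newtonsoft.Json.Linq

Public Module PrefixedFlatten
    ' Flattens a JObject into key/value pairs, prefixing nested keys with
    ' their parent's key ("C.D", "G.D") so equal child names no longer
    ' overwrite each other.
    Public Function Flatten(obj As JObject) As Dictionary(Of String, Object)
        Dim dic As New Dictionary(Of String, Object)
        AddToken(dic, "", obj)
        Return dic
    End Function

    Private Sub AddToken(dic As Dictionary(Of String, Object), prefix As String, token As JToken)
        Select Case token.Type
            Case JTokenType.Object
                For Each prop In DirectCast(token, JObject).Properties()
                    Dim key = If(prefix = "", prop.Name, prefix & "." & prop.Name)
                    AddToken(dic, key, prop.Value)
                Next
            Case JTokenType.Array
                Dim i = 0
                For Each child In token.Children()
                    AddToken(dic, prefix & "[" & i & "]", child)
                    i += 1
                Next
            Case Else
                ' Leaf: store the raw .NET value (String, Long, Boolean, ...);
                ' anything that is not a JValue is stored as its JSON text.
                Dim leaf = TryCast(token, JValue)
                dic(prefix) = If(leaf IsNot Nothing, leaf.Value, CObj(token.ToString()))
        End Select
    End Sub
End Module
```

To try it, swap `DeserializeToDictionary(rec)` for `PrefixedFlatten.Flatten(rec)` inside `DeserializeToDataTable`; the column-building loop there works unchanged because it only needs a `Dictionary(Of String, Object)` per record.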