How to fix the schema when reading data from mongodb with the mongodb connector

Time: 2019-08-08 10:27:39

Tags: mongodb pyspark

This is the code I run in a Databricks cell to read data from mongodb.

locus_task = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("database", "locus").option("collection", "tasks").option("samplingRatio", 0.9).load()

But it gives this error:

  

scala.MatchError: SkipFieldType (of class com.mongodb.spark.sql.types.SkipFieldType$)

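It used to work, but now it gives this error. I am using the mongodb connector to read data from mongodb. The data has a complex nested structure with nearly 20000 fields, so I cannot define the schema by hand. The raw data from mongo is: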

{"_id":"5b7be4cd1f939e743c2c3a1a","customer":{"name":"Rixyz","mobileNumber":"99xxxxxxxx","address":"104, Suprema F, XYZ"},"warehouse":{"address":"XYZ - FNA","contacts":[{"name":"Pxyz","email":"pxyz@xyz.com","mobileNumber":85xxxxxxxx}],"name":"XYZ-Fna","ref":"5b7ba92bdd39411895f1db8c"},"isAdhoc":false,"status":"COMPLETED","reassigned":false,"orderId":"200237832","goodsType":"F_A","lineItems":[{"id":"508590","name":"Poise Queen Bed","sku":"Bxxxx","price":11900}],"volume":"15","taskDateTime":"2018-08-22T09:30:00.000Z","taskType":"DELIVERY","city":"XYZ","lat":"89.21168","lng":"93.0907","feedbackToken":"7c68d977-b693-4d7c-a4d0-9427469d4f28","id":"FnA-200237832-1534846157768","statusChangeLog":[{"actor":{"id":"xyz/personnel/dxyz"},"_id":"5b7d69d6ee7ec9416e2c675b","status":"RECEIVED","triggerTime":"2018-08-21T10:09:18.560Z"},{"actor":{"id":"locus"},"_id":"5b7d69d6ee7ec980ac2c675a","status":"WAITING","triggerTime":"2018-08-22T05:56:07.842Z","assignedUser":{"carrierClientId":"xyz","userId":"Zxyz-1534826008383"}},{"actor":{"id":"xyz/personnel/dxyz"},"_id":"5b7d69d6ee7ec9676f2c6759","status":"ACCEPTED","triggerTime":"2018-08-22T05:56:09.034Z","receiveTime":"2018-08-22T05:56:09.707Z","assignedUser":{"carrierClientId":"xyz","userId":"Zxyz-1534826008383"}},{"actor":{"id":"xyz/personnel/mxyz"},"_id":"5b7d69d6ee7ec9875e2c6758","status":"COMPLETED","triggerTime":"2018-08-22T13:49:09.020Z","receiveTime":"2018-08-22T13:49:09.023Z","location":{"lat":89.01511,"lng":93.02275,"accuracy":0,"timestamp":1534943025243,"distance":0},"assignedUser":{"carrierClientId":"xyz","userId":"Zxyz-1534826008383"}}],"assignedUserLog":[{"_id":"5b7cfaf91f939ec7b62c3b57","userId":"xyz-1534826008383","timestamp":"2018-08-22T05:56:08.171Z"}],"triggerTime":"2018-08-22T13:49:09.109Z","createdAt":"2018-08-21T10:09:17.770Z","updatedAt":"2018-08-22T13:49:10.018Z","__v":1,"batch":{"id":"xyz-FnA-1534846051825","ref":"5b7be463ee7ec91c172c5e02"},"team":{"name":"xyz-Fna","ref":"5b7bb6a078d4a56582dfef97"},"visits":{"WARE
HOUSE_VISIT":{"clientId":"xyz","taskId":"FnA-200237832-1534846157768","id":"WAREHOUSE_VISIT","volumes":{"volumes":[{"unit":"ITEM_COUNT","value":"15","exchangeType":"COLLECT"}]},"resources":{"resources":[]},"visitStatus":{"status":"COMPLETED","triggerTime":"2018-08-22T13:49:09.020+0000","checklistValues":{},"receiveTime":"2018-08-22T13:49:09.023+0000","location":{"lat":89.01511,"lng":93.02275,"accuracy":0,"timestamp":1534943025243,"distance":0},"actor":{"id":"xyz/personnel/xyz"},"assignedUser":{"carrierClientId":"xyz","userId":"xyz-1534826008383"}},"statusUpdates":[{"status":"RECEIVED","triggerTime":"2018-08-21T10:09:18.560+0000","checklistValues":{},"actor":{"id":"xyz/personnel/xyz"}},{"status":"WAITING","triggerTime":"2018-08-22T05:56:07.842+0000","checklistValues":{},"actor":{"id":"locus"},"assignedUser":{"carrierClientId":"xyz","userId":"Zxyz-1534826008383"}},{"status":"ACCEPTED","triggerTime":"2018-08-22T05:56:09.034+0000","checklistValues":{},"receiveTime":"2018-08-22T05:56:09.707+0000","actor":{"id":"xyz/personnel/dxyz"},"assignedUser":{"carrierClientId":"xyz","userId":"xyz-1534826008383"}},{"status":"COMPLETED","triggerTime":"2018-08-22T13:49:09.020+0000","checklistValues":{},"receiveTime":"2018-08-22T13:49:09.023+0000","location":{"lat":89.01511,"lng":93.02275,"accuracy":0,"timestamp":1534943025243,"distance":0},"actor":{"id":"xyz/personnel/xyz"},"assignedUser":{"carrierClientId":"xyz","userId":"xyz-1534826008383"}}],"locationOptions":[{"id":"xyz-Fna","geometry":{"latLng":{"lat":89.126407,"lng":92.893228,"accuracy":0}},"timeWindow":{"slot":{"start":"2018-08-22T00:30:00.000+0000","end":"2018-08-22T17:29:59.999+0000"},"strictness":"STRICT","canTransactBeforeSlot":false,"canTransactAfterSlot":true,"treatEtaAsSla":false,"transactionDuration":2700,"readinessDuration":0,"slotBuffer":0,"slots":[{"start":"2018-08-22T00:30:00.000+0000","end":"2018-08-22T17:29:59.999+0000"}]},"nonAvailableWindows":[],"locationAddress":{"formattedAddress":"xyz - 
FNA","countryCode":"UN"},"contact":{"name":"Pxyz","number":"8xxxxxxxxx"},"geocodingMetadata":{"provider":"CLIENT_READ","archive":[],"goodness":"HIGH","confidence":"HIGH","placeNameArchive":[],"localityArchive":[]},"customerId":{"clientId":"xyz","customerId":"ea2fad024df6418786258ef296cffcba"},"addressId":{"clientId":"xyz","addressId":"ba467d66030a4a20bbecc37986c0dd0c"}}],"chosenLocation":{"id":"xyz-Fna","geometry":{"latLng":{"lat":89.126407,"lng":92.893228,"accuracy":0}},"timeWindow":{"slot":{"start":"2018-08-22T00:30:00.000+0000","end":"2018-08-22T17:29:59.999+0000"},"strictness":"STRICT","canTransactBeforeSlot":false,"canTransactAfterSlot":true,"treatEtaAsSla":false,"transactionDuration":2700,"readinessDuration":0,"slotBuffer":0,"slots":[{"start":"2018-08-22T00:30:00.000+0000","end":"2018-08-22T17:29:59.999+0000"}]},"nonAvailableWindows":[],"locationAddress":{"formattedAddress":"xyz - FNA","countryCode":"UN"},"contact":{"name":"Pxyz","number":"8xxxxxxxx"},"geocodingMetadata":{"provider":"CLIENT_READ","archive":[],"goodness":"HIGH","confidence":"HIGH","placeNameArchive":[],"localityArchive":[]},"customerId":{"clientId":"xyz","customerId":"ea2fad024df6418786258ef296cffcba"},"addressId":{"clientId":"xyz","addressId":"ba467d66030a4a20bbecc37986c0dd0c"}},"eta":{"COMPLETED":{"initialEta":{"arrivalTime":"2018-08-22T05:47:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T16:42:29.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"ACCEPTED":{"initialEta":{"arrivalTime":"2018-08-22T05:02:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T05:56:09.034+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"ARRIVED":{"initialEta":{"arrivalTime":"2018-08-22T05:02:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T15:12:29.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"STARTED":{"initialEta":{"arrivalTime":"2018-08-22T0
5:02:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T13:43:22.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"TRANSACTING":{"initialEta":{"arrivalTime":"2018-08-22T05:02:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T15:57:29.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}}},"checklists":[],"task":true,"payments":{"paymentInstruments":[],"payments":[],"fullAmountRequired":false},"visitMetadata":{"type":"CUSTOMER"},"slotEdits":[],"geofences":[],"routes":{"routes":[]},"summary":{"tardiness":0},"triggeredGeofences":[],"orderDetail":{"lineItems":{"508590":{"name":"Poise Queen Bed","quantity":1,"id":"508590","price":{"amount":1190,"currency":"INR","symbol":"₹"}}},"transactionDetail":{"canTransactPartial":true}},"appFields":{"items":[]},"tags":{"tags":[]},"visitAppConfig":{"skipStatuses":[]}},"CLIENT_VISIT":{"clientId":"xyz","taskId":"FnA-200237832-1534846157768","id":"CLIENT_VISIT","volumes":{"volumes":[{"unit":"ITEM_COUNT","value":"15","exchangeType":"GIVE"}]},"resources":{"resources":[]},"visitStatus":{"status":"COMPLETED","triggerTime":"2018-08-22T13:49:09.020+0000","checklistValues":{},"receiveTime":"2018-08-22T13:49:09.023+0000","location":{"lat":89.01511,"lng":93.02275,"accuracy":0,"timestamp":1534943025243,"distance":0},"actor":{"id":"xyz/personnel/xyz"},"assignedUser":{"carrierClientId":"xyz","userId":"Zxyz-1534826008383"}},"statusUpdates":[{"status":"RECEIVED","triggerTime":"2018-08-21T10:09:18.560+0000","checklistValues":{},"actor":{"id":"xyz/personnel/xyz"}},{"status":"WAITING","triggerTime":"2018-08-22T05:56:07.842+0000","checklistValues":{},"actor":{"id":"locus"},"assignedUser":{"carrierClientId":"xyz","userId":"xyz-1534826008383"}},{"status":"ACCEPTED","triggerTime":"2018-08-22T05:56:09.034+0000","checklistValues":{},"receiveTime":"2018-08-22T05:56:09.707+0000","actor":{"id":"xyz/personnel/dxyz"},"assignedUser":{"carrierClientId":"xyz","u
serId":"Zxyz-1534826008383"}},{"status":"COMPLETED","triggerTime":"2018-08-22T13:49:09.020+0000","checklistValues":{},"receiveTime":"2018-08-22T13:49:09.023+0000","location":{"lat":89.01511,"lng":93.02275,"accuracy":0,"timestamp":1534943025243,"distance":0},"actor":{"id":"xyz/personnel/xyz"},"assignedUser":{"carrierClientId":"xyz","userId":"Zxyzi-1534826008383"}}],"locationOptions":[{"id":"Client-Visit","geometry":{"latLng":{"lat":89.21168,"lng":93.0907,"accuracy":0}},"timeWindow":{"slot":{"start":"2018-08-22T09:30:00.000+0000","end":"2018-08-22T10:15:00.000+0000"},"strictness":"STRICT","canTransactBeforeSlot":false,"canTransactAfterSlot":true,"treatEtaAsSla":false,"transactionDuration":2700,"readinessDuration":0,"slotBuffer":0,"slots":[{"start":"2018-08-22T09:30:00.000+0000","end":"2018-08-22T10:15:00.000+0000"}]},"nonAvailableWindows":[],"locationAddress":{"formattedAddress":"104, Suprema F, Lodha Casa Bella, 104,xyz","countryCode":"un"},"contact":{"name":"Rxyz","number":"9xxxxxxxx"},"geocodingMetadata":{"provider":"CLIENT_READ","archive":[],"goodness":"HIGH","confidence":"HIGH","placeNameArchive":[],"localityArchive":[]},"customerId":{"clientId":"xyz","customerId":"119a272e7ee54414a1b4b506359527d2"},"addressId":{"clientId":"xyz","addressId":"b11e6cffa55a4991aa623fc2dbd2771e"}}],"chosenLocation":{"id":"Client-Visit","geometry":{"latLng":{"lat":89.21168,"lng":93.0907,"accuracy":0}},"timeWindow":{"slot":{"start":"2018-08-22T09:30:00.000+0000","end":"2018-08-22T10:15:00.000+0000"},"strictness":"STRICT","canTransactBeforeSlot":false,"canTransactAfterSlot":true,"treatEtaAsSla":false,"transactionDuration":2700,"readinessDuration":0,"slotBuffer":0,"slots":[{"start":"2018-08-22T09:30:00.000+0000","end":"2018-08-22T10:15:00.000+0000"}]},"nonAvailableWindows":[],"locationAddress":{"formattedAddress":"104, Suprema 
F,XYZ","countryCode":"UN"},"contact":{"name":"Rxyz","number":"9xxxxxxxxx"},"geocodingMetadata":{"provider":"CLIENT_READ","archive":[],"goodness":"HIGH","confidence":"HIGH","placeNameArchive":[],"localityArchive":[]},"customerId":{"clientId":"xyz","customerId":"119a272e7ee54414a1b4b506359527d2"},"addressId":{"clientId":"xyz","addressId":"b11e6cffa55a4991aa623fc2dbd2771e"}},"eta":{"COMPLETED":{"initialEta":{"arrivalTime":"2018-08-22T10:15:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T19:16:52.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"ACCEPTED":{"initialEta":{"arrivalTime":"2018-08-22T06:32:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T05:56:09.034+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"ARRIVED":{"initialEta":{"arrivalTime":"2018-08-22T09:30:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T18:31:52.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"STARTED":{"initialEta":{"arrivalTime":"2018-08-22T06:32:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T16:42:29.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}},"TRANSACTING":{"initialEta":{"arrivalTime":"2018-08-22T09:30:00.000+0000","estimatedOn":"2018-08-22T05:54:37.932+0000"},"currentEta":{"arrivalTime":"2018-08-22T18:31:52.738+0000","estimatedOn":"2018-08-22T13:43:22.738+0000"}}},"checklists":[],"task":true,"payments":{"paymentInstruments":[],"payments":[],"fullAmountRequired":false},"visitMetadata":{"type":"CUSTOMER"},"slotEdits":[],"geofences":[],"routes":{"routes":[]},"summary":{"tardiness":12849},"triggeredGeofences":[],"orderDetail":{"lineItems":{"508590":{"name":"Poise Queen Bed","quantity":1,"id":"508590","price":{"amount":1190,"currency":"US 
Dollar","symbol":"$"}}},"transactionDetail":{"canTransactPartial":true}},"appFields":{"items":[]},"tags":{"tags":[]},"visitAppConfig":{"skipStatuses":[]}}},"assignedUser":"Zxyz-1534826008383","statusReason":null}

This data also has multiple schemas for the same field.
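To find out which fields actually carry more than one type, a small script over a few exported documents can help. This is a plain-Python sketch and not part of the connector; `conflicting_fields` and the sample documents are hypothetical.

```python
from collections import defaultdict

def type_name(value):
    """Return a coarse JSON type name for a Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # bool must be checked before int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__

def collect_types(value, path, seen):
    """Record the type seen at `path`, then recurse into children."""
    seen[path].add(type_name(value))
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}", seen)
    elif isinstance(value, list):
        for item in value:
            collect_types(item, path + "[]", seen)

def conflicting_fields(docs):
    """Map each field path to its types when more than one non-null type occurs."""
    seen = defaultdict(set)
    for doc in docs:
        for key, value in doc.items():
            collect_types(value, key, seen)
    return {p: sorted(t) for p, t in seen.items() if len(t - {"null"}) > 1}

# Hypothetical documents: `volume` is a string in one and a number in another.
docs = [
    {"volume": "15", "batch": {"id": "a"}},
    {"volume": 15, "batch": {"id": "b"}},
    {"volume": None},
]
conflicts = conflicting_fields(docs)
print(conflicts)
```

Paths reported here are candidates for being declared as strings in an explicit schema, or for being left out of the read.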

I also tried

locus_task = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("database", "locus").option("collection", "tasks").load()

but it gives the same error:

  

scala.MatchError: SkipFieldType (of class com.mongodb.spark.sql.types.SkipFieldType$)
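The error appears to come from schema inference, so one option that might be worth trying is to send a `$project` stage to MongoDB so that only a few fields are read. The sketch below builds such a stage as a JSON string and passes it through the connector's `pipeline` read option. The helper names `projection_pipeline` and `read_projected` are made up for illustration, and whether this avoids the MatchError depends on which field triggers it, which is not known here.

```python
import json

def projection_pipeline(fields):
    """Build a one-stage aggregation pipeline that keeps only `fields`.

    MongoDB also returns `_id` unless it is excluded explicitly.
    """
    return json.dumps([{"$project": {name: 1 for name in fields}}])

def read_projected(spark, fields):
    # `spark` is the SparkSession that the Databricks notebook provides.
    return (spark.read.format("com.mongodb.spark.sql.DefaultSource")
            .option("database", "locus")
            .option("collection", "tasks")
            .option("pipeline", projection_pipeline(fields))
            .load())

pipeline = projection_pipeline(["orderId", "status"])
print(pipeline)
```

In a notebook this would be called as `read_projected(spark, ["orderId", "status", "taskType"])`, adding fields back one group at a time to narrow down the one that fails.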

I also tried

from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType

collection_schema = StructType([StructField('__v', IntegerType(), True),
                     StructField('_id', StringType(), True),
                     StructField('assignedUser', StringType(), True),
                     StructField('assignedUserLog', StringType(), True),
                     StructField('batch', StringType(), True),
                     StructField('city', StringType(), True),
                     StructField('createdAt', TimestampType(), True),
                     StructField('customer', StringType(), True),
                     StructField('feedbackToken', StringType(), True),
                     StructField('goodsType', StringType(), True),
                     StructField('id', StringType(), True),
                     StructField('isAdhoc', StringType(), True),
                     StructField('lat', StringType(), True),
                     StructField('lineItems', StringType(), True),
                     StructField('lng', StringType(), True),
                     StructField('orderId', StringType(), True),
                     StructField('reassigned', StringType(), True),
                     StructField('status', StringType(), True),
                     StructField('statusChangeLog', StringType(), True),
                     StructField('statusReason', StringType(), True),
                     StructField('taskDateTime', TimestampType(), True),
                     StructField('taskType', StringType(), True),
                     StructField('team', StringType(), True),
                     StructField('triggerTime', TimestampType(), True),
                     StructField('updatedAt', TimestampType(), True),
                     StructField('visits', StringType(), True),
                     StructField('volume', StringType(), True),
                     StructField('warehouse', StringType(), True)
                    ])

locus_task = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("database", "locus").option("collection", "tasks").load(schema=collection_schema)

This works, but pyspark functions (such as from_json, explode, etc.) do not work and give this error:

  

AnalysisException: "Can't extract value from batch#24: need struct type but got string;"

So how can we extract the nested parts?
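Because the nested columns were declared as StringType, field access such as `batch.id` fails until the string is parsed into a struct. A possible approach, assuming the connector hands sub-documents over as JSON text when the column is declared as a string (not verified here), is to derive a schema string from one sample value and pass it to `from_json`. The helpers `ddl_type`, `ddl_columns` and `parse_json_column` are illustrative names, not connector or Spark APIs.

```python
import json

def ddl_type(value):
    """Map one sample JSON value to a Spark SQL type string."""
    if isinstance(value, bool):      # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        inner = ",".join(f"`{k}`:{ddl_type(v)}" for k, v in value.items())
        return f"struct<{inner}>"
    if isinstance(value, list):
        # An empty array carries no type information; fall back to strings.
        return f"array<{ddl_type(value[0])}>" if value else "array<string>"
    return "string"                  # str, None and anything unexpected

def ddl_columns(sample_json):
    """Turn a sample JSON object (as text) into a 'name type, ...' schema string."""
    sample = json.loads(sample_json)
    return ", ".join(f"`{k}` {ddl_type(v)}" for k, v in sample.items())

def parse_json_column(df, column, sample_json):
    # Imported here so the helpers above can be tried without Spark installed.
    from pyspark.sql.functions import col, from_json
    return df.withColumn(column, from_json(col(column), ddl_columns(sample_json)))

# The `batch` sub-document from the raw data shown in the question.
batch_sample = '{"id": "xyz-FnA-1534846051825", "ref": "5b7be463ee7ec91c172c5e02"}'
batch_ddl = ddl_columns(batch_sample)
print(batch_ddl)
```

After `locus_task = parse_json_column(locus_task, "batch", batch_sample)`, a select on `batch.id` should resolve. A schema taken from a single sample misses fields that appear only in other documents, and values that do not match the sampled type may come back as null.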

I have to create multiple SQL tables from this raw data. Please help me solve this problem.
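For the multiple tables, one common layout is a parent table of scalar fields plus one child table per nested array, linked by the task's `_id`. The plain-Python sketch below shows that shape on a cut-down version of the document above; `split_task` and the column names `task_id`, `position` and `actor_id` are illustrative. In Spark the same split would typically use `explode` once the column has an array type.

```python
def split_task(task):
    """Split one task document into a parent row and statusChangeLog child rows."""
    scalars = (str, int, float, bool, type(None))
    # Parent row: keep only top-level scalar fields.
    task_row = {k: v for k, v in task.items() if isinstance(v, scalars)}
    # Child rows: one per log entry, carrying the parent key.
    log_rows = [
        {
            "task_id": task["_id"],
            "position": i,
            "status": entry.get("status"),
            "triggerTime": entry.get("triggerTime"),
            "actor_id": entry.get("actor", {}).get("id"),
        }
        for i, entry in enumerate(task.get("statusChangeLog", []))
    ]
    return task_row, log_rows

task = {
    "_id": "5b7be4cd1f939e743c2c3a1a",
    "status": "COMPLETED",
    "orderId": "200237832",
    "statusChangeLog": [
        {"actor": {"id": "xyz/personnel/dxyz"}, "status": "RECEIVED",
         "triggerTime": "2018-08-21T10:09:18.560Z"},
        {"actor": {"id": "locus"}, "status": "WAITING",
         "triggerTime": "2018-08-22T05:56:07.842Z"},
    ],
}
task_row, log_rows = split_task(task)
print(task_row)
print(log_rows)
```

The same pattern would repeat for `assignedUserLog`, `lineItems` and the entries under `visits`, each becoming its own table keyed by `task_id`.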

0 answers:

No answers