PyMongo: NaTType ValueError when bulk-inserting into a new collection

Time: 2018-09-20 16:40:33

Tags: python mongodb pymongo

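I'm trying to use PyMongo to upload a mixed set of date and text data to a new collection on a remote MongoDB server.

However, I get an error caused by null values mixed in with the dates, i.e. rows that have a None value instead of a datetime.datetime() object.

Some background: the raw data is stored in a CSV file, which I read into a pandas.DataFrame() using pandas.read_csv(). Once the data is in pandas, I do some basic cleaning, convert the data to a list of dictionaries, and then upload it to the collection with the standard collection.insert_many() method.

Initially, the values in every row/document/dictionary are stored as strings. Before uploading, however, I convert several date columns to datetime objects by calling datetime.datetime.strptime() on each value. Not every dictionary has these date fields populated, though. For those dictionaries, I simply use None instead of a datetime object.

The resulting data that I try to upload is therefore a list of dictionaries with many NoneType values mixed in, and when I call insert_many() I get: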

ValueError: NaTType does not support utcoffset.

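I'm not familiar with utcoffset, and my attempts to research it have only left me confused.

Has anyone run into this problem, or have advice on how to handle missing datetime data in PyMongo?

Here is my code: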

import datetime as dt

import pandas as pd
import pymongo

source = '/path/to/data'
sampleData = pd.read_csv(source, dtype=str)

Date_Columns = [
    'date_a',
    'date_b',
    'date_c',
    'date_d'
]
cleanData = sampleData
for col in Date_Columns:

    # Convert the strings to datetime objects for each column.
    # If a value is null, then use a None object instead of a datetime.
    Strings = sampleData[col].values
    Formats = [dt.datetime.strptime(d, '%m/%d/%Y') if isinstance(d, str) else None for d in Strings]
    cleanData[col] = Formats

client = pymongo.MongoClient('XX.XX.XX.XX', 99999)
db = client['my_db']
c = db['my_collection']

# Convert the cleaned DataFrame into a list of dictionaries.
Keys = [key for key in sampleData.columns.values]
Data = [dict(zip(Keys, L)) for L in sampleData.values]

c.insert_many(Data)

Full traceback:

Traceback (most recent call last):
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 738, in <module>
    setup_db()
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 679, in setup_db
    c.insert_many(Data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/collection.py", line 753, in insert_many
    blk.execute(write_concern, session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 513, in execute
    return self.execute_command(generator, write_concern, session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 338, in execute_command
    self.is_retryable, retryable_bulk, s, self)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1196, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 333, in retryable_bulk
    retryable, full_result)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 285, in _execute_command
    self.collection.codec_options, bwc)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1273, in _do_bulk_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1263, in _do_batched_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support utcoffset

1 answer:

Answer 0 (score: 1)

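Most computers' clocks are set to UTC, which is ideal. Internally, the time is an integer count of seconds since a fixed date (the Unix epoch, 1 January 1970). This means that scheduling your processes does not depend on local time, including the headache of daylight saving time.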

UTC is offset from US Eastern time by 4 to 5 hours (depending on daylight saving time).
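To make the term in the error message concrete, here is a small sketch of what utcoffset() returns on ordinary datetime.datetime objects, using only the standard library. The fixed-offset zones EST and EDT and the sample timestamp are illustrative assumptions, not values from the question. That the driver calls utcoffset() on each datetime while encoding is inferred from the traceback rather than confirmed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern time (illustrative names, not from the question).
EST = timezone(timedelta(hours=-5), "EST")  # standard time
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight saving time

# An arbitrary sample timestamp without any timezone attached.
naive = datetime(2018, 9, 20, 16, 40, 33)
# The same wall-clock time, marked as Eastern daylight time.
aware = naive.replace(tzinfo=EDT)

naive_offset = naive.utcoffset()          # None: no timezone information
aware_offset = aware.utcoffset()          # timedelta of -4 hours
as_utc = aware.astimezone(timezone.utc)   # same instant expressed in UTC
```

A naive datetime simply reports no offset, whereas the traceback shows that pandas' NaT raises a ValueError when asked for one.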

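Looking at your error, it is a pandas error: the traceback shows it is raised by pandas' NaT ("not a time") value, and pandas' datetime types do not always play well with datetime.datetime. Convert the values to datetime strings of the precision you need. That should avoid the error.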
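As an alternative to storing strings, here is a minimal sketch that replaces missing values with None just before the insert, assuming the list of dictionaries is built as in the question. The helper name replace_missing_with_none and the sample field "permit" are hypothetical. The check relies on the fact that float NaN and pandas.NaT both compare unequal to themselves, so the helper needs no pandas import. It has not been verified against the asker's data.

```python
import datetime as dt


def replace_missing_with_none(records):
    """Return copies of the dicts with NaN/NaT-like values replaced by None.

    Hypothetical helper: a value is treated as missing when it is not equal
    to itself, which holds for float('nan') and for pandas.NaT.
    """
    cleaned = []
    for record in records:
        cleaned.append({
            key: (None if value != value else value)
            for key, value in record.items()
        })
    return cleaned


# Stand-in data; float('nan') plays the role of a missing value here.
records = [
    {"permit": "A-1", "date_a": dt.datetime(2018, 9, 20)},
    {"permit": "A-2", "date_a": float("nan")},
]
cleaned = replace_missing_with_none(records)
```

In the question's code this would be applied as c.insert_many(replace_missing_with_none(Data)). PyMongo encodes None as a BSON null, so the missing dates end up as null fields rather than raising during encoding.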