Question

I have two dask dataframes: 1. df = 100 rows 2. q2d = 500,000 rows

Both have a common column called uuid, and I am trying to merge the two dataframes in dask.

It is a very simple task:

    case = dd.merge(q2d, df, left_on='UUID',right_on='uuid', how='left')

The goal is to add a few columns from df to q2d, so that the 500K records gain the extra columns. But this fails with the error:

   ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.
   | Column | Found   | Expected |
   +--------+---------+----------+
   | 641860 | float64 | int64    |
   +--------+---------+----------+

   Usually this is due to dask's dtype inference failing, and
   *may* be fixed by specifying dtypes manually by adding:

   dtype={'641860': 'float64'}

   to the call to `read_csv`/`read_table`.

   Alternatively, provide `assume_missing=True` to interpret
   all unspecified integer columns as floats.

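I do not have a column named 641860 in df. (At first the file had no header, so the first row was picked up as the header, but I then added a header row with df.rename cols .... and confirmed that it has headers.) Why does it show the old name?

How can I merge the dask dataframes without getting the above error? I tried changing the column dtype to int64 and verified it: when I run df.head, it shows int64.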

<bound method _Frame.head of Dask DataFrame Structure:
          uuid county_geoid cbsa_geoid state_geoid   rent
  npartitions=765                                                  
               int64        int64      int64       int64  int64

Answer 1

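This error occurs in the read_csv call, long before any other operation you run (such as rename or astype) can be invoked. That is also why the message shows the old name: at the point where the file is read, the column is still called 641860. To resolve the error, I suggest including the code suggested by the error message in your read_csv call.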
