Question

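I work in both R and Python, and I want to write one of my pandas DataFrames to Feather so that I can use it more easily in R. However, when I try to write it as Feather, I get the following error: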

ArrowInvalid: trying to convert NumPy type float64 but got float32

I double-checked my column types, and they are already float64:

In[1]
df.dtypes

Out[1]
id         object
cluster    int64
vector_x   float64
vector_y   float64

I get the same error with both feather.write_dataframe(df, "path/df.feather") and df.to_feather("path/df.feather").

I came across these reports, but I don't know whether they are related: https://issues.apache.org/jira/browse/ARROW-1345 and https://github.com/apache/arrow/issues/1430

As a last resort I could save it as CSV and fix the column types in R (or just do the whole analysis in Python), but I was hoping to use Feather.

Edit 1:

Despite the good suggestions below, I still have the same problem, so here is an update with what I have tried.

df[['vector_x', 'vector_y', 'cluster']] = df[['vector_x', 'vector_y', 'cluster']].astype(float)

df[['doc_id', 'text']] = df[['doc_id', 'text']].astype(str)

df[['doc_vector', 'doc_vectors_2d']] = df[['doc_vector', 'doc_vectors_2d']].astype(list)

df.dtypes

Out[1]:
doc_id           object
text             object
doc_vector       object
cluster          float64
doc_vectors_2d   object
vector_x         float64
vector_y         float64
dtype: object

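Edit 2:

After a lot of searching, it looks like the problem is that my cluster column is a list type made up of int64 integers. So I guess the real question is: does the Feather format support lists?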

Answer 1

After a lot of research, the simple answer is that Feather does not support list (or other nested data type) columns.
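If you are stuck on a version where this limitation applies, one possible workaround is to store each list as a JSON string before writing and decode it again after reading. The following is only a minimal sketch: the helper names encode_list_columns and decode_list_columns are hypothetical, the column name is borrowed from the question, and it assumes the lists are flat and contain plain numbers.

```python
import json

import pandas as pd


def encode_list_columns(df, columns):
    """Return a copy of df with the given list columns stored as JSON strings.

    Hypothetical helper; assumes each cell is a flat list of numbers.
    """
    out = df.copy()
    for col in columns:
        # float() turns NumPy scalars into plain Python floats that json can encode
        out[col] = out[col].map(lambda v: json.dumps([float(x) for x in v]))
    return out


def decode_list_columns(df, columns):
    """Inverse of encode_list_columns: parse the JSON strings back into lists."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(json.loads)
    return out


df = pd.DataFrame({
    "doc_id": ["a", "b"],
    "doc_vectors_2d": [[0.1, 0.2], [0.3, 0.4]],
})

flat = encode_list_columns(df, ["doc_vectors_2d"])       # safe to write as plain strings
restored = decode_list_columns(flat, ["doc_vectors_2d"])  # lists again after reading
```

On the R side the string column would still need to be parsed (for example with a JSON package), so this trades convenience for compatibility.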

Answer 2

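The problem you are running into is the id column, which has dtype object. Its values are Python objects, which cannot be represented in a language-neutral format. Feather (really the underlying Apache Arrow / pyarrow) tries to guess the data type of the id column. It bases that guess on the first objects it sees in the column, which are float64 NumPy scalars. Later in the column there are float32 scalars. Rather than coercing them to a common type, Arrow is stricter about types and fails.

You should be able to work around this by making sure every column has a non-object dtype, for example with df['id'] = df['id'].astype(float).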
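To check whether an object column really mixes scalar types, it can help to list the Python types found in each object-dtype column before casting. This is a rough sketch: object_column_types is a hypothetical helper, and the small DataFrame is made up to mimic the situation described above.

```python
import numpy as np
import pandas as pd


def object_column_types(df):
    """Map each object-dtype column to the sorted names of the types it holds."""
    report = {}
    for col in df.select_dtypes(include=["object"]).columns:
        report[col] = sorted({type(v).__name__ for v in df[col]})
    return report


# Made-up data: an object column holding both float64 and float32 scalars
df = pd.DataFrame({
    "id": pd.Series([np.float64(1.0), np.float32(2.0)], dtype=object),
    "cluster": [0, 1],
})

report = object_column_types(df)
print(report)

# Cast the mixed column to a single concrete dtype, as suggested above
df["id"] = df["id"].astype(float)
print(df.dtypes)
```

A column that reports more than one type name, or a type such as list, is a likely candidate for the conversion error.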

Answer 3

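Luckily, I ran into the cause of this Feather IO error here.

I also found a solution to the problem: pandas' to_feather and read_feather are both built on pyarrow, and since 2019 pyarrow has supported columns that contain lists of values.

Solution: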

pip install --upgrade pyarrow  # my version is 1.0.0 and it works

After that, keep using df.to_feather("Filename") and pd.read_feather as before.

Error when trying to write a DataFrame to Feather. Does Feather support list columns?
