Question

This is what my data looks like:

        user_id article_id  send_time author_id topic_id type_id
    0   11460         66 2015-01-02     18587       72      22
    1    5475         66 2015-01-02     18587       72      22
    2    1205         66 2015-01-02     18587       72      22
    3   17040         66 2015-01-02     18587       72      22
    4   18940         66 2015-01-02     18587       72      22

I tried the code below twice, once without the format option and once with it. Without the option, I get an error.

Code

 np.savetxt(r'C:/Users/AmitSingh/Desktop/Data/data_scientist_test/access_log/new_dataframe.txt',new_dataframe.values)

Error

 TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e %.18e %.18e %.18e %.18e %.18e')

With the format option

Code

np.savetxt(r'C:/Users/AmitSingh/Desktop/Data/data_scientist_test/access_log/new_dataframe.txt',new_dataframe.values,fmt='%d')

Error

TypeError: Mismatch between array dtype ('object') and format specifier ('%d %d %d %d %d %d')

What else can I do? I need to write this to a txt file, because there are too many rows to write to a csv/Excel file.

Answer 1

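You get that error because the DataFrame holds mixed types (including object references). As a result, new_dataframe.values is an array of dtype object, and numpy.savetxt cannot apply a numeric format specifier such as %.18e or %d to it.

The simplest solution is to use the pandas.DataFrame.to_csv method instead of numpy.savetxt: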
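To see the cause directly, here is a minimal sketch. The small DataFrame is a hypothetical stand-in built from the first rows shown in the question, and it assumes send_time is a datetime column (the question does not say whether it is a datetime or a string; either would lead to an object array).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for new_dataframe, using rows from the question.
new_dataframe = pd.DataFrame({
    "user_id": [11460, 5475, 1205],
    "article_id": [66, 66, 66],
    "send_time": pd.to_datetime(["2015-01-02"] * 3),  # assumed datetime
    "author_id": [18587, 18587, 18587],
    "topic_id": [72, 72, 72],
    "type_id": [22, 22, 22],
})

# Integer and datetime columns have no common numeric dtype,
# so .values falls back to a generic object array.
values = new_dataframe.values
print(values.dtype)

# savetxt's default numeric format cannot be applied to that array.
raised = False
try:
    np.savetxt(io.StringIO(), values)
except TypeError as exc:
    raised = True
    print(exc)
```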

new_dataframe.to_csv(r'C:/Users/AmitSingh/Desktop/Data/data_scientist_test/access_log/new_dataframe.txt')
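As a rough sketch of both options: to_csv writes plain text regardless of the file extension, so it can produce the .txt file directly, and sep and index control the delimiter and whether the row index is written. If numpy.savetxt is still preferred, converting every value to a string and using fmt='%s' should avoid the dtype mismatch. The DataFrame and the temporary output paths here are hypothetical stand-ins, not the asker's real data or directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for new_dataframe, using rows from the question.
new_dataframe = pd.DataFrame({
    "user_id": [11460, 5475, 1205],
    "article_id": [66, 66, 66],
    "send_time": pd.to_datetime(["2015-01-02"] * 3),  # assumed datetime
    "author_id": [18587, 18587, 18587],
    "topic_id": [72, 72, 72],
    "type_id": [22, 22, 22],
})

# Temporary directory used only for this illustration.
out_dir = tempfile.mkdtemp()

# Option 1: pandas writes the mixed-type columns itself.
csv_path = os.path.join(out_dir, "new_dataframe.txt")
new_dataframe.to_csv(csv_path, sep="\t", index=False)

# Option 2: keep np.savetxt, but format every cell as a string.
txt_path = os.path.join(out_dir, "new_dataframe_savetxt.txt")
np.savetxt(txt_path, new_dataframe.astype(str).values,
           fmt="%s", delimiter="\t")

# Read the first file back to check the round trip.
round_trip = pd.read_csv(csv_path, sep="\t")
print(round_trip.shape)
```

Note that option 2 writes no header row, whereas to_csv includes the column names unless header=False is passed.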
