Question

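I have an interesting problem. I have two files: NYPD_Motor_Collisions.csv with 1.2M rows and weatherfinal.txt with 109K rows. The goal is to merge the temp and prec data from weatherfinal.txt into the Collisions file as two columns, matched on latitude and longitude. I wrote the following code using a DataFrame in pandas (Python).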

#include <cmath>  // std::hypot

// input point struct
struct point { double x, y; };

// pass in output points by reference
void calculate_other_points(
   const point& x1, const point& x2, // input points x1 x2
   double w,                         // input width
   point& x3, point& x4)             // output points x3 x4
{
   // span vector x1 -> x2
   double dx = x2.x - x1.x,
          dy = x2.y - x1.y;
   // length of the edge x1 -> x2 (assumed non-zero, i.e. x1 != x2)
   double h = std::hypot(dx, dy);

   // perpendicular edge x1 -> x4 or x2 -> x3
   double px =  dy * (w / h),
          py = -dx * (w / h);

   // add onto x1 / x2 to obtain x3 / x4
   x4.x = x1.x + px; x4.y = x1.y + py;
   x3.x = x2.x + px; x3.y = x2.y + py;
}

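This program has been running for several days. I am not sure why the DataFrame version is so slow. I rewrote the program without DataFrames, using dictionaries instead, and it finished in a few minutes. I am not sure whether DataFrames are slow or whether I am not using them correctly. I am just posting here to learn.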
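
For illustration, a dictionary-based lookup of the kind described above might look like the sketch below. It is not the actual rewritten program: the column names (`lat`, `lon`, `temp`, `prec`, `id`) and the exact match on coordinate values are assumptions made for the example, and the real files may need rounding or an extra key such as the date.

```python
def build_weather_index(weather_rows):
    """Map (lat, lon) -> (temp, prec) so each lookup is a single dict access.

    The field names are hypothetical; adjust them to the real file headers.
    """
    index = {}
    for row in weather_rows:
        index[(row["lat"], row["lon"])] = (row["temp"], row["prec"])
    return index


def add_weather(collision_rows, index):
    """Return copies of the collision rows with temp and prec attached.

    Rows whose coordinates are not in the index get None for both fields.
    """
    merged = []
    for row in collision_rows:
        temp, prec = index.get((row["lat"], row["lon"]), (None, None))
        merged.append({**row, "temp": temp, "prec": prec})
    return merged


if __name__ == "__main__":
    # Tiny made-up sample, only to show the shape of the data.
    weather = [{"lat": 40.7, "lon": -73.9, "temp": 55.0, "prec": 0.1}]
    collisions = [
        {"id": 1, "lat": 40.7, "lon": -73.9},
        {"id": 2, "lat": 40.6, "lon": -74.0},
    ]
    for row in add_weather(collisions, build_weather_index(weather)):
        print(row)
```

The index is built once over the weather rows, and each collision row then costs one hash lookup, so the total work grows with the sum of the two file sizes rather than their product.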

Is the Python pandas DataFrame too slow?

0 answers: