users_old = pd.read_sql('SELECT {} FROM ResUsers WHERE PERMISSION=1 \
AND DateAdded<"2013-01-01"'.format(RU_fields), db)
users_2013 = pd.read_sql('SELECT {} FROM ResUsers WHERE PERMISSION=1 \
AND DateAdded>"2013-01-01" AND DateAdded<"2014-01-01"'.format(RU_fields), db)
users_2014 = pd.read_sql('SELECT {} FROM ResUsers WHERE PERMISSION=1 \
AND DateAdded>"2014-01-01"'.format(RU_fields), db)
When I run these three queries in IPython, the process ends up using about 16.5GB of memory. However, when I run this query:
users = pd.read_sql('SELECT {} FROM ResUsers WHERE PERMISSION=1'.format(RU_fields), db)
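the IPython process uses more and more memory until it crashes. The machine I'm using has 60GB of RAM in total.
Now, Permission and DateAdded are both non-null, so I have no idea what could be going on here. As a sanity check, I tried: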
mysql> SELECT count(*) FROM ResUsers WHERE Permission=1;
+----------+
| count(*) |
+----------+
| 31577307 |
+----------+
1 row in set (8.39 sec)
mysql> SELECT count(*) FROM ResUsers WHERE Permission=1 AND DateAdded<"2013-01-01"
-> ;
+----------+
| count(*) |
+----------+
| 8255583 |
+----------+
1 row in set (51.13 sec)
mysql> SELECT count(*) FROM ResUsers WHERE Permission=1 AND DateAdded>"2013-01-01" AND DateAdded<"2014-01-01";
+----------+
| count(*) |
+----------+
| 11966819 |
+----------+
1 row in set (55.76 sec)
mysql> SELECT count(*) FROM ResUsers WHERE Permission=1 AND DateAdded>"2014-01-01";
+----------+
| count(*) |
+----------+
| 11354972 |
+----------+
1 row in set (51.11 sec)
These don't quite add up, since 8255583 + 11966819 + 11354972 = 31577374 != 31577307, although it's very close... Is there a reason why MySQL's counts might be off by a small amount?
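To make the size of the mismatch explicit, here is the same arithmetic as a small check. The numbers are copied from the MySQL output above; the dictionary labels are only illustrative names.

```python
# Row counts copied from the three date-restricted COUNT(*) queries above.
# The keys are illustrative labels, not column or table names.
counts = {
    "before 2013-01-01": 8255583,
    "2013-01-01 to 2014-01-01": 11966819,
    "after 2014-01-01": 11354972,
}

# Row count from the unrestricted Permission=1 query.
total = 31577307

partial_sum = sum(counts.values())
difference = partial_sum - total

print(partial_sum)  # 31577374
print(difference)   # 67
```

So the three restricted counts together exceed the unrestricted count by 67 rows.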
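What's going on, or at least, how might one debug this? Is there perhaps some way to see what is happening in memory while this call is running, so that I could figure it out?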
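One possible way to watch how memory grows during the call is to pass `chunksize` to `pd.read_sql`, which then returns an iterator of DataFrames, and to measure each chunk with `DataFrame.memory_usage(deep=True)`. The following is only a sketch under stated assumptions: it uses an in-memory SQLite table as a stand-in for the real MySQL `ResUsers` table, and the `UserID` column, the contents of `RU_fields`, and the chunk size are all hypothetical.

```python
import sqlite3

import pandas as pd

# Stand-in database (assumption): an in-memory SQLite table instead of the
# real MySQL ResUsers table, so that the sketch runs on its own.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ResUsers (UserID INTEGER, Permission INTEGER, DateAdded TEXT)"
)
db.executemany(
    "INSERT INTO ResUsers VALUES (?, ?, ?)",
    [(i, i % 2, "2013-06-01") for i in range(1000)],
)

RU_fields = "UserID, DateAdded"  # hypothetical column list
query = "SELECT {} FROM ResUsers WHERE Permission=1".format(RU_fields)

total_rows = 0
total_bytes = 0

# With chunksize set, read_sql yields DataFrames of at most that many rows.
for chunk in pd.read_sql(query, db, chunksize=100):
    total_rows += len(chunk)
    # deep=True also counts the Python string objects in object columns.
    total_bytes += int(chunk.memory_usage(deep=True).sum())

print(total_rows, total_bytes)
```

Dividing `total_bytes` by `total_rows` gives a rough per-row cost that can be scaled up to the 31.5 million rows in question. Depending on the database driver, the full result set may still be buffered on the client even when `chunksize` is used, so this measures the DataFrame side only.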
Any ideas appreciated!