Question

I'm new to Python. I need to work with a database that seems to be too large (it exhausts my memory). Here are the details:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4747156 entries, 0 to 4747155
Data columns (total 5 columns):
User             int64
Date_and_time    object
Latitude         float64
Longitude        float64
Location_id      object
dtypes: float64(2), int64(1), object(2)
memory usage: 217.3+ MB

How can I reduce its size so that I can work with it? In particular, I need to get rid of some irrelevant rows.

My database looks like this:

locations_df.head()

User    Date_and_time   Latitude    Longitude   Location_id
0   2010-10-17T01:48:53Z    39.747652   -104.992510 88c46bf20  
0   2010-10-16T06:02:04Z    39.891383   -105.070814 7a0f8898

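Some rows are irrelevant because their latitude and longitude are both equal to 0.0. I need to get rid of these (there are a lot of them) because they are useless.

Thank you very much for your help!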

Answer 1

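You should create a new DataFrame with the unwanted values removed. Note the ~, which negates the mask so that the rows with a Latitude of 0 are dropped rather than selected:

df = df[~df['Latitude'].isin([0])]

and change the column types to smaller ones, for example float32 instead of float64 if about seven significant digits is enough for your coordinates:

df = df.astype({'Latitude': 'float32', 'Longitude': 'float32'})

List of available types: https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html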
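
A minimal sketch of both steps on a small stand-in frame, using the column names from the question. The sample rows are made up, and the specific target types (int32, float32, category, and parsing Date_and_time into datetimes) are illustrative assumptions rather than part of this answer. Whether they are safe depends on your data: float32 keeps only about seven significant digits.

```python
import pandas as pd

# Small stand-in for locations_df (values are made up for illustration).
locations_df = pd.DataFrame({
    "User": [0, 0, 1, 1],
    "Date_and_time": ["2010-10-17T01:48:53Z", "2010-10-16T06:02:04Z",
                      "2010-10-15T12:00:00Z", "2010-10-14T08:30:00Z"],
    "Latitude": [39.747652, 39.891383, 0.0, 0.0],
    "Longitude": [-104.992510, -105.070814, 0.0, 0.0],
    "Location_id": ["88c46bf20", "7a0f8898", "88c46bf20", "7a0f8898"],
})

before = locations_df.memory_usage(deep=True).sum()

# Step 1: keep only the rows whose Latitude is not 0.
cleaned = locations_df[~locations_df["Latitude"].isin([0])].copy()

# Step 2: switch to smaller dtypes (assumed choices, adjust to your data).
cleaned = cleaned.astype({
    "User": "int32",
    "Latitude": "float32",
    "Longitude": "float32",
    "Location_id": "category",  # helps most when ids repeat often
})
# Parsed timestamps take 8 bytes each instead of one Python string per row.
cleaned["Date_and_time"] = pd.to_datetime(cleaned["Date_and_time"])

after = cleaned.memory_usage(deep=True).sum()
print(cleaned.dtypes)
print(f"bytes before: {before}, after: {after}")
```

On a frame this small the fixed overhead dominates, so compare the two byte counts on the real data. memory_usage(deep=True) also counts the string contents of object columns, which the "217.3+ MB" figure from info() leaves out.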

Answer 2

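Assuming you are working with a pandas DataFrame df, you can filter out the rows with unwanted column values like this:

df = df[(df['Latitude'] != 0) | (df['Longitude'] != 0)]

This assumes you want to drop every row where Latitude and Longitude are both equal to 0. The comparison uses != 0 rather than > 0 because valid coordinates can be negative, as the longitudes in your sample are.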
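
A small sketch of this filter on made-up rows. It compares against 0 with != rather than >, an assumption made here so that negative coordinates (such as the longitudes in the question) still count as non-zero.

```python
import pandas as pd

# Made-up rows: index 1 has both coordinates at 0.0, index 2 has two
# negative coordinates, index 3 has only the latitude at 0.0.
df = pd.DataFrame({
    "Latitude": [39.747652, 0.0, -33.45, 0.0],
    "Longitude": [-104.992510, 0.0, -70.66, 12.5],
})

# Keep a row if at least one coordinate is non-zero,
# i.e. drop it only when both are exactly 0.0.
keep = (df["Latitude"] != 0) | (df["Longitude"] != 0)
filtered = df[keep]

print(filtered)
```

Only the row at index 1 is removed. If a single zero coordinate should also disqualify a row, combine the two conditions with & instead of |.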

Python - Reducing the size of a database

2 answers: