Question

I have a large Pandas DataFrame, 'pc':

pc.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1754851 entries, 0 to 1754850
Data columns (total 33 columns):
 #   Column                          Dtype  
---  ------                          -----  
 0   Latitude                        float64
 1   Longitude                       float64
 2   Easting                         Int64  
 3   Northing                        Int64  
 4   Grid Ref                        string 
 5   County                          string 
 6   District                        string 
 7   Ward                            string 
 8   Country                         string 
 9   Constituency                    string 
 10  Parish                          string 
 11  National Park                   string 
 12  Population                      Int64  
 13  Households                      Int64  
 14  Built up area                   string 
 15  Built up sub-division           string 
 16  Lower layer super output area   string 
 17  Rural/urban                     string 
 18  Region                          string 
 19  Altitude                        Int64  
 20  London zone                     string 
 21  Local authority                 string 
 22  Middle layer super output area  string 
 23  Index of Multiple Deprivation   string 
 24  Quality                         Int64  
 25  User Type                       Int64  
 26  Last updated                    string 
 27  Nearest station                 string 
 28  Distance to station             float64
 29  Police force                    string 
 30  Water company                   string 
 31  Plus Code                       string 
 32  Average Income                  Int64  
dtypes: Int64(8), float64(3), string(22)
memory usage: 455.2 MB

It has a column named 'Latitude' and another named 'Longitude', and I tried to build a GeoPandas GeoDataFrame from it like this:

gdfpc = geopandas.GeoDataFrame(pc, geometry=geopandas.points_from_xy(pc.Longitude, df.Latitude))

This raised the following error:

ValueError: x and y arrays must be equal length.

Calling pc.head() and pc.tail() did not help:

pc.head()
    Latitude  Longitude  Easting  ...   Water company    Plus Code Average Income
0  57.149606  -2.096916   394235  ...  Scottish Water  9C9V4WX3+R6           <NA>
1  57.148707  -2.097806   394181  ...  Scottish Water  9C9V4WX2+FV           <NA>
2  57.149051  -2.097004   394230  ...  Scottish Water  9C9V4WX3+J5           <NA>
3  57.148080  -2.094664   394371  ...  Scottish Water  9C9V4WX4+64           <NA>
4  57.150058  -2.095916   394296  ...  Scottish Water  9C9V5W23+2J           <NA>
[5 rows x 33 columns]
pc.tail()
          Latitude  Longitude  ...    Plus Code  Average Income
1754846  59.889544  -1.307206  ...  9CFWVMQV+R4            <NA>
1754847  59.873651  -1.305697  ...  9CFWVMFV+FP            <NA>
1754848  59.875286  -1.307502  ...  9CFWVMGR+4X            <NA>
1754849  59.891572  -1.313847  ...  9CFWVMRP+JF            <NA>
1754850  59.892392  -1.310899  ...  9CFWVMRQ+XJ            <NA>
[5 rows x 33 columns]

Looking up the largest and smallest latitudes and longitudes did not reveal any missing values that might offer a clue:

pc.nlargest(1, columns='Latitude')
          Latitude  Longitude  ...    Plus Code  Average Income
1754598  60.800694  -0.869518  ...  9CGXR42J+75            <NA>
[1 rows x 33 columns]
pc.nlargest(1, columns='Longitude')
         Latitude   Longitude  Easting  ...  Water company Plus Code Average Income
111540   4.610106  114.331172     <NA>  ...           <NA>      <NA>           <NA>
[1 rows x 33 columns]
pc.nsmallest(1, columns='Latitude')
         Latitude  Longitude  Easting  ...  Water company Plus Code Average Income
111552 -51.796253 -59.523613     <NA>  ...           <NA>      <NA>           <NA>
[1 rows x 33 columns]
pc.nsmallest(1, columns='Longitude')
         Latitude   Longitude  Easting  ...  Water company Plus Code Average Income
111544  34.924031 -117.891208     <NA>  ...           <NA>      <NA>           <NA>
[1 rows x 33 columns]
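nlargest and nsmallest return the extreme values and are not designed to surface missing ones, so a direct count may be a more reliable check. The following is a minimal sketch on a hypothetical three-row stand-in for pc (the name `pc_small` and its values are made up for illustration). A missing value still occupies a row, so it would not by itself change a column's length.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of `pc`; the real frame has 1,754,851 rows.
pc_small = pd.DataFrame({
    "Latitude": [57.149606, np.nan, 59.892392],
    "Longitude": [-2.096916, -2.097806, np.nan],
})

# Count missing values in each coordinate column.
missing = pc_small[["Latitude", "Longitude"]].isna().sum()
print(missing)

# Select the rows where either coordinate is missing.
bad_rows = pc_small[pc_small[["Latitude", "Longitude"]].isna().any(axis=1)]
print(len(bad_rows))  # 2
```

On the real data the same two expressions would be applied to pc instead of `pc_small`.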

Converting the two columns to separate Pandas Series, and then to NumPy arrays for further inspection, still shows no discernible difference:

>>> La = pc['Latitude']
>>> Lo = pc['Longitude']
>>> npLa = La.to_numpy(copy=True)
>>> npLo = Lo.to_numpy(copy=True)
>>> np.asarray(npLo).shape
(1754851,)
>>> np.asarray(npLa).shape
(1754851,)
>>> npLa.size
1754851
>>> npLo.size
1754851
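The checks above compare pc['Latitude'] with pc['Longitude'], but the failing call passes pc.Longitude together with df.Latitude, so it may be worth comparing the lengths of exactly those two arguments. The sketch below assumes that `df` is a different DataFrame left over in the session. The helper `length_mismatch` and the toy frames are hypothetical and exist only for illustration.

```python
import pandas as pd

def length_mismatch(x, y):
    """Return (len(x), len(y)) if the lengths differ, otherwise None."""
    nx, ny = len(x), len(y)
    return None if nx == ny else (nx, ny)

# Hypothetical stand-ins: `pc` is the frame of interest, `df` is assumed to be
# some other frame in the session with a different number of rows.
pc = pd.DataFrame({
    "Latitude": [57.149606, 57.148707, 57.149051],
    "Longitude": [-2.096916, -2.097806, -2.097004],
})
df = pd.DataFrame({"Latitude": [51.5, 52.0]})

# Same argument pattern as the failing call: x from `pc`, y from `df`.
mixed = length_mismatch(pc.Longitude, df.Latitude)

# Both arguments taken from `pc`.
same = length_mismatch(pc.Longitude, pc.Latitude)

print(mixed)  # (3, 2)
print(same)   # None
```

If the mixed pair is what differs, passing pc.Latitude instead of df.Latitude to geopandas.points_from_xy would give it two arrays of the same length.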

Any ideas before I resign myself to using the Haversine formula everywhere?

Error converting a Pandas DataFrame to a GeoDataFrame: x and y arrays must be equal length

0 answers: