Assigning values to a dataframe column

Time: 2016-08-21 00:08:25

Tags: pandas dataframe

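In the code below, the dataframe df5 is not being populated. I am just assigning values to the dataframe's columns, and I have specified the columns beforehand. When I print the dataframe, it comes back empty. I am not sure whether I am missing something.

Any help would be appreciated.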

import math    
import pandas as pd

columns = ['ClosestLat','ClosestLong']

df5 = pd.DataFrame(columns=columns)

def distance(pt1, pt2):
  return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)

for pt1 in df1:
   closestPoints = [pt1, df2[0]]
   for pt2 in df2:
     if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
       closestPoints = [pt1, pt2]
       df5['ClosestLat'] = closestPoints[1][0]
   df5['ClosestLat'] = closestPoints[1][0]
   df5['ClosestLong'] = closestPoints[1][1]
   print ("Point: " + str(closestPoints[0]) + " is closest to " + str(closestPoints[1]))

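1 answer:

Answer 0 (score: 2)

From the looks of the code, you are trying to populate df5 with a list of latitudes and longitudes. However, you have made a few mistakes.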

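  1. The columns of a pandas dataframe are Series, which hold sequential data of some type. So df5['ClosestLat'] = closestPoints[1][0] tries to assign a single number to the whole column. A scalar is broadcast across the rows that already exist, and df5 has no rows, so the result is an empty column.
  2. Even if the dataframe did not effectively ignore your attempt to assign a number to the column, you would lose data, because the column is overwritten on every pass through the loop.
  3. Solution: build a list of lats and a list of longs, then insert them into the dataframe.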

    import math    
    import pandas as pd
    
    columns = ['ClosestLat','ClosestLong']
    
    df5 = pd.DataFrame(columns=columns)
    
    def distance(pt1, pt2):
      return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
    
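    # Collect one result per point in df1.
    # df1 and df2 are assumed to be sequences of (lat, long) pairs.
    lats, lngs = [], []
    for pt1 in df1:
        closestPoints = [pt1, df2[0]]
        for pt2 in df2:
            if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
                closestPoints = [pt1, pt2]
        # Record the closest point found for this pt1.
        lats.append(closestPoints[1][0])
        lngs.append(closestPoints[1][1])
    
    # Assign to df5 (the frame created above) once, after the loop.
    df5['ClosestLat'] = pd.Series(lats)
    df5['ClosestLong'] = pd.Series(lngs)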
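
For anyone who wants to see both effects in isolation, here is a minimal, self-contained sketch. The sample points (`points_a`, `points_b`) and the names `empty`, `rows`, and `closest_df` are hypothetical and not from the question. It also swaps the inner loop for Python's built-in `min` with a `key` function, and builds the frame from a list of records instead of two separate lists; treat both as one possible variation rather than the required approach.

```python
import math
import pandas as pd

# Hypothetical sample data: each point is a (lat, long) tuple.
points_a = [(0.0, 0.0), (5.0, 5.0)]
points_b = [(1.0, 1.0), (4.0, 6.0), (10.0, 10.0)]

def distance(pt1, pt2):
    """Plain Euclidean distance, as in the question."""
    return math.sqrt((pt1[0] - pt2[0]) ** 2 + (pt1[1] - pt2[1]) ** 2)

# Point 1 above: a scalar assigned to a frame with no rows is broadcast
# over zero rows, so the frame stays empty.
empty = pd.DataFrame(columns=['ClosestLat', 'ClosestLong'])
empty['ClosestLat'] = 1.0
print(len(empty))

# Point 3 above, as a variation: collect one record per point,
# then build the frame in a single step after the loop.
rows = []
for pt1 in points_a:
    # min() with a key replaces the hand-written inner loop.
    closest = min(points_b, key=lambda pt2: distance(pt1, pt2))
    rows.append({'ClosestLat': closest[0], 'ClosestLong': closest[1]})

closest_df = pd.DataFrame(rows, columns=['ClosestLat', 'ClosestLong'])
print(closest_df)
```

Building the frame once from collected rows also avoids growing a dataframe inside a loop, which tends to be slow for larger inputs.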