Pandas - Creating a column using itertuples

Time: 2017-06-16 16:21:22

Tags: python list loops pandas itertools

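I have a pandas.DataFrame with the columns AcctId, Latitude, and Longitude. I also have a list of coordinates. I am trying to compute the distance (using the haversine formula) between each row's latitude/longitude and every coordinate pair in the list. I then want to take the minimum distance and store it in a new column of the DataFrame.

However, every row of my output table contains the distance value computed for the last row of the loop. I have tried itertuples, iterrows, and a plain loop, but none of these approaches worked for me.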

df
AcctId   Latitude   Longitude
123      40.50      -90.13
123      40.53      -90.21
123      40.56      -90.45
123      40.63      -91.34

coords = [41.45,-95.13,39.53,-100.42,45.53,-95.32]

for row in df.itertuples():
    Latitude = row.Latitude
    Longitude = row.Longitude
    distances = []
    lat = []
    lng = []
    for i in range(0, len(coords), 2):
        distances.append(haversine_formula(Latitude, coords[i], Longitude, coords[i+1]))
        lat.append(coords[i])
        lng.append(coords[i+1])
        min_distance = min(distances)
    df['Output'] = min_distance
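
The loop above calls haversine_formula, which the question never defines. A minimal sketch of what such a helper might look like, assuming the argument order (lat1, lat2, lon1, lon2) implied by the call and a mean Earth radius of 6371 km. Both are assumptions, not something stated in the question:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_formula(lat1, lat2, lon1, lon2):
    """Great-circle distance in km. Argument order mirrors the call in the question."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

With this definition, two identical points give a distance of 0, and two points on the equator 90 degrees of longitude apart give a quarter of the great-circle circumference.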

Expected output:

df
AcctId   Latitude   Longitude   Output
123      40.50      -90.13      23.21
123      40.53      -90.21      38.42
123      40.56      -90.45      41.49
123      40.63      -91.34      42.45

Actual output:

df
AcctId   Latitude   Longitude   Output
123      40.50      -90.13      42.45
123      40.53      -90.21      42.45
123      40.56      -90.45      42.45
123      40.63      -91.34      42.45
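
One reading of the actual output: `df['Output'] = min_distance` assigns a single scalar to the whole column, and it runs once per row, so only the last row's minimum survives. A sketch of a loop that keeps one value per row instead. The haversine_km helper is hypothetical (hand-written, assuming a 6371 km Earth radius) and stands in for the question's undefined haversine_formula:

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance in km, assuming R = 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


df = pd.DataFrame({
    "AcctId": [123, 123, 123, 123],
    "Latitude": [40.50, 40.53, 40.56, 40.63],
    "Longitude": [-90.13, -90.21, -90.45, -91.34],
})
coords = [41.45, -95.13, 39.53, -100.42, 45.53, -95.32]  # flat lat/lng list, as in the question

# Pair up the flat list: [(lat, lng), (lat, lng), ...]
pairs = list(zip(coords[0::2], coords[1::2]))

nearest = []
for row in df.itertuples(index=False):
    # Minimum distance from this row to any coordinate pair
    nearest.append(min(haversine_km(row.Latitude, row.Longitude, lat, lng) for lat, lng in pairs))

# Assign once, after the loop, with one value per row
df["Output"] = nearest
```

The key difference is that the column is built from a list with the same length as the frame, instead of being overwritten with a scalar on every pass.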

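Final code

from haversine import haversine

# coords must be a list of (lat, lng) tuples here
def min_distance(row):
    here = (row.Latitude, row.Longitude)
    return min(haversine(here, coord) for coord in coords)

# apply already visits every row, so no outer loop is needed
df['Nearest_Distance'] = df.apply(min_distance, axis=1)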

1 answer:

Answer 0: (score: 1)

You are looking for pandas.DataFrame.apply(). Something like:

Code:

df['output'] = df.apply(min_distance, axis=1)

Test code:

import pandas as pd
from io import StringIO
from haversine import haversine  # third-party package

df = pd.read_fwf(StringIO(u'''
        AcctId   Latitude   Longitude
        123      40.50      -90.13
        123      40.53      -90.21
        123      40.56      -90.45
        123      40.63      -91.34'''), header=1)

# Reference points as (lat, lng) tuples
coords = [
    (41.45, -95.13),
    (39.53, -100.42),
    (45.53, -95.32)
]

def min_distance(row):
    # Distance from this row's point to the nearest entry in coords
    here = (row.Latitude, row.Longitude)
    return min(haversine(here, coord) for coord in coords)

df['output'] = df.apply(min_distance, axis=1)

print(df)
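
As an aside not taken from the answer: for larger frames, the same per-row minimum can be computed without a Python-level function call per row by broadcasting in NumPy. This is only a sketch, assuming a spherical Earth with radius 6371 km. The haversine arithmetic is written out by hand here rather than using the haversine package, so the values may differ slightly from that package's results:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

df = pd.DataFrame({
    "AcctId": [123, 123, 123, 123],
    "Latitude": [40.50, 40.53, 40.56, 40.63],
    "Longitude": [-90.13, -90.21, -90.45, -91.34],
})
coords = np.array([(41.45, -95.13), (39.53, -100.42), (45.53, -95.32)])

# Column vectors for the rows, row vectors for the reference points,
# so every arithmetic step broadcasts to shape (n_rows, n_coords).
lat1 = np.radians(df["Latitude"].to_numpy())[:, None]
lon1 = np.radians(df["Longitude"].to_numpy())[:, None]
lat2 = np.radians(coords[:, 0])[None, :]
lon2 = np.radians(coords[:, 1])[None, :]

a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Smallest distance in each row of the distance matrix
df["output"] = dist_km.min(axis=1)
```

Whether this is worth the extra code depends on the size of the data; for a handful of rows, the apply version in the answer is easier to read.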