Data filled into a DataFrame during a for loop no longer exists after the loop

Posted: 2018-01-19 00:59:10

Tags: python excel pandas

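My DataFrame has about 5 columns. Two of them hold longitude/latitude pairs stored as tuples. I also have a separate user-defined function that calculates the distance between two given lon/lat tuples.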
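The gcd module and its dist function are not shown in the question. For readers who want something runnable, here is a minimal sketch of such a function. It assumes tuples in (lon, lat) order, in degrees, and uses the haversine great-circle formula on a spherical Earth with a mean radius of 6371 km. The name haversine_km and all of these choices are assumptions, not the asker's actual implementation.

```python
import math


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lon, lat) tuples in degrees.

    Hypothetical stand-in for the asker's gcd.dist: assumes (lon, lat) order
    and a spherical Earth with mean radius 6371 km.
    """
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Usage with the two tuples from the sample row printed further down
print(haversine_km((-121.9419444444, 37.4897222222), (-122.15057, 37.39465)))
```

Because the formula and Earth radius used by gcd.dist are unknown, this sketch need not reproduce the 23.85 km value shown in the question.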

data_all['gc_distance'] = ""

### calculate the great circle distance for each row
for idx, row in data_all.iterrows():
    row['gc_distance'] = gcd.dist(row['ping_location'], row['destination'])
    print(row)

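So basically, I create an empty column named gc_distance and then iterate over every row to calculate the distance. When I print each row, the data looks great.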

Sample of one printed row:

created_at_des                                     2018-01-17 18:55:55.154000
location_missing                                                            0
ping_location                                (-121.9419444444, 37.4897222222)
destination                                            (-122.15057, 37.39465)
gc_distance                                                          23.85 km
Name: 393529, dtype: object

As you can see, gc_distance DOES have a value.

Here is sample output from a print statement after the loop:

 location_missing              ping_location  \
                 0   (-152.859052, 51.218273)   
                 0    (120.585289, 31.298974)   
                 0    (120.585289, 31.298974)   
                 0    (120.585289, 31.298974)   
                 0  (121.4737021, 31.2303904)   

                    destination gc_distance  
    0  (-122.057005, 37.606922)              
    1  (-122.057005, 37.606922)              
    2  (-122.057005, 37.606922)              
    3  (-122.057005, 37.606922)              
    4  (-122.057005, 37.606922) 

However, when I print the DataFrame again outside the for loop, the gc_distance column contains only blank values! :(

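Why is this??? There are no compile or runtime errors... Everything else in the output looks fine, so why is this computed field missing, when it clearly has a value when I print it inside the for loop (but not once the loop has finished)?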
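A minimal, self-contained reproduction of the behaviour described above, using a small toy DataFrame in place of data_all (df, a, b and total are hypothetical names). It shows that a value assigned to row inside iterrows() is visible within the loop but never reaches the DataFrame. The pandas documentation notes that whether the yielded row is a copy depends on the dtypes; with mixed columns like these it is a copy.

```python
import pandas as pd

# Toy frame standing in for data_all (column names are hypothetical)
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
df["total"] = ""  # empty-string placeholder column, as in the question

for idx, row in df.iterrows():
    # iterrows() builds a new Series for each row; with these mixed dtypes
    # it holds copied values, so this only changes the local `row` object
    row["total"] = row["a"] + row["b"]
    print(row["total"])  # the value is visible here...

print(df["total"].tolist())  # ...but df still holds the empty strings
```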

1 answer

Answer 0 (score: 1)

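Try this approach. The reason your loop loses its results is that iterrows() yields each row as a new Series that is usually a copy of the data (the pandas documentation warns that, depending on the dtypes, writing to it may have no effect), so assigning to row['gc_distance'] never reaches data_all. Write into the DataFrame itself instead: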

import pandas as pd
import numpy as np
import math

def dist(i):
    diff = list(map(lambda a,b: a-b, df['a'][i], df['b'][i]))
    squared = [(k)**2 for k in diff]
    squared_diff = sum(squared)
    root = math.sqrt(squared_diff)
    return root



df = pd.DataFrame([[0, 0, 5, 6, '', '', ''], [2, 6, -5, 8, '', '', '']], columns = ["x_a", "y_a", "x_b", "y_b", "a", "b", "dist"])
print(df)

#data_all['ping_location'] = list(zip(data_all.longitude_evnt, data_all.lattitude_evnt))

df['a'] = list(zip(df.x_a, df.y_a))     
df['b'] = list(zip(df.x_b, df.y_b)) 
print(df)

for i in range(0, len(df)):
    df['dist'][i] = dist(i)
    print(dist(i))

print(df)

Here is my terminal output:

   x_a  y_a  x_b  y_b a b dist
0    0    0    5    6         
1    2    6   -5    8         
   x_a  y_a  x_b  y_b       a        b dist
0    0    0    5    6  (0, 0)   (5, 6)     
1    2    6   -5    8  (2, 6)  (-5, 8)     
test.py:24: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df['dist'][i] = dist(i)
7.810249675906654
7.280109889280518
   x_a  y_a  x_b  y_b       a        b     dist
0    0    0    5    6  (0, 0)   (5, 6)  7.81025
1    2    6   -5    8  (2, 6)  (-5, 8)  7.28011
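The chained assignment df['dist'][i] = dist(i) is what raises the SettingWithCopyWarning in the output above. It happens to write through here, but pandas does not guarantee that, and it stops updating df once pandas' Copy-on-Write mode is enabled. Below is a sketch of two warning-free alternatives on the same toy data. Note that math.dist (Python 3.8+) is swapped in for the hand-written dist(i); it computes the same Euclidean distance between two tuples.

```python
import math

import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame([[0, 0, 5, 6], [2, 6, -5, 8]], columns=["x_a", "y_a", "x_b", "y_b"])
df["a"] = list(zip(df.x_a, df.y_a))
df["b"] = list(zip(df.x_b, df.y_b))

# Option 1: keep the loop, but write through the DataFrame with .at
# (a label-based scalar setter, so there is no chained indexing)
df["dist"] = 0.0  # float column so the assigned values keep a numeric dtype
for idx, row in df.iterrows():
    df.at[idx, "dist"] = math.dist(row["a"], row["b"])

# Option 2: build the whole column first, then assign it in one step
df["dist2"] = [math.dist(p, q) for p, q in zip(df["a"], df["b"])]

print(df[["a", "b", "dist", "dist2"]])
```

Applied to the question, option 2 would read data_all['gc_distance'] = [gcd.dist(p, q) for p, q in zip(data_all['ping_location'], data_all['destination'])], assuming gcd.dist takes two tuples exactly as in the original loop. Building the list first and assigning once avoids both the copy problem and per-row indexing.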