How to add an array of coordinates as rows in a pandas DataFrame

Time: 2016-09-15 18:36:49

Tags: python pandas numpy dataframe

I have a text file that looks like this:

,A,B
0,"[[-81.03443909  29.22855949]
 [-81.09729767  29.27094078]
 [-80.9937973   29.19698906]
 [-81.03072357  29.27445984]
 [-81.00499725  29.22187805]]","[[-81.42427063  28.30874634]
 [-81.42427063  28.30874634]
 [-81.42427063  28.30874634]
 [-81.36068726  28.29172897]
 [-81.42297363  28.30497551]
 [-81.48571777  28.24975777]
 [-81.35914612  28.29036331]]"
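If only the text file is available, the arrays can still be recovered from it. The sketch below is a minimal approach, assuming every cell holds the printed repr of a two-column NumPy array like the ones above: it extracts the numbers with a regular expression and reshapes them to (n, 2). The shortened text sample and the helper parse_array_repr are hypothetical, made up for illustration.

```python
import io
import re

import numpy as np
import pandas as pd

# Hypothetical, shortened copy of the file above: each quoted cell holds
# the printed repr of a 2-column NumPy array, spread over several lines.
text = '''\
,A,B
0,"[[-81.03443909  29.22855949]
 [-81.09729767  29.27094078]]","[[-81.42427063  28.30874634]
 [-81.36068726  28.29172897]]"
'''

# Matches plain and scientific-notation floats such as -81.034 or 1.2e-05.
FLOAT = re.compile(r"-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")


def parse_array_repr(cell: str) -> np.ndarray:
    """Pull every number out of a printed array and reshape it to (n, 2)."""
    values = [float(tok) for tok in FLOAT.findall(cell)]
    return np.array(values).reshape(-1, 2)


# read_csv handles the newlines inside the quoted cells.
df = pd.read_csv(io.StringIO(text), index_col=0)
clusters = [parse_array_repr(df.loc[0, col]) for col in df.columns]
print(clusters[0].shape)  # (2, 2)
```

This only works while the repr is not abbreviated; NumPy prints "..." for large arrays (as in the DataFrame output below), and the elided values cannot be recovered from the text.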

This is what the data I'm using looks like after putting it into a pandas DataFrame:

[[-78.70117188  33.80754852]
 [-78.9934082   33.61843491]
 [-80.81887817  28.60919952]
 ..., 
 [-76.62332916  35.54064941]
 [-79.04235077  33.81600952]
 [-79.03309631  33.55596161]]

I want it to look like this:

                       lat      long 
cluster      point                              
0          a      0.445900 -1.286198 
           b     -0.574496 -0.407154 
           c      0.872979  0.068084 
           d      0.297255 -2.157051 

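Before the .txt file is created, the data is in an ndarray, and I'm using pandas to write the text file. So maybe there is a way to skip the txt file and use pandas to split or format the array into a tidy DataFrame. I've been at this for a while and can't figure out how.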
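One way to skip the text file, sketched under the assumption that clusters is a list of (n, 2) arrays whose columns are [longitude, latitude] (the question's RL15_LONGITUDE and RL15_LATITUDE), is to stack the arrays with NumPy and build the (cluster, point) MultiIndex from label arrays. The two-cluster list below is a hypothetical stand-in for the real output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `clusters`: a list of (n_i, 2)
# float32 arrays whose columns are [longitude, latitude].
clusters = [
    np.array([[-81.03443909, 29.22855949],
              [-81.09729767, 29.27094078]], dtype=np.float32),
    np.array([[-82.18956757, 26.52433586]], dtype=np.float32),
]

# Stack all points into one (N, 2) array.
points = np.vstack(clusters)

# Cluster label for every row (here [0, 0, 1]) and the position of each
# point inside its own cluster (here [0, 1, 0]).
cluster_ids = np.repeat(np.arange(len(clusters)), [len(c) for c in clusters])
point_ids = np.concatenate([np.arange(len(c)) for c in clusters])

index = pd.MultiIndex.from_arrays([cluster_ids, point_ids],
                                  names=["cluster", "point"])
df = pd.DataFrame(points, index=index, columns=["lon", "lat"])
print(df)
```

Stacking once avoids building one intermediate DataFrame per cluster; a unique point identifier could later replace point_ids in the same from_arrays call.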

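This is how I generate the data. To keep things clear I'm only copying 2 columns, but in the future I want to pass a unique point identifier as well.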

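import pandas as pd

# Generate sample data
col_1 = "RL15_LONGITUDE"
col_2 = "RL15_LATITUDE"

data = pd.read_csv("input_data.csv")
# .as_matrix() was removed in pandas 1.0; .to_numpy() replaces it
coords = data[[col_1, col_2]].to_numpy()
data = data[[col_1, col_2]].dropna()
# float16 keeps only about 3 significant digits, too coarse for coordinates
data = data.to_numpy().astype('float32', copy=False)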
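Casting coordinates to float16 throws away most of their precision, which is likely why the printed clusters show float32 values instead. This small check uses one longitude from the output below to show the rounding.

```python
import numpy as np

lon = -81.03443909  # a longitude taken from the question's output

# float16 stores 10 fraction bits: between 64 and 128 the spacing between
# representable values is 0.0625, so the stored value becomes -81.0625.
half = np.float16(lon)

# float32 stores 23 fraction bits: near 81 the spacing is about 7.6e-6.
single = np.float32(lon)

print(float(half), float(single))
```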

This is the output of print clusters:
[array([[-81.03443909,  29.22855949],
       [-81.09729767,  29.27094078],
       [-81.42297363,  28.30497551],
       [-81.48571777,  28.24975777],
       [-81.35914612,  28.29036331]], dtype=float32), array([[-81.49134064,  27.58896065],
       [-81.5194931 ,  27.63422012],
       [-81.5096283 ,  27.55581093],
       [-82.05444336,  26.93555069]], dtype=float32), array([[-82.18956757,  26.52433586],
       [-82.18956757,  26.52433586],
       [-82.18956757,  26.52433586],
       [-82.19439697,  26.53297997]], dtype=float32)]

This is how I create the DataFrame and write the .txt file:
clusters = pd.DataFrame({'A':[clusters]})
clusters.to_csv('output.txt')
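Putting whole arrays into single cells makes to_csv write their printed repr, which is hard to read back. A frame with one row per point writes plain numeric columns instead and can be restored with index_col. This is a sketch with a hypothetical three-point frame; an in-memory buffer stands in for 'output.txt'.

```python
import io

import pandas as pd

# Hypothetical tidy frame: one row per point, indexed by (cluster, point).
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)],
                                  names=["cluster", "point"])
df = pd.DataFrame([[-81.034439, 29.228559],
                   [-81.097298, 29.270941],
                   [-82.189568, 26.524336]],
                  index=index, columns=["lon", "lat"])

buf = io.StringIO()  # stands in for 'output.txt'
df.to_csv(buf)
print(buf.getvalue())

# Reading it back: the first two columns become the MultiIndex again.
buf.seek(0)
restored = pd.read_csv(buf, index_col=[0, 1])
```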

1 answer

Answer 0 (score: 1)

Here is a starting point:

In [72]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
   ....:             for i,c in enumerate(clusters)])
   ....:    .reset_index()
   ....:    .rename(columns={'index':'point'})
   ....: )
Out[72]:
    point        lon        lat  cluster
0       0 -81.034439  29.228559        0
1       1 -81.097298  29.270941        0
2       2 -81.422974  28.304976        0
3       3 -81.485718  28.249758        0
4       4 -81.359146  28.290363        0
5       0 -81.491341  27.588961        1
6       1 -81.519493  27.634220        1
7       2 -81.509628  27.555811        1
8       3 -82.054443  26.935551        1
9       0 -82.189568  26.524336        2
10      1 -82.189568  26.524336        2
11      2 -82.189568  26.524336        2
12      3 -82.194397  26.532980        2

Or, with a MultiIndex:

In [73]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
   ....:             for i,c in enumerate(clusters)])
   ....:    .reset_index()
   ....:    .rename(columns={'index':'point'})
   ....:    .set_index(['cluster','point'])
   ....: )
Out[73]:
                     lon        lat
cluster point
0       0     -81.034439  29.228559
        1     -81.097298  29.270941
        2     -81.422974  28.304976
        3     -81.485718  28.249758
        4     -81.359146  28.290363
1       0     -81.491341  27.588961
        1     -81.519493  27.634220
        2     -81.509628  27.555811
        3     -82.054443  26.935551
2       0     -82.189568  26.524336
        1     -82.189568  26.524336
        2     -82.189568  26.524336
        3     -82.194397  26.532980
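A more compact variant of the same idea, offered as a sketch rather than the answerer's code: pd.concat accepts keys and names, which builds the (cluster, point) MultiIndex directly and replaces the assign/reset_index/rename/set_index chain. The clusters list is a hypothetical two-cluster stand-in, with columns assumed to be [longitude, latitude] as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's list of (n_i, 2) arrays.
clusters = [
    np.array([[-81.03443909, 29.22855949],
              [-81.09729767, 29.27094078]], dtype=np.float32),
    np.array([[-82.18956757, 26.52433586]], dtype=np.float32),
]

# keys labels each piece with its cluster number (outer level); each piece
# keeps its own 0..n-1 RangeIndex as the inner level; names labels both.
df = pd.concat(
    [pd.DataFrame(c, columns=["lon", "lat"]) for c in clusters],
    keys=list(range(len(clusters))),
    names=["cluster", "point"],
)
print(df)
```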