I have a text file that looks like this:
,A,B
0,"[[-81.03443909 29.22855949]
[-81.09729767 29.27094078]
[-80.9937973 29.19698906]
[-81.03072357 29.27445984]
[-81.00499725 29.22187805]]","[[-81.42427063 28.30874634]
[-81.42427063 28.30874634]
[-81.42427063 28.30874634]
[-81.36068726 28.29172897]
[-81.42297363 28.30497551]
[-81.48571777 28.24975777]
[-81.35914612 28.29036331]]"
This is the data I'm using, as it looks after being put into a Pandas DataFrame:
[[-78.70117188 33.80754852]
[-78.9934082 33.61843491]
[-80.81887817 28.60919952]
...,
[-76.62332916 35.54064941]
[-79.04235077 33.81600952]
[-79.03309631 33.55596161]]
I want it to look like this:
                   lat      long
cluster point
0       a     0.445900 -1.286198
        b    -0.574496 -0.407154
        c     0.872979  0.068084
        d     0.297255 -2.157051
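Before the .txt file is created, the data is in an ndarray, and I'm using pandas to create the text file. So maybe there is a way to skip the txt file and use pandas to split or format the array into a tidy DataFrame. I've been at this for a while and can't figure out how.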
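This is how I generate the data. To keep things clear I only copy 2 columns, but in the future I want to pass along a unique point identifier.
import pandas as pd

# Generate sample data
col_1 = "RL15_LONGITUDE"
col_2 = "RL15_LATITUDE"
data = pd.read_csv("input_data.csv")
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
coords = data[[col_1, col_2]].to_numpy()
# Keep only the two coordinate columns and drop rows with missing values
data = data[[col_1, col_2]].dropna()
# float16 holds only about 3 significant digits, too coarse for coordinates;
# the clusters printed below are float32
data = data.to_numpy().astype('float32', copy=False)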
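To carry the unique point identifier mentioned above through to the final table, one option is to keep the ID column next to the coordinates and attach one cluster label per row, instead of splitting the points into separate arrays. A minimal sketch, assuming the clustering step can return one label for every row in row order (scikit-learn's DBSCAN, for example, exposes this as `labels_`); the `point_id` column and the `labels` values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical input: a unique point ID plus the two coordinate columns from the question
data = pd.DataFrame({
    "point_id": ["a", "b", "c", "d"],
    "RL15_LONGITUDE": [-81.03443909, -81.09729767, -81.49134064, -81.5194931],
    "RL15_LATITUDE": [29.22855949, 29.27094078, 27.58896065, 27.63422012],
}).dropna()

# Assumption: the clusterer returns one integer label per row, in row order.
# The labels must line up with the rows that survived dropna().
labels = np.array([0, 0, 1, 1])

# Attach the labels as a column, then index by (cluster, point_id)
tidy = (data.assign(cluster=labels)
            .set_index(["cluster", "point_id"])
            .sort_index())
print(tidy)
```

Because the ID travels with its coordinates row by row, there is no need to rebuild point numbers after clustering.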
This is the output of print(clusters):
[array([[-81.03443909, 29.22855949],
       [-81.09729767, 29.27094078],
       [-81.42297363, 28.30497551],
       [-81.48571777, 28.24975777],
       [-81.35914612, 28.29036331]], dtype=float32), array([[-81.49134064, 27.58896065],
       [-81.5194931 , 27.63422012],
       [-81.5096283 , 27.55581093],
       [-82.05444336, 26.93555069]], dtype=float32), array([[-82.18956757, 26.52433586],
       [-82.18956757, 26.52433586],
       [-82.18956757, 26.52433586],
       [-82.19439697, 26.53297997]], dtype=float32)]
This is how I create the DataFrame and write the .txt file:
clusters = pd.DataFrame({'A':[clusters]})
clusters.to_csv('output.txt')
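For comparison, when each row of the frame holds a single point instead of a whole array, `to_csv` writes one line per point and `read_csv` can rebuild the frame from the text. A minimal sketch with a small hypothetical frame (the `cluster`/`point`/`lon`/`lat` names are assumptions); an in-memory buffer stands in for `output.txt` so nothing is written to disk:

```python
import io
import pandas as pd

# Hypothetical tidy frame: one row per point, indexed by (cluster, point)
tidy = pd.DataFrame(
    {"lon": [-81.03443909, -81.09729767, -81.49134064],
     "lat": [29.22855949, 29.27094078, 27.58896065]},
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)],
                                    names=["cluster", "point"]),
)

# Write it out; with a real file this would be tidy.to_csv('output.txt')
buf = io.StringIO()
tidy.to_csv(buf)
text = buf.getvalue()
print(text)  # header line, then one "cluster,point,lon,lat" line per point

# Read it back: the first two columns rebuild the MultiIndex
restored = pd.read_csv(io.StringIO(text), index_col=["cluster", "point"])
print(restored)
```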
Answer
Here is a starting point (each row of the question's arrays is longitude, latitude, so the columns are named in that order):
In [72]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
   ....:             for i,c in enumerate(clusters)])
   ....:    .reset_index()
   ....:    .rename(columns={'index':'point'})
   ....: )
Out[72]:
    point        lon        lat  cluster
0       0 -81.034439  29.228559        0
1       1 -81.097298  29.270941        0
2       2 -81.422974  28.304976        0
3       3 -81.485718  28.249758        0
4       4 -81.359146  28.290363        0
5       0 -81.491341  27.588961        1
6       1 -81.519493  27.634220        1
7       2 -81.509628  27.555811        1
8       3 -82.054443  26.935551        1
9       0 -82.189568  26.524336        2
10      1 -82.189568  26.524336        2
11      2 -82.189568  26.524336        2
12      3 -82.194397  26.532980        2
Or with a MultiIndex:
In [73]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
   ....:             for i,c in enumerate(clusters)])
   ....:    .reset_index()
   ....:    .rename(columns={'index':'point'})
   ....:    .set_index(['cluster','point'])
   ....: )
Out[73]:
                     lon        lat
cluster point
0       0     -81.034439  29.228559
        1     -81.097298  29.270941
        2     -81.422974  28.304976
        3     -81.485718  28.249758
        4     -81.359146  28.290363
1       0     -81.491341  27.588961
        1     -81.519493  27.634220
        2     -81.509628  27.555811
        3     -82.054443  26.935551
2       0     -82.189568  26.524336
        1     -82.189568  26.524336
        2     -82.189568  26.524336
        3     -82.194397  26.532980
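The same (cluster, point) frame can also be built without concatenating one small DataFrame per cluster: stack all points once with `np.concatenate` and generate the cluster and point numbers with `np.repeat` and `np.arange`. A sketch using the three clusters printed in the question, assuming each row is (longitude, latitude):

```python
import numpy as np
import pandas as pd

# The three clusters from print(clusters) in the question
clusters = [
    np.array([[-81.03443909, 29.22855949], [-81.09729767, 29.27094078],
              [-81.42297363, 28.30497551], [-81.48571777, 28.24975777],
              [-81.35914612, 28.29036331]], dtype=np.float32),
    np.array([[-81.49134064, 27.58896065], [-81.5194931, 27.63422012],
              [-81.5096283, 27.55581093], [-82.05444336, 26.93555069]], dtype=np.float32),
    np.array([[-82.18956757, 26.52433586], [-82.18956757, 26.52433586],
              [-82.18956757, 26.52433586], [-82.19439697, 26.53297997]], dtype=np.float32),
]

sizes = [len(c) for c in clusters]
# Cluster number repeated once per point, e.g. 0,0,0,0,0,1,1,1,1,2,2,2,2
cluster_ids = np.repeat(np.arange(len(clusters)), sizes)
# Point number restarting at 0 inside every cluster
point_ids = np.concatenate([np.arange(n) for n in sizes])

tidy = pd.DataFrame(
    np.concatenate(clusters),  # all points stacked into one (n, 2) array
    columns=["lon", "lat"],
    index=pd.MultiIndex.from_arrays([cluster_ids, point_ids],
                                    names=["cluster", "point"]),
)
print(tidy)
```

This avoids building an intermediate DataFrame per cluster, which may matter when there are many small clusters; for a handful of clusters the `pd.concat` version above is just as practical.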