I have a DataFrame in which one column holds sets of coordinates, grouped together in lists, like this:
name OBJECTID geometry
0 NaN 1 ['-80.304852,-3.489302,0.0','-80.303087,-3.490214,0.0',...]
1 NaN 2 ['-80.27494,-3.496571,0.0',...]
2 NaN 3 ['-80.267987,-3.500003,0.0',...]
I want to split the values and drop the '0.0', but keep them in a list so that I can add them to a key in a dictionary, like this:
name OBJECTID geometry
0 NaN 1 [[-80.304852, -3.489302],[-80.303087, -3.490214],...]
1 NaN 2 [[-80.27494, -3.496571],...]
2 NaN 3 [[-80.267987, -3.500003],...]
Here is my code; the part where I try to separate them in the for loop does not work:
import pandas as pd
import numpy as np
r = pd.read_csv('data.csv')
rloc = np.asarray(r['geometry'])
r['latitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)
r['longitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)
# Separating the latitude and longitude values from each string.
for i in range(0, len(rloc)):
    for j in range(0, len(rloc[i])):
        coord = rloc[i][j].split(',')
        r['longitude'] = coord[0]
        r['latitude'] = coord[1]
r = r[['OBJECTID', 'latitude', 'longitude', 'name']]
Edit: The result is not right, because it only prints one value for every row.
OBJECTID latitude longitude name
0 1 -3.465566 -80.151633 NaN
1 2 -3.465566 -80.151633 NaN
2 3 -3.465566 -80.151633 NaN
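Bonus question: how can I put all of these longitude and latitude values into tuples to use them with geopy? Something like this:
r['location'] = (r['latitude'], r['longitude'])
So the geometry column would look like this: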
geometry
[(-80.304852, -3.489302),(-80.303087, -3.490214),...]
[(-80.27494, -3.496571),...]
[(-80.267987, -3.500003),...]
Edit:
The data originally looked like this (in each row):
<LineString><coordinates>-80.304852,-3.489302,0.0 -80.303087,-3.490214,0.0 ...</coordinates></LineString>
I stripped the tags with a regular expression, using this code:
import re
geo = np.asarray(r['geometry'])
geo = [re.sub(r'<.*?>', '', string) for string in geo]
Then I put it in an array:
rv = [geo[i].split() for i in range(0,len(geo))]
r['geometry'] = np.asarray(rv)
When I call r['geometry'], the output is:
0 [-80.304852,-3.489302,0.0, -80.303087,-3.49021...
1 [-80.27494,-3.496571,0.0, -80.271963,-3.49266,...
2 [-80.267987,-3.500003,0.0, -80.267845,-3.49789...
Name: geometry, dtype: object
r['geometry'][0]
is:
['-80.304852,-3.489302,0.0',
'-80.303087,-3.490214,0.0',
'-80.302131,-3.491878,0.0',
'-80.300763,-3.49213,0.0']
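The two cleanup steps described above (strip the tags, then split on whitespace) can be sketched as a single helper. This is only a minimal sketch using the standard library; `clean_linestring` is a hypothetical name, not part of the original code, and the sample string is shortened to two points.

```python
import re

def clean_linestring(raw):
    """Strip the XML tags and split the coordinate text on whitespace."""
    text = re.sub(r"<.*?>", "", raw)
    return text.split()

raw = ("<LineString><coordinates>-80.304852,-3.489302,0.0 "
       "-80.303087,-3.490214,0.0</coordinates></LineString>")
print(clean_linestring(raw))
# ['-80.304852,-3.489302,0.0', '-80.303087,-3.490214,0.0']
```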
Answer 0 (score: 2)
A pandas solution, with a toy dataset as input:
df = pd.read_csv("test.txt")
name OBJECTID geometry
0 NaN 1 ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1 NaN 2 ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2 NaN 3 ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...
Now convert to columns of longitude-latitude pairs:
#regex extraction of longitude latitude pairs
pairs = r"(-?\d+\.\d+,-?\d+\.\d+)"
s = df["geometry"].str.extractall(pairs)
#splitting string into two parts, creating two columns for longitude latitude
s = s[0].str.split(",", expand = True)
#converting strings into float numbers - is this even necessary?
s[[0, 1]] = s[[0, 1]].apply(pd.to_numeric)
#creating a tuple from longitude/latitude columns
s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as columns in original dataframe
df = pd.concat([df, s["lat_long"].unstack(level = -1)], axis = 1)
Output for the toy dataset:
name OBJECTID geometry \
0 NaN 1 ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1 NaN 2 ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2 NaN 3 ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...
0 1 2
0 (-80.3, -3.4) (-80.3, -3.9) (-80.3, -3.9)
1 (80.2, -4.4) (-81.3, 2.9) (-80.7, -3.2)
2 (-80.1, -3.2) (-80.8, -2.9) (-80.1, -1.9)
Alternatively, you can combine the tuples into a list in a single column:
s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as a list into a column of the original dataframe
df["lat_long"] = s.groupby(level=[0])["lat_long"].apply(list)
Output now:
name OBJECTID geometry \
0 NaN 1 ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1 NaN 2 ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2 NaN 3 ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...
lat_long
0 [(-80.3, -3.4), (-80.3, -3.9), (-80.3, -3.9)]
1 [(80.2, -4.4), (-81.3, 2.9), (-80.7, -3.2)]
2 [(-80.1, -3.2), (-80.8, -2.9), (-80.1, -1.9)]
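If the geometry column already holds real Python lists of strings (as in the question's edit) rather than one string per row, the regex step may not be needed. A minimal sketch under that assumption follows; `parse_points` is a hypothetical helper name, and it returns the `[[lon, lat], ...]` shape the question asks for.

```python
def parse_points(points):
    """Turn ['lon,lat,alt', ...] into [[lon, lat], ...], dropping the third value."""
    pairs = []
    for point in points:
        # Keep only the first two comma-separated fields.
        lon, lat = point.split(",")[:2]
        pairs.append([float(lon), float(lat)])
    return pairs

sample = ['-80.304852,-3.489302,0.0', '-80.303087,-3.490214,0.0']
print(parse_points(sample))
# [[-80.304852, -3.489302], [-80.303087, -3.490214]]
```

On a DataFrame this could be applied row by row with something like `r['geometry'].apply(parse_points)`, assuming every cell is a list of such strings.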
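Answer 1 (score: 1)
In your code, you are actually assigning the longitude and latitude values from the last iteration to the entire column. You can also convert the strings to float: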
Going for the bonus :)
# Separating the latitude and longitude values from each string.
longitudes, latitudes = [], []
for row in rloc:
    # Collect one list of floats per row instead of overwriting the column.
    lon_row, lat_row = [], []
    for point in row:
        coord = point.split(',')
        lon_row.append(float(coord[0]))
        lat_row.append(float(coord[1]))
    longitudes.append(lon_row)
    latitudes.append(lat_row)
r['longitude'] = longitudes
r['latitude'] = latitudes
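For the tuple part of the bonus, the two per-row lists can be zipped together. The sketch below is an illustration only: `to_lat_lon_tuples` is a hypothetical name, and it puts latitude first because geopy takes points as (latitude, longitude), whereas the source strings are in longitude,latitude order.

```python
def to_lat_lon_tuples(longitudes, latitudes):
    """Pair up two parallel lists as (latitude, longitude) tuples."""
    return list(zip(latitudes, longitudes))

lons = [-80.304852, -80.303087]
lats = [-3.489302, -3.490214]
print(to_lat_lon_tuples(lons, lats))
# [(-3.489302, -80.304852), (-3.490214, -80.303087)]
```

To keep the (longitude, latitude) order shown in the question instead, swap the two arguments to `zip`.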