Python 2.7
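I have a dataframe with two columns, coordinates and locs. coordinates contains 10 latitude/longitude pairs, and locs contains 10 strings.
The following code raises a ValueError saying the arrays were different lengths. It seems I have written the condition incorrectly.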
lst_10_cords = [['37.09024, -95.712891'], ['-37.605, 145.146'], ['43.0481962, -76.0488458'], ['29.7604267, -95.3698028'], ['47.6062095, -122.3320708'], ['34.0232431, -84.3615555'], ['31.9685988, -99.9018131'], ['37.226582, -95.70522299999999'], ['40.289918, -83.036372'], ['37.226582, -95.70522299999999']]
lst_10_locs = [['United States'], ['Doreen, Melbourne'], ['Upstate NY'], ['Houston, TX'], ['Seattle, WA'], ['Roswell, GA'], ['Texas'], ['null'], ['??, passing by...'], ['null']]
df = pd.DataFrame(columns=['coordinates', 'locs'])
df['coordinates'] = lst_10_cords
df['locs'] = lst_10_locs
print df
df = df[df['coordinates'] != ['37.226582', '-95.70522299999999']] #ValueError
The error message is:
File "C:\Users\...\Miniconda3\envs\py2.7\lib\site-packages\pandas\core\ops.py", line 1283, in wrapper
    res = na_op(values, other)
File "C:\Users\...\Miniconda3\envs\py2.7\lib\site-packages\pandas\core\ops.py", line 1143, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
File "C:...\biney\Miniconda3\envs\py2.7\lib\site-packages\pandas\core\ops.py", line 1120, in _comp_method_OBJECT_ARRAY
    result = libops.vec_compare(x, y, op)
File "pandas/_libs/ops.pyx", line 128, in pandas._libs.ops.vec_compare
ValueError: Arrays were different lengths: 10 vs 2
My goal is to check for and eliminate every entry in the coordinates column that is equal to the list [37.226582, -95.70522299999999], so I expect df['coordinates'] to print [['37.09024, -95.712891'], ['-37.605, 145.146'], ['43.0481962, -76.0488458'], ['29.7604267, -95.3698028'], ['47.6062095, -122.3320708'], ['34.0232431, -84.3615555'], ['31.9685988, -99.9018131'], ['37.226582, -95.70522299999999'], ['40.289918, -83.036372']]
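I was hoping the pandas documentation would help, especially the part that says:
"You may select rows from a DataFrame using a boolean vector the same length as the DataFrame's index (for example, something derived from one of the columns of the DataFrame):"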
df[df['A'] > 0]
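So it seems I don't quite have the syntax right, but I'm stuck. How do I set a condition on the cell values of a particular column and get back a dataframe containing only the rows whose cells satisfy that condition?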
Answer 0 (score: 2)
Could you consider this?
df
coordinates locs
0 [37.09024, -95.712891] [United States]
1 [-37.605, 145.146] [Doreen, Melbourne]
2 [43.0481962, -76.0488458] [Upstate NY]
3 [29.7604267, -95.3698028] [Houston, TX]
4 [47.6062095, -122.3320708] [Seattle, WA]
5 [34.0232431, -84.3615555] [Roswell, GA]
6 [31.9685988, -99.9018131] [Texas]
7 [37.226582, -95.705222999] [null]
8 [40.289918, -83.036372] [??, passing by...]
9 [37.226582, -95.7052229999] [null]
df['lat'] = df['coordinates'].map(lambda x: float(x[0].split(",")[0]))
df['lon'] = df['coordinates'].map(lambda x: float(x[0].split(",")[1]))
df[~((np.isclose(df['lat'],37.226582)) & (np.isclose(df['lon'],-95.70522299999999)))]
coordinates locs lat lon
0 [37.09024, -95.712891] [United States] 37.090240 -95.712891
1 [-37.605, 145.146] [Doreen, Melbourne] -37.605000 145.146000
2 [43.0481962, -76.0488458] [Upstate NY] 43.048196 -76.048846
3 [29.7604267, -95.3698028] [Houston, TX] 29.760427 -95.369803
4 [47.6062095, -122.3320708] [Seattle, WA] 47.606209 -122.332071
5 [34.0232431, -84.3615555] [Roswell, GA] 34.023243 -84.361555
6 [31.9685988, -99.9018131] [Texas] 31.968599 -99.901813
8 [40.289918, -83.036372] [??, passing by...] 40.289918 -83.036372
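The filter above uses np.isclose rather than == because parsed floats may differ from the target in the last few digits. The sketch below illustrates the difference; the value `stored` is a hypothetical number chosen for illustration, not one taken from the question's data. With its default tolerances, np.isclose(a, b) tests |a - b| <= atol + rtol * |b| with rtol=1e-05 and atol=1e-08.

```python
import numpy as np

# Hypothetical stored value that differs from the target in the tenth decimal.
stored = -95.705222999
target = -95.70522299999999

# Exact equality fails on the tiny difference.
exact = (stored == target)

# Default tolerances (rtol=1e-05, atol=1e-08) treat the two as equal.
close_default = bool(np.isclose(stored, target))

# A much stricter absolute tolerance tells them apart again.
close_tight = bool(np.isclose(stored, target, rtol=0.0, atol=1e-12))

# A genuinely different latitude from the question is not considered close.
other_row = bool(np.isclose(37.09024, 37.226582))

print(exact, close_default, close_tight, other_row)
```

At longitudes near 95 degrees the default relative tolerance allows a difference of roughly 0.001 degrees, so passing rtol and atol explicitly may be worth considering if nearby but distinct points must be kept apart.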
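Answer 1 (score: 0)
If you look at the objects in your dataframe, the problem is that each cell holds a single string, not a pair of numbers. The error you get seems to come from comparing the 10-element Series .coordinates with a 2-element list, and the lengths obviously do not match. Using .values seems to get around this.
df2 = pd.DataFrame([row if row[0] != ['37.226582, -95.70522299999999'] else [np.nan, np.nan] for row in df.values], columns=['coords', 'locs']).dropna()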
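As an alternative sketch (not part of the original answer), the condition can be evaluated once per cell with Series.map, which yields a boolean Series with one entry per row. That is the "boolean vector the same length as the DataFrame's index" that row selection expects. The names `target` and `keep` are illustrative, and the example assumes each cell is a list holding exactly one string, as in the question.

```python
import pandas as pd

# Rebuild the question's data: every cell is a list holding one string.
df = pd.DataFrame({
    "coordinates": [
        ['37.09024, -95.712891'], ['-37.605, 145.146'],
        ['43.0481962, -76.0488458'], ['29.7604267, -95.3698028'],
        ['47.6062095, -122.3320708'], ['34.0232431, -84.3615555'],
        ['31.9685988, -99.9018131'], ['37.226582, -95.70522299999999'],
        ['40.289918, -83.036372'], ['37.226582, -95.70522299999999'],
    ],
    "locs": [
        ['United States'], ['Doreen, Melbourne'], ['Upstate NY'],
        ['Houston, TX'], ['Seattle, WA'], ['Roswell, GA'], ['Texas'],
        ['null'], ['??, passing by...'], ['null'],
    ],
})

# The string to exclude, written exactly as it is stored in the cells.
target = "37.226582, -95.70522299999999"

# One True/False per row: True where the stored string differs from the target.
keep = df["coordinates"].map(lambda cell: cell[0] != target)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[keep]
print(filtered)
```

This relies on an exact string match, so it would miss the same point written with different spacing or precision; comparing parsed floats is more robust in that case.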
Answer 2 (score: 0)
OK, here is a way to make sure you have clean data to work with.
Let's assume 4 entries, one of which has a dirty coordinates value.
lst_4_cords = [['37.09024, -95.712891'], ['-37.605, 145.146'], ['43.0481962, -76.0488458'], ['null']]
lst_4_locs = [['United States'], ['Doreen, Melbourne'], ['Upstate NY'], ['Houston, TX']]
df = pd.DataFrame(columns=['coordinates', 'locs'])
df['coordinates'] = lst_4_cords
df['locs'] = lst_4_locs
coordinates locs
0 [37.09024, -95.712891] [United States]
1 [-37.605, 145.146] [Doreen, Melbourne]
2 [43.0481962, -76.0488458] [Upstate NY]
3 [null] [Houston, TX]
Now we write a cleaning method. Ideally you would test the values explicitly:
type(value) is list
type(value[0]) is str
value[0].split(",") has two elements
each element can be cast to float
each is a valid lat or lon
However, we will do it the dirty way, with a try.
def scrubber_drainer(value):
    try:
        # We assume value is a list with a single string at position zero,
        # and that the string contains a comma, so it splits into two floats.
        parts = value[0].split(",")
        return (float(parts[0]), float(parts[1]))
    except Exception:
        # return (38.9072, 77.0396)  # swamp
        return (0.0, 0.0)  # some default
So the return value is normally a tuple of 2 floats. If the value cannot be converted, the default (0., 0.) is returned.
Now update coordinates:
df['coordinates'] = df['coordinates'].map(scrubber_drainer)
Then we split the tuple out into separate columns.
Now you can filter with np.isclose().
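The last two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the answer's own code: the column names lat and lon and the choice to drop rows that fell back to the (0.0, 0.0) default are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def scrubber_drainer(value):
    """Parse ['lat, lon'] into a (lat, lon) tuple of floats, or a default."""
    try:
        parts = value[0].split(",")
        return (float(parts[0]), float(parts[1]))
    except Exception:
        return (0.0, 0.0)  # default used for unparseable entries


df = pd.DataFrame({
    "coordinates": [['37.09024, -95.712891'], ['-37.605, 145.146'],
                    ['43.0481962, -76.0488458'], ['null']],
    "locs": [['United States'], ['Doreen, Melbourne'],
             ['Upstate NY'], ['Houston, TX']],
})

# Replace each raw cell with a clean (lat, lon) tuple.
df["coordinates"] = df["coordinates"].map(scrubber_drainer)

# Split the tuple into two float columns (hypothetical names lat and lon).
df["lat"] = df["coordinates"].map(lambda pair: pair[0])
df["lon"] = df["coordinates"].map(lambda pair: pair[1])

# Drop the rows that fell back to the default, comparing with a tolerance.
is_default = np.isclose(df["lat"], 0.0) & np.isclose(df["lon"], 0.0)
clean = df[~is_default]
print(clean)
```

The same mask pattern works for any target point: replace 0.0 with the latitude and longitude to exclude, as in Answer 0.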