Finding and cutting off the overlap in a data array

Time: 2017-10-31 17:08:47

Tags: python arrays python-2.7 numpy slice

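I want to get rid of the overlap in the longitude data at the end of my array (0 to 20.4 degrees), so that in the end the values run from 0 to 360.

I will be doing this for many arrays with varying amounts of overlap, so I can't just cut off the last three values. Also, the start and end points won't always be 0 and 360, or 20.4. I also want to preserve the order of the values so that I can cut off the corresponding values in the latitude array.

Most of the information online is about removing duplicate values, but because of the trailing decimals my values are never exact duplicates.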

lon = np.array([0.9783,20.1276,40.3784,60.0987,80.3748,100.9999,120.4567,140.3543,160.2342,180.3453,200.8874,220.2346,240.5554,260.5676,280.4345,300.4454,320.5654,340.6432,360.3343,0.0124,10.3213,20.4355]) 

I have brainstormed approaches using <, >, =, np.where, or if/else, but have had no success so far.

Any help or suggestions are appreciated.

5 answers:

Answer 0: (score: 1)

If you want to get rid of all elements once the data has started over and climbed back above its first value, the following for-loop should do the job.

stop = False
for i in range(len(lon)-1):
    if stop and lon[i] > lon[0]:
        lon = lon[:i]
        break
    if lon[i] > lon[i+1]:
        stop = True

For the lon data you gave in the question:

lon = np.array([0.9783,20.1276,40.3784,60.0987,80.3748,100.9999,120.4567,140.3543,160.2342,180.3453,200.8874,220.2346,240.5554,260.5676,280.4345,300.4454,320.5654,340.6432,360.3343,0.0124,10.3213,20.4355])

lon is modified to the following (0.0124 is kept because it is still below the first value, 0.9783):

array([   0.9783,   20.1276,   40.3784,   60.0987,   80.3748,  100.9999, 120.4567,  140.3543,  160.2342,  180.3453,  200.8874,  220.2346, 240.5554,  260.5676,  280.4345,  300.4454,  320.5654,  340.6432, 360.3343,    0.0124])

To demonstrate this updated solution with

lon = np.array([50 ,110, 200, 340, 1, 10, 25, 80, 90, 130]) 

we get:

array([ 50, 110, 200, 340,   1,  10,  25])
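If the latitude array has to be trimmed as well, it may be handier to compute the cut index once instead of slicing lon in place. A minimal sketch, assuming a hypothetical helper named overlap_cut_index (not part of the loop above) and made-up latitude values; unlike the loop above, this variant also checks the final element:

```python
def overlap_cut_index(values):
    """Return the index at which the overlapping tail starts.

    Hypothetical helper: same idea as the for-loop, but it returns the
    cut index so that several arrays can be sliced consistently.
    """
    wrapped = False
    for i in range(len(values)):
        # After a wrap, the first value that climbs back above the
        # starting value marks the beginning of the overlap.
        if wrapped and values[i] > values[0]:
            return i
        # A drop from one element to the next means the data wrapped around.
        if i + 1 < len(values) and values[i] > values[i + 1]:
            wrapped = True
    return len(values)  # no overlap found: keep everything


lon = [50, 110, 200, 340, 1, 10, 25, 80, 90, 130]
lat = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values, for illustration only

cut = overlap_cut_index(lon)
lon_trimmed = lon[:cut]
lat_trimmed = lat[:cut]
```

The helper only uses indexing and comparisons, so it should work on plain lists as well as on NumPy arrays.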

Hope this finally meets your needs!

Answer 1: (score: 1)

@Joe Iddon's answer will work, but if you want to avoid the loop you can do this:

diff = np.diff(lon)
drops = np.flatnonzero(diff < 0)
if len(drops) > 0:
    # Only do this if there is a wrap around
    end_index = drops[0] + 1
    lon = lon[:end_index]

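You can then use end_index to slice the other matching arrays (e.g. latitude).

Note that this does not fix up values outside [0..360] in any way; you will have to do that separately, depending on how you want to handle them.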
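As a rough illustration of both points, the sketch below reuses the index to trim a latitude array and then wraps the remaining longitudes with np.mod. The sample values are made up, and wrapping into [0, 360) is only one possible choice; whether it suits your data is an assumption on my part.

```python
import numpy as np

lon = np.array([300.0, 340.0, 360.5, 10.0, 20.0])
lat = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])  # made-up values, for illustration only

# Index of the first drop, or the full length if the data never wraps.
drops = np.flatnonzero(np.diff(lon) < 0)
end_index = drops[0] + 1 if len(drops) > 0 else len(lon)

lon_trimmed = lon[:end_index]
lat_trimmed = lat[:end_index]  # same index keeps the two arrays aligned

# One possible fix-up: wrap longitudes into the half-open range [0, 360).
lon_wrapped = np.mod(lon_trimmed, 360.0)
```

Keep in mind that after wrapping, the longitudes are no longer guaranteed to be increasing (360.5 becomes 0.5 here), so do the trimming first.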

Update for the new requirement:

assert len(lon) > 0
above_first = (lon >= lon[0]).astype(int)
diffs = np.diff(above_first)
overlap_indices = np.flatnonzero(diffs > 0)
if len(overlap_indices) > 0:
    end_index = overlap_indices[0] + 1
    lon = lon[:end_index]

This works even if the overlap wraps around several times.

Answer 2: (score: 1)
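To see what the updated version does on the two sample arrays from this thread, here is a small sketch that wraps it in a function. The name overlap_end_index is hypothetical, and returning the full length when no overlap is found is my own assumption:

```python
import numpy as np


def overlap_end_index(lon):
    """Hypothetical helper: index just past the last value to keep."""
    lon = np.asarray(lon)
    above_first = (lon >= lon[0]).astype(int)
    # A +1 step in the diff marks a move from "below the first value"
    # back to "at or above the first value", i.e. the start of the overlap.
    rises = np.flatnonzero(np.diff(above_first) > 0)
    return rises[0] + 1 if len(rises) > 0 else len(lon)


lon_a = np.array([0.9783, 20.1276, 40.3784, 60.0987, 80.3748, 100.9999,
                  120.4567, 140.3543, 160.2342, 180.3453, 200.8874,
                  220.2346, 240.5554, 260.5676, 280.4345, 300.4454,
                  320.5654, 340.6432, 360.3343, 0.0124, 10.3213, 20.4355])
lon_b = np.array([50, 110, 200, 340, 1, 10, 25, 80, 90, 130])

end_a = overlap_end_index(lon_a)
end_b = overlap_end_index(lon_b)
```

For the question's data this keeps 0.0124, because that value is still below the first element 0.9783; the first snippet in this answer would stop at 360.3343 instead.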

Based on @Ixop's idea, you can write the same thing in one line:

L = lon[0:np.argmax(np.diff(lon)<0)+1]
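One caveat with the argmax one-liner: np.argmax returns 0 when no element is True, so an array that never drops would be cut down to its first element. A guarded sketch, assuming a hypothetical function name trim_at_first_drop and made-up sample arrays:

```python
import numpy as np


def trim_at_first_drop(lon):
    """Hypothetical helper: keep everything up to the first decrease."""
    lon = np.asarray(lon)
    drops = np.diff(lon) < 0
    if not drops.any():
        return lon  # no wrap-around: nothing to cut
    # argmax on a boolean array gives the index of the first True.
    return lon[: np.argmax(drops) + 1]


wrapped = np.array([300.0, 340.0, 360.3, 0.01, 10.3])
monotonic = np.array([0.0, 90.0, 180.0, 270.0])

trimmed_wrapped = trim_at_first_drop(wrapped)
trimmed_monotonic = trim_at_first_drop(monotonic)
```

Without the guard, the monotonic array would be reduced to its first element only.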

Answer 3: (score: 1)

New solution: start from the end of lon2 and compare against the first element of lon2

lon2 = np.array([50,110,200,340,1,10,25,80,90,130])
#lon2 = lon

ix = np.argmax(lon2[::-1] < lon2[0])
L2 = lon2[0:-ix]

which gives

with lon2 =  [ 50 110 200 340   1  10  25]

with lon =  [  9.78300000e-01   2.01276000e+01   4.03784000e+01   6.00987000e+01
   8.03748000e+01   1.00999900e+02   1.20456700e+02   1.40354300e+02
   1.60234200e+02   1.80345300e+02   2.00887400e+02   2.20234600e+02
   2.40555400e+02   2.60567600e+02   2.80434500e+02   3.00445400e+02
   3.20565400e+02   3.40643200e+02   3.60334300e+02   1.24000000e-02]
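An edge case worth checking with the reversed-array approach: when ix is 0 (the last element is already below the first one, or no element is below it at all), the slice lon2[0:-0] is empty. A sketch that avoids this by slicing with a positive index; the name trim_from_end is hypothetical, and keeping the whole array in those cases is my assumption about the desired behaviour:

```python
import numpy as np


def trim_from_end(lon):
    """Hypothetical helper: drop the tail that follows the last dip below lon[0]."""
    lon = np.asarray(lon)
    below_first = lon[::-1] < lon[0]
    if not below_first.any():
        return lon  # the data never dips below its first value
    # Number of trailing elements that come after the last dip.
    ix = np.argmax(below_first)
    return lon[: len(lon) - ix]


lon2 = np.array([50, 110, 200, 340, 1, 10, 25, 80, 90, 130])
ends_low = np.array([50, 110, 200, 340, 1, 10])  # last element is below the first

trimmed = trim_from_end(lon2)
unchanged = trim_from_end(ends_low)
```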

Answer 4: (score: 0)

How about

tmp = lon - lon[0]
tmp[tmp<0] += 360
sliced = lon[:np.where(np.diff(tmp) < 0)[0][0]+1]
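The idea here is to measure each longitude relative to the first one, add 360 to anything that comes out negative, and cut at the first place where that shifted sequence decreases. Below is a sketch of the same steps as a function; the name trim_by_offset, the period parameter, the float conversion, and the guard for arrays without any overlap are my additions, not part of the snippet above:

```python
import numpy as np


def trim_by_offset(lon, period=360.0):
    """Hypothetical helper: cut where the offset from lon[0] first decreases."""
    lon = np.asarray(lon, dtype=float)
    offset = lon - lon[0]          # new array, lon itself is not modified
    offset[offset < 0] += period   # values behind the start are pushed one turn ahead
    drops = np.flatnonzero(np.diff(offset) < 0)
    if len(drops) == 0:
        return lon  # no overlap detected: keep everything
    return lon[: drops[0] + 1]


lon = np.array([0.9783, 20.1276, 40.3784, 60.0987, 80.3748, 100.9999,
                120.4567, 140.3543, 160.2342, 180.3453, 200.8874,
                220.2346, 240.5554, 260.5676, 280.4345, 300.4454,
                320.5654, 340.6432, 360.3343, 0.0124, 10.3213, 20.4355])
lon2 = np.array([50, 110, 200, 340, 1, 10, 25, 80, 90, 130])

sliced = trim_by_offset(lon)
sliced2 = trim_by_offset(lon2)
```

On the question's data this stops at 360.3343 rather than keeping 0.0124, because 360.3343 already lies more than 360 degrees past nothing but is still further along than 0.0124 once both are measured from 0.9783. On lon2 it gives the same seven values as the other answers.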