Question

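I am a researcher working with climate model output in Python to find certain types of storms. I have 8 large numpy arrays (each of shape 109574 x 52 x 57). The arrays hold a 1 where there was a storm on that day (the first dimension is time) and a 0 where there was none. The other two dimensions are latitude and longitude.

I have to eliminate back-to-back days from these arrays. For example, if there is a storm on day 1 and day 2, I want to count only 1 storm. If there is a storm on days 1, 2 and 3, I want to count only days 1 and 3, for a total of two storms; days 1-4 would give 2 storms, and so on. At the end I find the number of storms with np.sum, which counts the 1s in the array along the time axis.

I am running the following code to achieve this, but the problem is that it is extremely slow. Because I will have to repeat this process for other datasets, I would like to know whether there is a way to speed it up. My code is below, and I am more than happy to clarify anything.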

# If there is a storm that overlaps two two-day periods, only count it once
print("Eliminating doubles...")
for i in range(52):
    for j in range(57):
        print(i,j)
        for k in range(109573):
            if((storms1[k,i,j]) == 1 and (storms1[k+1,i,j] == 1)):
                storms1[k,i,j] = 0
            if((storms2[k,i,j]) == 1 and (storms2[k+1,i,j] == 1)):
                storms2[k,i,j] = 0
            if((storms3[k,i,j]) == 1 and (storms3[k+1,i,j] == 1)):
                storms3[k,i,j] = 0
            if((storms4[k,i,j]) == 1 and (storms4[k+1,i,j] == 1)):
                storms4[k,i,j] = 0
            if((storms5[k,i,j]) == 1 and (storms5[k+1,i,j] == 1)):
                storms5[k,i,j] = 0
            if((storms6[k,i,j]) == 1 and (storms6[k+1,i,j] == 1)):
                storms6[k,i,j] = 0
            if((storms7[k,i,j]) == 1 and (storms7[k+1,i,j] == 1)):
                storms7[k,i,j] = 0
            if((storms8[k,i,j]) == 1 and (storms8[k+1,i,j] == 1)):
                storms8[k,i,j] = 0

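Before anyone suggests iterating over the arrays with a loop: I changed the variable names to simplify them for the purposes of asking this question.

Thanks for your help.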

Answer 1

Here is a vectorized function that can replace your innermost loop:

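import numpy as np

def do(KK):
    # KK: 1-D integer view (e.g. int8) of 0/1 values along the time axis;
    # it is modified in place
    # find stretches of ones: each row is (first index, one past the last index)
    switch_points = np.where(np.diff(np.r_[0, KK, 0]))[0]
    switch_points = switch_points.reshape(-1, 2)
    # isolate stretches starting on odd days and create mask
    odd_starters = switch_points[switch_points[:, 0] % 2 == 1, :]
    odd_mask = np.zeros((KK.shape[0] + 1,), dtype=KK.dtype)
    odd_mask[odd_starters] = 1, -1
    # the running sum turns the +1/-1 markers into ones inside each odd-starting stretch
    odd_mask = np.add.accumulate(odd_mask[:-1])
    # apply global 1,0,1,0,1,0,... mask
    KK[1::2] = 0
    # invert stretches starting on odd days
    KK ^= odd_mask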

Call it from within the pair of outer loops (over i and j):

do(storms1[:, i, j])
do(storms2[:, i, j])
etc.

It changes the arrays in place.

This should be much faster than the loop (the two outer loops do not make much of a difference).
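As a rough sketch of how the calls might be wired together, assuming the eight arrays are collected in a list: the names `arrays` and `originals`, the random test data and the shrunken grid size are illustrative assumptions, not part of the question. The function `do` is repeated so the block runs on its own.

```python
import numpy as np

def do(KK):
    # Same logic as the function above; KK is a 1-D int8 view, changed in place
    switch_points = np.where(np.diff(np.r_[0, KK, 0]))[0].reshape(-1, 2)
    odd_starters = switch_points[switch_points[:, 0] % 2 == 1, :]
    odd_mask = np.zeros((KK.shape[0] + 1,), dtype=KK.dtype)
    odd_mask[odd_starters] = 1, -1
    odd_mask = np.add.accumulate(odd_mask[:-1])
    KK[1::2] = 0
    KK ^= odd_mask

# Hypothetical stand-ins for storms1 ... storms8, on a tiny 30 x 3 x 4 grid
rng = np.random.default_rng(0)
arrays = [rng.integers(0, 2, size=(30, 3, 4), dtype=np.int8) for _ in range(8)]
originals = [a.copy() for a in arrays]  # kept only for comparison

n_lat, n_lon = arrays[0].shape[1:]
for i in range(n_lat):
    for j in range(n_lon):
        for arr in arrays:
            # arr[:, i, j] is a view, so the in-place edits reach arr itself
            do(arr[:, i, j])

# Storm counts per grid cell, as in the question
counts = [arr.sum(axis=0) for arr in arrays]
```

Keeping the arrays in a list avoids writing the same call eight times; the per-cell work is still done by `do`.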

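How it works:

It finds the start and end points of the blocks of ones. We know that within each such block every other element must be zeroed. Using a global 1,0,1,0,1,0,... mask, the algorithm zeroes out every other day.

This produces

correct results in blocks starting on even days (counting from index 0)
no change outside the blocks
and the complement of the correct pattern in blocks starting on odd days

The last step of the algorithm is to invert these odd-starting blocks.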
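The steps above can be traced on a short sequence. This is only an illustrative walk-through: the input values and the intermediate names `after_global` and `result` are made up for the example, and the work is done on copies rather than in place.

```python
import numpy as np

KK = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 1], dtype=np.int8)

# Block boundaries: each row is (first index, one past the last index)
switch_points = np.where(np.diff(np.r_[0, KK, 0]))[0].reshape(-1, 2)
# blocks here: [1, 4), [6, 8), [9, 10)

# Ones inside the blocks whose first index is odd
odd_starters = switch_points[switch_points[:, 0] % 2 == 1]
odd_mask = np.zeros(KK.shape[0] + 1, dtype=KK.dtype)
odd_mask[odd_starters] = 1, -1
odd_mask = np.add.accumulate(odd_mask[:-1])
# odd_mask     -> [0 1 1 1 0 0 0 0 0 1]

# Global 1,0,1,0,... mask: zero every odd index
after_global = KK.copy()
after_global[1::2] = 0
# after_global -> [0 0 1 0 0 0 1 0 0 0]
# (block [6, 8) is already right; block [1, 4) holds the complement 0,1,0)

# Invert the odd-starting blocks
result = after_global ^ odd_mask
# result       -> [0 1 0 1 0 0 1 0 0 1]
```

The three-day block keeps its first and third day, the two-day block keeps its first day, and the single day is untouched.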

Answer 2

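Here is an example using a 1-D array that simulates the first axis. First, find the start position of each group of 1s. Next, find the length of each group. Finally, compute the number of events according to your logic: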

import numpy

a = numpy.random.randint(0,2,20)

# Add an initial 0
a1 = numpy.r_[0, a]

# Mark the start of each group of 1's
d1 = numpy.diff(a1) > 0

# Indices of the start of groups of 1's
w1 = numpy.arange(len(d1))[d1]

# Length of each group
cs = numpy.cumsum(a)
c = numpy.diff(numpy.r_[cs[w1], cs[-1]+1])

# Apply the counting logic
storms = c - c//2

print(a)
>>> array([0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])
print(c)
>>> array([1, 2, 4, 1, 3])
print(storms)
>>> array([1, 1, 2, 1, 2])

You can save more memory than I show here by reusing variable names once they are no longer needed, and so on.
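A minimal sketch of how the same counting logic might be wrapped in a function and applied along the time axis of a 3-D array. The function name `count_storms_1d` and the tiny `grid` array are assumptions made for illustration; `np.apply_along_axis` still loops over grid cells internally, so it is a convenience here rather than a guaranteed speed-up.

```python
import numpy as np

def count_storms_1d(a):
    """Count storms in a 1-D 0/1 sequence: a run of n ones counts as n - n//2."""
    a = np.asarray(a)
    # Indices where a group of ones starts
    starts = np.flatnonzero(np.diff(np.r_[0, a]) > 0)
    # Length of each group, from the running total of ones
    cs = np.cumsum(a)
    lengths = np.diff(np.r_[cs[starts], cs[-1] + 1])
    return int((lengths - lengths // 2).sum())

# The sequence printed in the example above
a = np.array([0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])
total = count_storms_1d(a)  # group counts 1 + 1 + 2 + 1 + 2

# Tiny 20 x 2 x 1 stand-in for a (time, lat, lon) array
grid = np.stack([a, a[::-1]], axis=1).reshape(20, 2, 1)
counts = np.apply_along_axis(count_storms_1d, 0, grid)  # one count per cell
```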

Answer 3

So I think you want:

storms_in[:,i,j] = [0,0,1,1,0,1,1,1,0,1,0,1,1,1,1,0]
storms_out[:,i,j]= [0,0,1,0,0,1,0,1,0,1,0,1,0,0,1,0]

This is not what your code sample is doing, but it is what you say you want to do in your second paragraph.

To do that, you need two steps:

import numpy as np

def storms_disc(storms):  # put the whole array here, boolean-safe
    z = np.zeros((1,) + storms.shape[1:])  # zero-pads for the ends
    changes = np.r_[storms.astype('int8'), z] - np.r_[z, storms.astype('int8')]  # find where the weather changes
    changes = ((changes[:-1] == 1) | (changes[1:] == -1)).astype('int8')  # reduce dimension
    return ((np.r_[changes, z] - np.r_[z, changes])[:-1] == 1).astype(storms.dtype)  # find the first of successive changes

This vectorizes the whole process, and you only need to call it 8 times. The astype calls are there because subtracting booleans raises an error, even though their values are 1 and 0.

Testing:
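A minimal sketch of such a check, assuming the `storms_disc` function from this answer (repeated here so the block runs on its own) and the example sequence given above. The names `expected`, `out_1d`, `cube` and `out_3d`, and the 16 x 2 x 3 array, are illustrative assumptions.

```python
import numpy as np

def storms_disc(storms):
    z = np.zeros((1,) + storms.shape[1:])  # zero padding for both ends
    s = storms.astype('int8')
    changes = np.r_[s, z] - np.r_[z, s]  # +1 where a run starts, -1 just after it ends
    # keep the first and last day of every run
    changes = ((changes[:-1] == 1) | (changes[1:] == -1)).astype('int8')
    # of two adjacent kept days, keep only the first
    return ((np.r_[changes, z] - np.r_[z, changes])[:-1] == 1).astype(storms.dtype)

storms_in = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0], dtype=bool)
expected = np.array([0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0], dtype=bool)

# 1-D case: should reproduce the storms_out row
out_1d = storms_disc(storms_in)
print(np.array_equal(out_1d, expected))

# 3-D case: the same time series copied into every cell of a 2 x 3 grid
cube = np.broadcast_to(storms_in[:, None, None], (16, 2, 3)).copy()
out_3d = storms_disc(cube)
print(out_3d.sum(axis=0))  # storm count per grid cell
```

Because `np.r_` concatenates along the first axis, the same function handles the 1-D sequence and the full (time, lat, lon) array without any explicit loop.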

Improving the efficiency of checking for duplicates - Python

3 answers: