Question

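I have a pandas column like this:

I would like to search the current row's value and find matches among previous rows that are close to it. For example, index 4 (10.7) would return a match of 1 because it is close to index 2 (10.8). Similarly, index 8 (10.6) would return a match of 2 because it is close to both index 2 and index 4.

Using a threshold of +/- 5% for this example would output the following: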

index colA  matches
1     10.2    0
2     10.8    0
3     11.6    0
4     10.7    2
5     9.5     0
6     6.2     0
7     12.9    0
8     10.6    3
9     6.4     1
10    20.5    0

For a large dataframe, I would like to limit the search to the previous X (300?) rows rather than the entire dataframe.

Answer 1

Here is a numpy solution that leverages broadcast comparison:

df

   index  colA  matches
0      1  10.2        0
1      2  10.8        0
2      3  11.6        0
3      4  10.7        2
4      5   9.5        0
5      6   6.2        0
6      7  12.9        0
7      8  10.6        3
8      9   6.4        1
9     10  20.5        0
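One way the broadcast comparison could look is sketched below. This is a minimal sketch rather than the answerer's exact code: the sample frame is rebuilt from the question, and the names `rel`, `close`, and `earlier` are made up for illustration. It assumes the relative difference is measured against the current row's value.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "index": range(1, 11),
    "colA": [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5],
})

a = df["colA"].to_numpy()

# rel[i, j] is the difference between row i and row j, relative to row i.
rel = np.abs(a[:, None] - a[None, :]) / a[:, None]
close = rel <= 0.05

# Keep only pairs where j is strictly earlier than i (below the diagonal).
earlier = np.tril(close, k=-1)

df["matches"] = earlier.sum(axis=1)
print(df)
```

The intermediate matrix has one entry per pair of rows, so memory grows with the square of the number of rows.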


Note: this is very fast, but it does not handle the 300-row limit for large dataframes.
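If the lookback limit matters, one possible vectorized variant is to compare each row only against a fixed-size window of predecessors. The sketch below is an assumption, not part of the original answer: `count_recent_matches`, `lookback`, and `tol` are hypothetical names, and it relies on `sliding_window_view`, which needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def count_recent_matches(values, lookback=300, tol=0.05):
    """Count, for each row, how many of the previous `lookback` rows lie within `tol`."""
    a = np.asarray(values, dtype=float)
    # Pad the front with NaN so every row has `lookback` predecessors.
    padded = np.concatenate([np.full(lookback, np.nan), a])
    # windows[i] holds the `lookback` values before a[i], followed by a[i] itself.
    windows = sliding_window_view(padded, lookback + 1)
    prev = windows[:, :-1]
    cur = a[:, None]
    # Comparisons involving the NaN padding evaluate to False, so padding never counts.
    with np.errstate(invalid="ignore"):
        close = np.abs(prev - cur) / cur <= tol
    return close.sum(axis=1)


col = [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5]
print(count_recent_matches(col, lookback=300))
print(count_recent_matches(col, lookback=3))
```

Memory here grows with the number of rows times `lookback` instead of the square of the number of rows.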

Answer 2

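Use triangular indices to ensure we only look backward. Then use np.bincount to accumulate the matches.

import numpy as np

a = df.colA.values
# (i, j) pairs strictly below the diagonal, so j is always an earlier row than i
i, j = np.tril_indices(len(a), -1)
mask = np.abs(a[i] - a[j]) / a[i] <= .05
df.assign(matches=np.bincount(i[mask], minlength=len(a)))

       colA  matches
index
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0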

numba

If you run into resource issues, consider using a good ol' fashioned loop. However, if you have access to numba, you can speed this up considerably.

import numpy as np
from numba import njit

@njit
def counter(a):
    c = np.zeros(len(a), dtype=np.int64)
    for i in range(len(a)):
        # only compare against earlier rows
        for j in range(i):
            if abs(a[i] - a[j]) / a[i] <= .05:
                c[i] += 1
    return c

df.assign(matches=counter(a))

       colA  matches
index
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0


Answer 3

rolling apply

df.colA.rolling(window=len(df), min_periods=1).apply(lambda x: sum(abs((x - x[-1]) / x[-1]) < 0.05) - 1, raw=True)
Out[113]: 
index
1     0.0
2     0.0
3     0.0
4     2.0
5     0.0
6     0.0
7     0.0
8     3.0
9     1.0
10    0.0
Name: colA, dtype: float64
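The same rolling idea can also cap the lookback at X rows by shrinking the window. This is a hedged sketch, not the answerer's code: `lookback` and `count_close` are illustrative names, the window is `lookback + 1` so it covers the current row plus its predecessors, and the threshold uses `<=` like the other answers.

```python
import numpy as np
import pandas as pd

colA = pd.Series(
    [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5],
    index=pd.RangeIndex(1, 11, name="index"),
    name="colA",
)

lookback = 3  # hypothetical limit; the question suggests something like 300


def count_close(window):
    # With raw=True the window is a NumPy array; its last element is the current row.
    cur = window[-1]
    return np.sum(np.abs(window[:-1] - cur) / cur <= 0.05)


matches = (
    colA.rolling(window=lookback + 1, min_periods=1)
    .apply(count_close, raw=True)
    .astype(int)
)
print(matches)
```

With `lookback = 3`, index 8 no longer matches anything, because indexes 1, 2 and 4 fall outside its window.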

If speed matters, see cold's answer.


Pandas - Take the current row, compare its value to the X previous rows, and return how many match (within x% range)

3 answers: