Pandas - take the current row, compare its value with the X previous rows, and return how many match (within x%)

Time: 2018-03-11 06:22:55

Tags: python pandas numpy dataframe

I have a pandas column like this:

index colA
1     10.2
2     10.8
3     11.6
4     10.7
5     9.5
6     6.2
7     12.9
8     10.6
9     6.4
10    20.5

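I would like to take the current row's value and find matches among the previous rows that are close to it. For example, index 4 (10.7) would return a match of 1 because it is close to index 2 (10.8). Similarly, index 8 (10.6) would return a match of 2 because it is close to both index 2 and index 4.

For this example, using a threshold of +/- 5% would output the following: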

index colA  matches
1     10.2    0
2     10.8    0
3     11.6    0
4     10.7    2
5     9.5     0
6     6.2     0
7     12.9    0
8     10.6    3
9     6.4     1
10    20.5    0

For large dataframes, I would like to limit the search to the previous X (300?) rows rather than the entire dataframe.

3 answers:

Answer 0: (score: 3)

Here is a numpy solution that uses a broadcasted comparison:

df

   index  colA  matches
0      1  10.2        0
1      2  10.8        0
2      3  11.6        0
3      4  10.7        2
4      5   9.5        0
5      6   6.2        0
6      7  12.9        0
7      8  10.6        3
8      9   6.4        1
9     10  20.5        0
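A minimal sketch of a broadcasted comparison that yields the `matches` column shown above. The variable names (`a`, `pos`, `close`, `earlier`) are illustrative assumptions and may differ from the answerer's exact code.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "index": range(1, 11),
    "colA": [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5],
})

a = df["colA"].to_numpy()
pos = np.arange(len(a))

# close[r, c] is True when row c lies within 5% of row r's value.
close = np.abs(a[:, None] - a[None, :]) <= 0.05 * np.abs(a[:, None])
# earlier[r, c] is True when row c comes before row r.
earlier = pos[None, :] < pos[:, None]

# Count, for every row, the earlier rows that are close to it.
df["matches"] = (close & earlier).sum(axis=1)
```

Broadcasting builds a full n-by-n boolean matrix, so memory grows quadratically with the number of rows.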

Note: this is very fast, but it does not handle the 300-row limit for large dataframes.
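One possible way to respect the row limit while staying vectorized is to compare the column against shifted copies of itself, one offset at a time. This is a sketch under assumptions: the function name `count_recent_matches` and its `window` and `tol` parameters are hypothetical, not part of the original answer.

```python
import numpy as np

def count_recent_matches(values, window=300, tol=0.05):
    """Count, per row, how many of the previous `window` rows lie within `tol`."""
    a = np.asarray(values, dtype=float)
    counts = np.zeros(len(a), dtype=np.int64)
    # Offset k compares every row with the row k positions before it.
    for k in range(1, min(window, len(a) - 1) + 1):
        cur = a[k:]
        prev = a[:-k]
        counts[k:] += np.abs(cur - prev) <= tol * np.abs(cur)
    return counts

col_a = [10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5]
full = count_recent_matches(col_a, window=300)
last_four = count_recent_matches(col_a, window=4)
```

Each pass touches only one-dimensional arrays, so memory stays linear in the number of rows and the work is roughly rows times `window`.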

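Answer 1: (score: 3)

Use triangular indices to ensure we only look backwards, then use np.bincount to accumulate the matches.

    import numpy as np

    a = df.colA.values
    # Index pairs (i, j) below the diagonal, i.e. j < i
    i, j = np.tril_indices(len(a), -1)
    # Relative difference measured against the current row a[i]
    mask = np.abs(a[i] - a[j]) / a[i] <= .05
    df.assign(matches=np.bincount(i[mask], minlength=len(a)))

           colA  matches
    index
    1      10.2        0
    2      10.8        0
    3      11.6        0
    4      10.7        2
    5       9.5        0
    6       6.2        0
    7      12.9        0
    8      10.6        3
    9       6.4        1
    10     20.5        0

numba

If you run into resource issues, consider using a good ol' fashioned loop. If you have access to numba, you can speed this up considerably.

    import numpy as np
    from numba import njit

    @njit
    def counter(a):
        c = np.arange(len(a)) * 0  # integer zeros, one counter per row
        for i, x in enumerate(a):
            for j, y in enumerate(a):
                if j < i:  # only look at earlier rows
                    if abs(x - y) / x <= .05:
                        c[i] += 1
        return c

    df.assign(matches=counter(a))

           colA  matches
    index
    1      10.2        0
    2      10.8        0
    3      11.6        0
    4      10.7        2
    5       9.5        0
    6       6.2        0
    7      12.9        0
    8      10.6        3
    9       6.4        1
    10     20.5        0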
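The loop approach extends naturally to the row limit from the question by bounding the inner loop. The sketch below is an assumption-laden variant, not the answerer's code: `counter_windowed`, `window`, and `tol` are hypothetical names, and the `try`/`except` fallback simply runs the plain Python loop when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): without numba, run the function uncompiled.
    def njit(func):
        return func

@njit
def counter_windowed(a, window, tol):
    c = np.zeros(len(a), dtype=np.int64)
    for i in range(len(a)):
        # Look back at most `window` rows.
        start = max(0, i - window)
        for j in range(start, i):
            if abs(a[i] - a[j]) <= tol * abs(a[i]):
                c[i] += 1
    return c

a = np.array([10.2, 10.8, 11.6, 10.7, 9.5, 6.2, 12.9, 10.6, 6.4, 20.5])
all_rows = counter_windowed(a, 300, 0.05)
four_rows = counter_windowed(a, 4, 0.05)
```

Writing the test as `abs(a[i] - a[j]) <= tol * abs(a[i])` avoids a division, so a zero in the column does not raise or produce NaN.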

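Answer 2: (score: 2)

Use rolling with apply. If speed matters, look at cold's answer.

    df.colA.rolling(window=len(df), min_periods=1).apply(
        # x[-1] is the current row; subtract 1 so it does not count itself
        lambda x: sum(abs((x - x[-1]) / x[-1]) < 0.05) - 1,
        raw=True,  # pass a NumPy array so x[-1] is positional
    )
    Out[113]:
    index
    1     0.0
    2     0.0
    3     0.0
    4     2.0
    5     0.0
    6     0.0
    7     0.0
    8     3.0
    9     1.0
    10    0.0
    Name: colA, dtype: float64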
