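How to get consecutive counts in the most efficient way?

Asked: 2016-03-09 06:37:17

Tags: python pandas feature-extraction data-science

I am a beginner in data science with Python. I am working with clickstream data and trying to count the consecutive clicks on an item within a given session. I compute a cumulative sum in the 'Block' column, then aggregate on Block to get the count for each block. Finally, I want to group by session (Sid) and item (Itemid) and sum the block counts, because there can be cases (here Sid = 6) where an item first occurs m times in a row and then, after some other items, occurs again n times in a row. In that case the consecutive count should be 'm + n'.

Here is the dataset: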

    Sid                    Tstamp     Itemid
0     1  2014-04-07T10:51:09.277Z  214536502
1     1  2014-04-07T10:54:09.868Z  214536500
2     1  2014-04-07T10:54:46.998Z  214536506
3     1  2014-04-07T10:57:00.306Z  214577561
4     2  2014-04-07T13:56:37.614Z  214662742
5     2  2014-04-07T13:57:19.373Z  214662742
6     2  2014-04-07T13:58:37.446Z  214825110
7     2  2014-04-07T13:59:50.710Z  214757390
8     2  2014-04-07T14:00:38.247Z  214757407
9     2  2014-04-07T14:02:36.889Z  214551617
10    3  2014-04-02T13:17:46.940Z  214716935
11    3  2014-04-02T13:26:02.515Z  214774687
12    3  2014-04-02T13:30:12.318Z  214832672
13    4  2014-04-07T12:09:10.948Z  214836765
14    4  2014-04-07T12:26:25.416Z  214706482
15    6  2014-04-03T10:44:35.672Z  214821275
16    6  2014-04-03T10:45:01.674Z  214821275
17    6  2014-04-03T10:45:29.873Z  214821371
18    6  2014-04-03T10:46:12.162Z  214821371
19    6  2014-04-03T10:46:57.355Z  214821371
20    6  2014-04-03T10:53:22.572Z  214717089
21    6  2014-04-03T10:53:49.875Z  214563337
22    6  2014-04-03T10:55:19.267Z  214706462
23    6  2014-04-03T10:55:47.327Z  214821371
24    6  2014-04-03T10:56:30.520Z  214821371
25    6  2014-04-03T10:57:19.331Z  214821371
26    6  2014-04-03T10:57:39.433Z  214819762
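The 'Block' idea described in the question can be sketched roughly as follows. This is only an illustration under assumptions: the frame name `df` and the column name `run_length` are made up here, only the Sid = 6 rows from the dataset are used, and timestamps are left out. A new block is started whenever the session or the item differs from the previous row.

```python
import pandas as pd

# Sid = 6 rows copied from the dataset above (timestamps omitted for brevity).
df = pd.DataFrame({
    "Sid": [6] * 12,
    "Itemid": [
        214821275, 214821275,
        214821371, 214821371, 214821371,
        214717089, 214563337, 214706462,
        214821371, 214821371, 214821371,
        214819762,
    ],
})

# A new block starts when either the session or the item changes between rows.
new_block = (df["Sid"] != df["Sid"].shift()) | (df["Itemid"] != df["Itemid"].shift())
df["Block"] = new_block.cumsum()

# Length of each run of consecutive clicks (one row per block).
run_lengths = (
    df.groupby(["Sid", "Itemid", "Block"], sort=False)
      .size()
      .reset_index(name="run_length")
)
print(run_lengths)
```

Item 214821371 ends up in two separate blocks of three clicks each, which is the 'm + n' situation mentioned for Sid = 6.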

Here is my code:


2 Answers:

Answer 0 (score: 1)

Won't this work?


Answer 1 (score: 1)

IIUC you can:

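# k is the DataFrame holding the clickstream data (columns Sid, Tstamp, Itemid).
# Start a new block whenever Itemid differs from the previous row.
k['Block'] = (k['Itemid'] != k['Itemid'].shift(1)).astype(int).cumsum()
# Count the rows in each block, then sum the block counts per (Sid, Itemid).
z = k.groupby(['Sid', 'Itemid', 'Block']).size().groupby(level=[0, 1]).sum().reset_index(name='sum_counts')
print(z)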
    Sid     Itemid  sum_counts
0     1  214536500           1
1     1  214536502           1
2     1  214536506           1
3     1  214577561           1
4     2  214551617           1
5     2  214662742           2
6     2  214757390           1
7     2  214757407           1
8     2  214825110           1
9     3  214716935           1
10    3  214774687           1
11    3  214832672           1
12    4  214706482           1
13    4  214836765           1
14    6  214701242           1
15    6  214826623           1
16    7  214826715           1
17    7  214826835           1
18    8  214838855           2
19    9  214576500           3
20   11  214563337           1
21   11  214706462           1
22   11  214717089           1
23   11  214819762           1
24   11  214821275           2
25   11  214821371           6
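For anyone who wants to try the approach without the full dataset, here is a minimal self-contained sketch. The function name `consecutive_counts` and the tiny frame `clicks` are hypothetical and used only for illustration; the grouping logic follows the snippet above.

```python
import pandas as pd

def consecutive_counts(clicks: pd.DataFrame) -> pd.DataFrame:
    """Sum the lengths of all consecutive-click runs per (Sid, Itemid)."""
    # Label runs: a new block starts when Itemid differs from the previous row.
    block = (clicks["Itemid"] != clicks["Itemid"].shift()).cumsum().rename("Block")
    return (
        clicks.groupby(["Sid", "Itemid", block])
              .size()                            # run length of each block
              .groupby(level=["Sid", "Itemid"])
              .sum()                             # m + n over all blocks of an item
              .reset_index(name="sum_counts")
    )

# Hypothetical miniature of session 6: item 20 is clicked 3 times, then 2 more times.
clicks = pd.DataFrame({
    "Sid":    [6, 6, 6, 6, 6, 6, 6],
    "Itemid": [10, 20, 20, 20, 30, 20, 20],
})
result = consecutive_counts(clicks)
print(result)
```

One thing worth checking against your own requirements: every row belongs to exactly one block, so summing the sizes of all blocks gives the same numbers as `clicks.groupby(['Sid', 'Itemid']).size()`. If only runs of two or more clicks should count, the per-block sizes would probably need to be filtered before the final sum.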