Question

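I have a column with several thousand rows, and I want to select the most important ones. Say I want to select all the rows that represent 90% of the sample. How can I do this?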

I have a DataFrame with 2 columns: one for product_id and one showing whether the product was purchased (the value is 0 or 1).

product_id    purchased
   a             1
   b             0
   c             0
   d             1
   a             1
   .             .
   .             .

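Using df['product_id'].value_counts(), I can rank all my product IDs by number of occurrences. Now suppose I want to find the set of product_id values to consider in later analysis, namely those that together represent 90% of the total occurrences. Is there a way to do this?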

Answer 1

If you want all the product_id values whose cumulative share of the counts is below 0.9, use:

    s = df['product_id'].value_counts(normalize=True).cumsum()
    df1 = df[df['product_id'].isin(s.index[s < 0.9])]

Alternatively, sort all rows by how often their product_id occurs and keep the first 90% of them.
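The code for the row-sorting variant did not survive in this copy of the answer, so the following is only a minimal sketch of one way it might be done, not the answerer's original code. The sample data and the names `counts`, `df_sorted`, `n_keep` and `df2` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, loosely following the table in the question.
df = pd.DataFrame({
    "product_id": ["a", "b", "c", "d", "a", "a", "b", "a", "d", "a"],
    "purchased":  [1, 0, 0, 1, 1, 0, 1, 1, 0, 1],
})

# For every row, look up how often its product_id occurs in the column.
counts = df["product_id"].map(df["product_id"].value_counts())

# Order the rows from most to least frequent product_id
# (mergesort is a stable sorting algorithm).
order = counts.sort_values(ascending=False, kind="mergesort").index
df_sorted = df.loc[order]

# Keep the first 90% of the rows, rounding down.
n_keep = int(len(df_sorted) * 0.9)
df2 = df_sorted.head(n_keep)

print(df2)
```

Unlike the cumulative-share filter, this cut is made on rows rather than on whole products, so the least frequent product_id that still makes the cut may keep only some of its rows.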
