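Limiting the sample size in a sampling script?

Time: 2020-04-08 15:30:41

Tags: python pandas

I am working on a script that takes a sample from each category in an Excel file. Depending on the length of the category, a different percentage can be taken, but I would like to know whether there is a way to make each sample contain at least 5 items, even when 1% would return only 2 items. Any help would be appreciated.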

import random

# Assumption: Number is not defined in the original snippet; a random target is used here.
Number = random.randint(0, 100)

for Guesses in range(9):
    print('Take a guess.')

    Guess = int(input())

    if Guess < 0:
        print('Please enter a positive number')
    elif Guess > 100:
        print('The number is only between 0 and 100')
    elif Guess < Number:
        print('Higher...')
    elif Guess > Number:
        print('Lower...')
    else:
        print('Spot on!')
        break  # Guess was correct

1 Answer:

Answer 0: (score: 1)

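You can use x.size * 0.01 to check how many values would be drawn, and call sample(n=5) instead of sample(frac=0.01) when that number is below 5. Note that x.size counts every cell (rows times columns), so len(x) is the safer row count when the frame has more than one column.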

.apply(lambda x: x.sample(n=5) if len(x) * 0.01 < 5 else x.sample(frac=0.01))
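As a small illustration of why the row count matters here, the sketch below contrasts `DataFrame.size` with `len()` on a frame that has more than one column. The frame `demo` and its `Value` column are hypothetical and exist only for this comparison.

```python
import pandas as pd

# Hypothetical two-column frame, used only to contrast the two measures.
demo = pd.DataFrame({"Category": [1, 1, 2], "Value": [10, 20, 30]})

n_cells = demo.size  # number of cells: rows * columns
n_rows = len(demo)   # number of rows only

print(n_cells, n_rows)
```

With a single column the two values coincide, which is why either works in the example that follows.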

import pandas as pd
import random

random.seed(1)  # always generate the same random data

# One column with 1000 rows; category 2 is three times as likely as 1 or 3
data = {'Category': [random.choice([1, 2, 2, 2, 3]) for _ in range(1000)]}
df = pd.DataFrame(data)
print(df)

# --- before: always 1% of each category ---
df1 = df.groupby('Category').apply(lambda x: x.sample(frac=0.01))
print('--- before ---')
print(df1['Category'].value_counts())

# --- after: 5 rows when 1% would give fewer than 5, otherwise 1% ---
# len(x) is the row count of the group (x.size would count rows * columns)
df2 = df.groupby('Category').apply(lambda x: x.sample(n=5) if len(x) * 0.01 < 5 else x.sample(frac=0.01))
print('--- after ---')
print(df2['Category'].value_counts())

Result

--- before ---
2    6
3    2
1    2
Name: Category, dtype: int64

--- after ---
2    6
3    5
1    5
Name: Category, dtype: int64 

Edit: a more readable version

def myfunction(x):
    # len(x) is the number of rows in the group
    if len(x) * 0.01 < 5:
        return x.sample(n=5)
    else:
        return x.sample(frac=0.01)

df1 = df.groupby('Category').apply(myfunction)
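One case the version above does not cover is a category with fewer than 5 rows: `sample(n=5)` raises a `ValueError` there, because it cannot draw more rows than exist without replacement. A minimal sketch of one way to handle that, assuming you want the whole group in that case, is shown below. The helper name `sample_with_floor`, its parameters, and the `demo` data are hypothetical and not part of the original answer.

```python
import pandas as pd

def sample_with_floor(group, frac=0.01, min_n=5, seed=None):
    """Sample `frac` of the rows, but at least `min_n`, capped at the group size."""
    wanted = max(min_n, round(len(group) * frac))
    n = min(wanted, len(group))  # never ask for more rows than the group has
    return group.sample(n=n, random_state=seed)

# Hypothetical data: categories with 1000, 200 and 3 rows
demo = pd.DataFrame({"Category": ["a"] * 1000 + ["b"] * 200 + ["c"] * 3})

# Iterate over the groups explicitly and concatenate the per-group samples
parts = [sample_with_floor(g, seed=0) for _, g in demo.groupby("Category")]
sampled = pd.concat(parts)

counts = sampled["Category"].value_counts().to_dict()
print(counts)
```

Here category "a" yields 10 rows (1% of 1000), "b" is raised to the floor of 5, and "c" returns all 3 of its rows.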