I have this data:
Village Workers Level
Aagar 10 Small
Dhagewadi 32 Small
Sherewadi 34 Small
Shindwad 42 Small
Dhokari 84 Medium
Khanapur 65 Medium
Ambikanagar 45 Medium
Takali 127 Large
Gardhani 122 Large
Pi.Khand 120 Large
Pangri 105 Large
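The code below reads this table from a CSV file at /home/desktop/Desktop/t.csv. Here is a minimal sketch for loading the same data from pasted text, so the example can run without that file. It assumes the file holds exactly the table above.

```python
import io

import pandas as pd

# The table above pasted as whitespace-separated text
# (assumption: same contents as the CSV file used in the question)
raw = """Village Workers Level
Aagar 10 Small
Dhagewadi 32 Small
Sherewadi 34 Small
Shindwad 42 Small
Dhokari 84 Medium
Khanapur 65 Medium
Ambikanagar 45 Medium
Takali 127 Large
Gardhani 122 Large
Pi.Khand 120 Large
Pangri 105 Large
"""

# sep=r"\s+" splits each line on any run of whitespace
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.head())
```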
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("/home/desktop/Desktop/t.csv")
# DataFrame.sort no longer exists in pandas; sort_values is the replacement
df = df.sort_values('Workers', ascending=False)
df['Level'] = pd.qcut(df['Workers'], 3, ['Small','Medium','Large'])
df['Sum_Level_wise'] = df.groupby('Level')['Workers'].transform('sum')
df['Probability'] = df['Sum_Level_wise'].div(df['Workers'].sum()).round(2)
df['Sample'] = df['Probability'] * df.groupby('Level')['Workers'].transform('size')
df['Selected villages'] = df['Sample'].apply(np.ceil).astype(int)
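def f(x):
    # number of villages to take from this Level group
    n = x['Selected villages'].iat[0]
    a = x['Village'].head(n)
    print(x['Village'])
    print(a)
    # head() never returns more rows than the group has, so compare the group size with n
    if len(x) < n:
        print('original village cannot be filled to Selected village, because length is higher')
    return a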
df['Selected village'] = df.groupby('Level').apply(f).reset_index(level=0)['Village']
df['Selected village'] = df['Selected village'].fillna('')
print (df)
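The allocation this code performs can be written without the CSV file and without the per-group apply. Each Level gets its share of all workers, multiplied by the number of villages in that Level and rounded up. The sketch below types the data inline. Like the question's head(), it keeps the largest villages in each Level rather than drawing them at random. The cumcount-based selection is a swapped-in technique, not the question's own code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Village": ["Aagar", "Dhagewadi", "Sherewadi", "Shindwad", "Dhokari",
                "Khanapur", "Ambikanagar", "Takali", "Gardhani", "Pi.Khand", "Pangri"],
    "Workers": [10, 32, 34, 42, 84, 65, 45, 127, 122, 120, 105],
})

# Largest villages first, then three equal-frequency Levels
df = df.sort_values("Workers", ascending=False)
df["Level"] = pd.qcut(df["Workers"], 3, labels=["Small", "Medium", "Large"])

# Level-wise share of all workers (rounded to 2 decimals, as in the question),
# times the number of villages in that Level, rounded up
grouped = df.groupby("Level", observed=True)["Workers"]
share = grouped.transform("sum") / df["Workers"].sum()
df["Selected villages"] = np.ceil(share.round(2) * grouped.transform("count")).astype(int)

# Position of each village inside its Level (0 = most workers);
# keep the first "Selected villages" rows of every Level
rank = df.groupby("Level", observed=True).cumcount()
selected = df[rank < df["Selected villages"]]
print(selected[["Village", "Workers", "Level"]])
```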
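I have now obtained the villages selected by the sampling.
Next, I want only those village names, together with their Workers and Level columns.
Like this: (Excel screenshot). I only want the selected village names because I don't want to show every intermediate step.
Only the 5 villages chosen by the sampling should be displayed with this data. Any help?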
Answer 0 (score: 0)
It seems you need head:
result_df = df.head(n=5)
result_df
will be:
Village Workers Level Sum_Level_wise Probability Sample Selected villages Selected village
7 Takali 127 Large 474 0.60 2.40 3 Takali
8 Gardhani 122 Large 474 0.60 2.40 3 Gardhani
9 Pi.Khand 120 Large 474 0.60 2.40 3 Pi.Khand
10 Pangri 105 Large 474 0.60 2.40 3
4 Dhokari 84 Medium 194 0.25 0.75 1 Dhokari
If you only need the 'Village', 'Workers' and 'Level' columns, try:
result_df[['Village','Workers','Level']]
It will give you:
Village Workers Level
7 Takali 127 Large
8 Gardhani 122 Large
9 Pi.Khand 120 Large
10 Pangri 105 Large
4 Dhokari 84 Medium
Update: to keep only the villages chosen by the sampling (the rows whose Selected village is not empty), drop the empty rows first:
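# empty strings mark villages that were not selected; pd.np is gone in recent pandas, so use np
df['Selected village'] = df['Selected village'].replace('', np.nan)
df.dropna(subset=['Selected village'], inplace=True)
df[['Workers','Level','Selected village']]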
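df.head(n=5) simply takes the first five rows of the sorted frame. In the output above, that includes Pangri, whose Selected village is empty. A boolean mask on that column gives the selected villages directly. It also avoids the chained `df['col'].replace(..., inplace=True)` pattern, which recent pandas versions warn about and which may not update df. Below is a minimal sketch on a small hypothetical frame shaped like the question's df:

```python
import pandas as pd

# Hypothetical miniature of the final df: "Selected village" holds the name
# for villages picked by the sampling and "" for the rest
df = pd.DataFrame({
    "Village": ["Takali", "Pangri", "Dhokari", "Khanapur"],
    "Workers": [127, 105, 84, 65],
    "Level": ["Large", "Large", "Medium", "Medium"],
    "Selected village": ["Takali", "", "Dhokari", ""],
})

# Keep rows with a non-empty "Selected village" and only the wanted columns
mask = df["Selected village"].ne("")
result_df = df.loc[mask, ["Village", "Workers", "Level"]]
print(result_df)
```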