How to get the data for the next analysis step using Pandas

Time: 2017-06-03 13:01:00

Tags: python pandas numpy

I have this data:

Village     Workers       Level
Aagar       10            Small
Dhagewadi   32            Small
Sherewadi   34            Small
Shindwad    42            Small
Dhokari     84            Medium
Khanapur    65            Medium
Ambikanagar 45            Medium
Takali      127           Large
Gardhani    122           Large
Pi.Khand    120           Large
Pangri      105           Large

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("/home/desktop/Desktop/t.csv")
# DataFrame.sort was removed from pandas; sort_values is the current equivalent
df = df.sort_values('Workers', ascending=False)
df['Level'] = pd.qcut(df['Workers'], 3, labels=['Small','Medium','Large'])
df['Sum_Level_wise'] = df.groupby('Level')['Workers'].transform('sum')
df['Probability'] = df['Sum_Level_wise'].div(df['Workers'].sum()).round(2)
df['Sample'] = df['Probability'] * df.groupby('Level')['Workers'].transform('size')
df['Selected villages'] = df['Sample'].apply(np.ceil).astype(int)


def f(x):
    a = x['Village'].head(x['Selected villages'].iat[0])
    print (x['Village'])
    print (a)
    if (len(x) < len(a)):
        print ('original village cannot be filled to Selected village, because length is higher')
    return a

df['Selected village'] = df.groupby('Level').apply(f).reset_index(level=0)['Village']
df['Selected village'] = df['Selected village'].fillna('')

print (df)

With this, I have obtained the villages selected by the sampling.


Now I only want to select the names of the selected villages, together with their Workers and Level columns.

Like this (an Excel screenshot):


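So I only want those village names, because I don't want to show every intermediate step.

The output should show just the data for the 5 sampled villages. Any help?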

1 answer:

Answer 0: (score: 0)

It seems you need head:


result_df = df.head(n=5)

result_df will be:

    Village   Workers Level  Sum_Level_wise Probability Sample Selected villages Selected village
7   Takali    127     Large  474            0.60        2.40   3                 Takali
8   Gardhani  122     Large  474            0.60        2.40   3                 Gardhani
9   Pi.Khand  120     Large  474            0.60        2.40   3                 Pi.Khand
10  Pangri    105     Large  474            0.60        2.40   3
4   Dhokari   84      Medium 194            0.25        0.75   1                 Dhokari

If you only need the 'Village', 'Workers' and 'Level' columns, try:

result_df[['Village','Workers','Level']]

It will give you:

    Village     Workers Level
7   Takali      127     Large
8   Gardhani    122     Large
9   Pi.Khand    120     Large
10  Pangri      105     Large
4   Dhokari     84      Medium

Update

To keep only the rows that actually have a selected village, replace the empty strings with NaN and drop those rows:

# pd.np was removed from pandas; use numpy (imported as np in the question) directly
df['Selected village'] = df['Selected village'].replace('', np.nan)
df = df.dropna(subset=['Selected village'])
df[['Workers','Level','Selected village']]
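
As an alternative that skips the intermediate 'Selected village' column, here is a minimal, self-contained sketch. It is not the approach from the answer above: it assumes the question's data typed inline instead of read from t.csv, and it keeps the first k rows of each level with groupby().cumcount(), where k is the question's 'Selected villages' value. The names raw, rank_in_level and result are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Assumption: the question's data typed inline instead of read from t.csv
raw = """Village,Workers
Aagar,10
Dhagewadi,32
Sherewadi,34
Shindwad,42
Dhokari,84
Khanapur,65
Ambikanagar,45
Takali,127
Gardhani,122
Pi.Khand,120
Pangri,105
"""

df = pd.read_csv(io.StringIO(raw))
df = df.sort_values('Workers', ascending=False)

# Same steps as in the question; Level is cast to str so that the
# groupby calls below work on plain labels rather than a categorical
df['Level'] = pd.qcut(df['Workers'], 3, labels=['Small', 'Medium', 'Large']).astype(str)
df['Sum_Level_wise'] = df.groupby('Level')['Workers'].transform('sum')
df['Probability'] = df['Sum_Level_wise'].div(df['Workers'].sum()).round(2)
df['Sample'] = df['Probability'] * df.groupby('Level')['Workers'].transform('size')
df['Selected villages'] = df['Sample'].apply(np.ceil).astype(int)

# Position of each row inside its level (0 = most workers, since df is sorted)
rank_in_level = df.groupby('Level').cumcount()

# Keep a row only if it is among the first k rows of its level
result = df.loc[rank_in_level < df['Selected villages'], ['Village', 'Workers', 'Level']]
print(result)
```

With the sample data this should keep three Large villages, one Medium and one Small, five rows in total, without printing any of the intermediate columns.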