Choose between two values and set the most frequent one in a pandas DataFrame

Time: 2017-09-14 11:10:33

Tags: python pandas dataframe

I recently asked a question, but now I have run into a new problem. Here is my DataFrame:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

    id  sex
0   1   0
1   1   0
2   1   0
3   1   1
4   2   0
5   2   0
6   2   0
7   3   1
8   3   1
9   3   0
10  4   1
11  4   1

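Now I need to fix the sex value for every id that has mixed sex values. It should become the most frequent value within that id. So I expect to get something like this: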

    id  sex
0   1   0
1   1   0
2   1   0
3   1   0
4   2   0
5   2   0
6   2   0
7   3   1
8   3   1
9   3   1
10  4   1
11  4   1

After that I want to keep only one id-sex pair per id:

    id  sex
0   1   0
1   2   0
2   3   1
3   4   1

3 answers:

Answer 0: (score: 1)

Option 1
You can use value_counts and then idxmax:

df = df.set_index('id').groupby(level=0).sex\
    .apply(lambda x: x.value_counts().idxmax()).reset_index()
df

   id  sex
0   1    0
1   2    0
2   3    1
3   4    1

Option 2
Similar to Option 1, but in two steps, using transform and then drop_duplicates:

df.sex = df.groupby('id').sex.transform(lambda x: x.value_counts().idxmax())
df

    id  sex
0    1    0
1    1    0
2    1    0
3    1    0
4    2    0
5    2    0
6    2    0
7    3    1
8    3    1
9    3    1
10   4    1
11   4    1

df = df.drop_duplicates()
df

    id  sex
0    1    0
4    2    0
7    3    1
10   4    1
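The two-step approach can be sketched end to end on the question's data. This is only a minimal sketch: the helper name `most_frequent` and the variables `filled` and `pairs` are made up for illustration and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})


def most_frequent(s):
    # Hypothetical helper: label with the highest count in the group.
    return s.value_counts().idxmax()


# Step 1: broadcast each id's most frequent value back onto every row.
filled = df.assign(sex=df.groupby('id')['sex'].transform(most_frequent))

# Step 2: collapse the now-identical rows into one id-sex pair per id.
pairs = filled.drop_duplicates().reset_index(drop=True)
print(pairs)
```

Using `assign` here leaves the original `df` untouched, which is a design choice of this sketch; the answer assigns to `df.sex` in place.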

Answer 1: (score: 1)

Use groupby with value_counts, which sorts by count in descending order by default, so you only need to select the first index label with [0]:

df = df.groupby('id')['sex'].apply(lambda x: x.value_counts().index[0]).reset_index()
print (df)
   id  sex
0   1    0
1   2    0
2   3    1
3   4    1
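To see why taking the first index label is enough, here is a small sketch on a single group. The Series `s` is made-up sample data matching the values of id 3 in the question.

```python
import pandas as pd

s = pd.Series([1, 1, 0])  # sample values, same as id 3 in the question

# value_counts sorts by count, largest first, unless sort=False is passed.
counts = s.value_counts()
print(counts)

# The first index label is therefore the most frequent value.
top = counts.index[0]
print(top)
```

When two values occur equally often, I would not rely on which of them ends up first; the question's data has no such ties.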

Answer 2: (score: 1)

You can also use np.bincount:

In [179]: df.groupby('id')['sex'].apply(lambda x: np.argmax(np.bincount(x))).reset_index()
Out[179]:
   id  sex
0   1    0
1   2    0
2   3    1
3   4    1
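A short sketch of what the bincount and argmax pair does on one group may help. The arrays below are made-up sample data; note that np.bincount only accepts non-negative integers, which holds for the 0/1 column in the question.

```python
import numpy as np

values = np.array([1, 1, 0])  # sample group, same values as id 3

# counts[i] is the number of times the integer i occurs in values.
counts = np.bincount(values)
print(counts)

# argmax returns the position of the largest count, i.e. the most frequent value.
most_common = int(np.argmax(counts))
print(most_common)

# On a tie, argmax returns the first maximum, so the smaller value wins.
tie_winner = int(np.argmax(np.bincount(np.array([0, 1]))))
print(tie_winner)
```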

Timings

In [194]: df = pd.concat([df]*1000, ignore_index=True)

In [195]: df.shape
Out[195]: (12000, 2)

In [196]: %timeit df.groupby('id')['sex'].apply(lambda x: np.argmax(np.bincount(x))).reset_index()
100 loops, best of 3: 2.48 ms per loop

In [197]: %timeit df.groupby('id')['sex'].apply(lambda x: x.value_counts().index[0]).reset_index()
100 loops, best of 3: 4.55 ms per loop

In [198]: %timeit df.set_index('id').groupby(level=0).sex.apply(lambda x: x.value_counts().idxmax()).reset_index()
100 loops, best of 3: 6.71 ms per loop
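Outside IPython, the comparison above could be reproduced roughly as follows with the standard timeit module. This is a sketch, not the answerer's setup: the names `small`, `big`, `candidates` and `results` are made up, and the measured numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                      'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})
big = pd.concat([small] * 1000, ignore_index=True)  # 12000 rows, as in the timings

# The three expressions timed in the answer, wrapped as zero-argument callables.
candidates = {
    'bincount': lambda: big.groupby('id')['sex']
        .apply(lambda x: np.argmax(np.bincount(x))).reset_index(),
    'value_counts + index[0]': lambda: big.groupby('id')['sex']
        .apply(lambda x: x.value_counts().index[0]).reset_index(),
    'value_counts + idxmax': lambda: big.set_index('id').groupby(level=0).sex
        .apply(lambda x: x.value_counts().idxmax()).reset_index(),
}

# Run each once to check that they agree before comparing speed.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10) / 10  # average over 10 calls
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```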