I recently asked a question, but now I have run into a new problem. This is my DataFrame:
df = pd.DataFrame({'id':[1,1,1,1,2,2,2,3,3,3,4,4],
'sex': [0,0,0,1,0,0,0,1,1,0,1,1]})
id sex
0 1 0
1 1 0
2 1 0
3 1 1
4 2 0
5 2 0
6 2 0
7 3 1
8 3 1
9 3 0
10 4 1
11 4 1
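Now I need to set the sex value for every id that has mixed sex values. It should be the most frequent value within that id. So I want to get something like this: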
id sex
0 1 0
1 1 0
2 1 0
3 1 0
4 2 0
5 2 0
6 2 0
7 3 1
8 3 1
9 3 1
10 4 1
11 4 1
After that I want to keep only one id-sex pair per id:
id sex
0 1 0
1 2 0
2 3 1
3 4 1
Answer 0 (score: 1)
Option 1
You can use value_counts to count each value per id, then idxmax to pick the most frequent one:
df = df.set_index('id').groupby(level=0).sex\
.apply(lambda x: x.value_counts().idxmax()).reset_index()
df
id sex
0 1 0
1 2 0
2 3 1
3 4 1
Option 2
Similar to Option 1, but in two steps: use transform to fill in the most common value for each id, then drop_duplicates.
df.sex = df.groupby('id').sex.transform(lambda x: x.value_counts().idxmax())
df
id sex
0 1 0
1 1 0
2 1 0
3 1 0
4 2 0
5 2 0
6 2 0
7 3 1
8 3 1
9 3 1
10 4 1
11 4 1
df = df.drop_duplicates()
df
id sex
0 1 0
4 2 0
7 3 1
10 4 1
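For anyone who wants to run the two steps end to end, here is a minimal self-contained sketch of Option 2. The helper name most_common and the variables filled and pairs are only illustrative, and reset_index(drop=True) is added so the final index runs 0..3 as in the question.

```python
import pandas as pd

df = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

def most_common(s):
    # value_counts() counts each distinct value;
    # idxmax() returns the value (index label) with the highest count.
    return s.value_counts().idxmax()

# Step 1: overwrite sex with the most common value inside each id.
filled = df.assign(sex=df.groupby('id')['sex'].transform(most_common))

# Step 2: every id now has a single sex value, so duplicates can be dropped.
pairs = filled.drop_duplicates().reset_index(drop=True)
print(pairs)
```

Using assign keeps the original df untouched, which is handy if you still need the raw values afterwards.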
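Answer 1 (score: 1)
Use groupby with value_counts. value_counts sorts by count in descending order by default, so you only need to select the first index value with [0]: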
df = df.groupby('id')['sex'].apply(lambda x: x.value_counts().index[0]).reset_index()
print (df)
id sex
0 1 0
1 2 0
2 3 1
3 4 1
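One case the sample data does not cover is a tie, where two values are equally common within an id. As a sketch only, and assuming you want ties to resolve to the smallest value, Series.mode can be swapped in for value_counts: it returns all modes in sorted order, so .iloc[0] picks the smallest. The id 5 rows below are made up for illustration.

```python
import pandas as pd

# id 5 is a hypothetical group with a 1-1 tie between sex values 1 and 0.
df = pd.DataFrame({'id':  [1, 1, 1, 1, 5, 5],
                   'sex': [0, 0, 0, 1, 1, 0]})

# mode() returns every most-frequent value, sorted ascending;
# iloc[0] therefore takes the smallest one when there is a tie.
result = df.groupby('id')['sex'].agg(lambda x: x.mode().iloc[0]).reset_index()
print(result)
```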
Answer 2 (score: 1)
You can also use np.bincount.
In [179]: df.groupby('id')['sex'].apply(lambda x: np.argmax(np.bincount(x))).reset_index()
Out[179]:
id sex
0 1 0
1 2 0
2 3 1
3 4 1
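In case it is not obvious why this works, here is a small sketch of what np.bincount and np.argmax do on a single group. Note that np.bincount only accepts non-negative integers, so this approach assumes sex is coded as 0/1 as in the question; the arrays below are illustrative.

```python
import numpy as np

values = np.array([1, 1, 0, 1])

# counts[k] is the number of times the integer k occurs in values.
counts = np.bincount(values)
print(counts)            # [1 3]: one 0 and three 1s

# argmax returns the position of the largest count, i.e. the most common value.
most_common = int(np.argmax(counts))
print(most_common)       # 1

# On a tie, argmax returns the first maximum, so the smaller value wins.
tie_winner = int(np.argmax(np.bincount(np.array([1, 0]))))
print(tie_winner)        # 0
```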
Timings
In [194]: df = pd.concat([df]*1000, ignore_index=True)
In [195]: df.shape
Out[195]: (12000, 2)
In [196]: %timeit df.groupby('id')['sex'].apply(lambda x: np.argmax(np.bincount(x))).reset_index()
100 loops, best of 3: 2.48 ms per loop
In [197]: %timeit df.groupby('id')['sex'].apply(lambda x: x.value_counts().index[0]).reset_index()
100 loops, best of 3: 4.55 ms per loop
In [198]: %timeit df.set_index('id').groupby(level=0).sex.apply(lambda x: x.value_counts().idxmax()).reset_index()
100 loops, best of 3: 6.71 ms per loop
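The numbers above come from IPython's %timeit. If you want to repeat the comparison in a plain script, a rough sketch with the standard timeit module could look like this. The dictionary keys and the repeat counts are arbitrary choices, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                     'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})
big = pd.concat([base] * 1000, ignore_index=True)  # 12000 rows, as above

# The three approaches from the answers, wrapped as zero-argument callables.
candidates = {
    'bincount': lambda: big.groupby('id')['sex']
        .apply(lambda x: np.argmax(np.bincount(x))).reset_index(),
    'value_counts_index0': lambda: big.groupby('id')['sex']
        .apply(lambda x: x.value_counts().index[0]).reset_index(),
    'value_counts_idxmax': lambda: big.set_index('id').groupby(level=0)['sex']
        .apply(lambda x: x.value_counts().idxmax()).reset_index(),
}

# Run each once to confirm they agree before timing them.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 repeats, 5 calls each, reported as seconds per call.
timings = {name: min(timeit.repeat(fn, number=5, repeat=3)) / 5
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e3:.2f} ms per call')
```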