Sort array rows using an identifier column to match the order of another array

Time: 2016-06-16 10:24:38

Tags: python performance numpy

I have two arrays like this:

A = [[111, ...],          B = [[222, ...],
     [222, ...],               [111, ...],
     [333, ...],               [333, ...],
     [555, ...]]               [444, ...],
                               [555, ...]]

where the first column contains identifiers and the remaining columns contain some data; B has far more columns than A. The identifiers are unique. A can have fewer rows than B, so in some cases empty spacer rows are needed. I am looking for an efficient way to match the rows of matrix A to matrix B, so that A's rows follow the order of the identifiers in B, with an empty row wherever B has an identifier that A lacks (444 in the arrays above).

I could sort both matrices or write a for loop, but both approaches seem clumsy... Is there a better implementation?

4 answers:

Answer 0 (score: 2)

Here's a vectorized approach using np.searchsorted:

import numpy as np

# Store the sorted indices of A
sidx = A[:,0].argsort()

# Find the indices of col-0 of B in col-0 of sorted A
l_idx = np.searchsorted(A[:,0],B[:,0],sorter = sidx)

# Create a mask corresponding to all those indices that indicates which indices
# corresponding to B's col-0 match up with A's col-0
valid_mask = l_idx != np.searchsorted(A[:,0],B[:,0],sorter = sidx,side='right')

# Initialize output array with NaNs. 
# Use l_idx to set rows from A into output array. Use valid_mask to select 
# indices from l_idx and output rows that are to be set.
out = np.full((B.shape[0],A.shape[1]),np.nan)
out[valid_mask] = A[sidx[l_idx[valid_mask]]]

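Note that valid_mask could also be created with np.in1d, as np.in1d(B[:,0],A[:,0]), for a more intuitive answer. However, we are using np.searchsorted because it performs better, as discussed in more detail in this other solution.

Sample run: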

In [184]: A
Out[184]: 
array([[45, 11, 86],
       [18, 74, 59],
       [30, 68, 13],
       [55, 47, 78]])

In [185]: B
Out[185]: 
array([[45, 11, 88],
       [55, 83, 46],
       [95, 87, 77],
       [30,  9, 37],
       [14, 97, 98],
       [18, 48, 53]])

In [186]: out
Out[186]: 
array([[ 45.,  11.,  86.],
       [ 55.,  47.,  78.],
       [ nan,  nan,  nan],
       [ 30.,  68.,  13.],
       [ nan,  nan,  nan],
       [ 18.,  74.,  59.]])
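
For readers who want to rerun the sample, the steps above can be wrapped in a function. This is only a sketch: the name match_rows and its fill parameter are hypothetical, and the output is forced to float so that NaN can mark the missing rows. It also checks the mask against np.isin, the newer spelling of the np.in1d membership test.

```python
import numpy as np

def match_rows(A, B, fill=np.nan):
    """Reorder rows of A to follow the identifiers in column 0 of B.

    Hypothetical helper; rows of B with no match in A are filled with `fill`.
    """
    ids_a, ids_b = A[:, 0], B[:, 0]
    sidx = ids_a.argsort()
    # Left and right insertion points differ only where the identifier exists in A.
    left = np.searchsorted(ids_a, ids_b, sorter=sidx)
    right = np.searchsorted(ids_a, ids_b, sorter=sidx, side="right")
    valid = left != right
    out = np.full((B.shape[0], A.shape[1]), fill, dtype=float)
    out[valid] = A[sidx[left[valid]]]
    return out, valid

A = np.array([[45, 11, 86], [18, 74, 59], [30, 68, 13], [55, 47, 78]])
B = np.array([[45, 11, 88], [55, 83, 46], [95, 87, 77],
              [30, 9, 37], [14, 97, 98], [18, 48, 53]])

out, valid = match_rows(A, B)
# The searchsorted-based mask agrees with the plain membership test.
same_mask = np.array_equal(valid, np.isin(B[:, 0], A[:, 0]))
```

Returning the mask alongside the array lets the caller tell filler rows from real data without scanning for NaN.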

Answer 1 (score: 0)

A simple approach is to build a dict from A, then use it to map the identifiers found in B to a new array.

Build the dict:

>>> A = [[1,"a"], [2,"b"], [3,"c"]]
>>> A_dict = {x[0]: x for x in A}
>>> A_dict
{1: [1, 'a'], 2: [2, 'b'], 3: [3, 'c']}

Mapping:

>>> B = [[3,"..."], [2,"..."], [1,"..."]]
>>> result = (A_dict[x[0]] for x in B)
>>> list(result)
[[3, 'c'], [2, 'b'], [1, 'a']]
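
The mapping above raises a KeyError if B contains an identifier that A lacks, which the question says can happen. A minimal sketch of one way around it, assuming a row of None values is an acceptable spacer (the filler name and the extra identifier 4 are made up for illustration):

```python
A = [[1, "a"], [2, "b"], [3, "c"]]
B = [[3, "..."], [2, "..."], [4, "..."], [1, "..."]]  # 4 is not in A

A_dict = {row[0]: row for row in A}

# Hypothetical spacer row, as wide as a row of A.
filler = [None] * len(A[0])

# dict.get falls back to the spacer instead of raising KeyError.
result = [A_dict.get(row[0], filler) for row in B]
print(result)
# [[3, 'c'], [2, 'b'], [None, None], [1, 'a']]
```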

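Answer 2 (score: 0)

It isn't clear whether you want the values from B concatenated onto A. Let's assume not... then the simplest approach is probably just to build a dictionary mapping identifiers to rows, and then reorder A: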

import numpy

def match_order(A, B):
    # identifier -> row
    by_id = {A[i, 0]: A[i] for i in range(len(A))}

    # make up a fill row and rearrange according to B
    fill_row = [-1] * A.shape[1]
    return numpy.array([by_id.get(k, fill_row) for k in B[:, 0]])

For example, if we have:

A = numpy.array([[111, 1], [222, 2], [333, 3], [555, 5]])
B = numpy.array([[222, 2], [111, 1], [333, 3], [444, 4], [555, 5]])

then

>>> match_order(A, B)
array([[222,   2],
       [111,   1],
       [333,   3],
       [ -1,  -1],
       [555,   5]])

If you do want to concatenate B, you can do this:

>>> numpy.hstack( (match_order(A, B), B[:, 1:]) )
array([[222,   2,   2],
       [111,   1,   1],
       [333,   3,   3],
       [ -1,  -1,   4],
       [555,   5,   5]])
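
A fill value of -1 can collide with real data. One possible variant, sketched here under the assumption that a float result is acceptable, fills with NaN instead; match_order_nan is a hypothetical name, not part of the answer above.

```python
import numpy

def match_order_nan(A, B):
    # identifier -> row, as in match_order
    by_id = {row[0]: row for row in A}

    # NaN needs a float dtype; integer arrays cannot hold it.
    fill_row = numpy.full(A.shape[1], numpy.nan)
    return numpy.array([by_id.get(k, fill_row) for k in B[:, 0]], dtype=float)

A = numpy.array([[111, 1], [222, 2], [333, 3], [555, 5]])
B = numpy.array([[222, 2], [111, 1], [333, 3], [444, 4], [555, 5]])

matched = match_order_nan(A, B)
# True for rows of B whose identifier is absent from A.
missing = numpy.isnan(matched[:, 0])
```

The missing mask can then be used to skip or post-process the spacer rows.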

Answer 3 (score: 0)

>>> A = [[3,'d', 'e', 'f'], [1,'a','b','c'], [2,'n','n','n']]
>>> B = [[1,'a','b','c'], [3,'d','e','f']]
>>> A_dict = {x[0]:x[1:] for x in A}
>>> A_dict
    {1: ['a', 'b', 'c'], 2: ['n', 'n', 'n'], 3: ['d', 'e', 'f']}
>>> B_dict = {x[0]:x[1:] for x in B}
>>> B_dict
    {1: ['a', 'b', 'c'], 3: ['d', 'e', 'f']} 
>>> result=[[x] + A_dict[x] for x in A_dict if x in B_dict and A_dict[x]==B_dict[x]]
>>> result
    [[1, 'a', 'b', 'c'], [3, 'd', 'e', 'f']]

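Here A[0], B[1] and A[1], B[0] are the same. Converting to dictionaries and working on those makes life easier here.

Step 1: Create a dict object for each 2D list.

Step 2: Iterate over each key in A_dict and check:
    a. whether the key exists in B_dict
    b. if it does, whether both keys have the same value

Step 3: Append the key and value to form the 2D list.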
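
The three steps can be sketched as one function (common_rows is a hypothetical name). Note the assumption about ordering: on Python 3.7+ a dict iterates in insertion order, so the rows come back in A's order rather than sorted by key, and not in B's order.

```python
def common_rows(A, B):
    # Step 1: one dict per 2D list, identifier -> remaining values.
    A_dict = {row[0]: row[1:] for row in A}
    B_dict = {row[0]: row[1:] for row in B}

    # Steps 2 and 3: keep keys present in both dicts with equal values,
    # and rebuild each row as [key] + values.
    return [[key] + values for key, values in A_dict.items()
            if key in B_dict and B_dict[key] == values]

A = [[3, 'd', 'e', 'f'], [1, 'a', 'b', 'c'], [2, 'n', 'n', 'n']]
B = [[1, 'a', 'b', 'c'], [3, 'd', 'e', 'f']]

rows = common_rows(A, B)
# Sort by identifier if a fixed order is needed.
rows_by_id = sorted(rows)
```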

Cheers!