Question

For example, I have the vector

x = c(-1,-1,-1,-1,1,1,1,-1,-1,1,1,-1,-1,1,1,-1,1,-1,1)

and the following list:

Y = list(1:7, 8:11, 12:15, 16:19)

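How can I sort the vector x according to the list Y? That is, I want to sort the first 7 elements, then the next 4 elements, then the next 4 elements, and finally the last 4 elements, each group on its own.

The desired output is c(-1,-1,-1,-1,1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1).

Note that the list Y is not always the same.

I tried x[unlist(sapply(Y, sort))], but it does not work.

Are there any alternatives?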

Answer 1

unlist(lapply(lapply(Y, function(i) x[i]), sort))
# [1] -1 -1 -1 -1  1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1

This first extracts the elements of x at the indices stored in each element of Y, placing them in separate list items,

lapply(Y, function(i) x[i])

then sorts each item independently with lapply(..., sort), and finally recombines them into a single vector with unlist.

Using this input:

x = c(-1,-1,-1,-1,1,1,1,-1,-1,1,1,-1,-1,1,1,-1,1,-1,1)
Y = list(1:7, 8:11, 12:15, 16:19)
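
To make the three steps concrete, here is a small sketch that stores each intermediate result using the input above. The names pieces, sorted_pieces, result, and attempt are only illustrative and do not appear in the original answer. It also suggests why the attempt from the question returns x unchanged: sort is applied to the indices in Y, which are already increasing, rather than to the values of x.

```r
x <- c(-1,-1,-1,-1,1,1,1,-1,-1,1,1,-1,-1,1,1,-1,1,-1,1)
Y <- list(1:7, 8:11, 12:15, 16:19)

# Step 1: pull out the values of x addressed by each index vector in Y
pieces <- lapply(Y, function(i) x[i])

# Step 2: sort each piece on its own
sorted_pieces <- lapply(pieces, sort)

# Step 3: flatten the sorted pieces back into one vector
result <- unlist(sorted_pieces)

# The attempt from the question sorts the indices, not the values,
# so it simply reproduces x
attempt <- x[unlist(sapply(Y, sort))]
identical(attempt, x)
```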

Answer 2

Not sure if this is what you are asking for:

Input

> unlist(lapply(Y, function(z) sort(x[z])))
 [1] -1 -1 -1 -1  1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1

Answer 3

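You can also use x and Y together to avoid the loop and vectorize the ordering, since order accepts additional vectors that are used to break ties in the earlier ones: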

x[order(rep(seq_along(Y), lengths(Y)), x)]
# [1] -1 -1 -1 -1  1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1
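
The one-liner relies on the index vectors in Y covering 1 to length(x) contiguously and in order, as they do in the question. As a hedged sketch (the names grp, idx, vals, res, and res_general are illustrative), the grouping key can be inspected on its own, and looking the values up through unlist(Y) first should relax that assumption:

```r
x <- c(-1,-1,-1,-1,1,1,1,-1,-1,1,1,-1,-1,1,1,-1,1,-1,1)
Y <- list(1:7, 8:11, 12:15, 16:19)

# One group label per element: 1 repeated 7 times, then 2, 3 and 4
# repeated 4 times each
grp <- rep(seq_along(Y), lengths(Y))

# order() sorts by grp first and uses x only to break ties within a group
res <- x[order(grp, x)]

# Variant that does not assume Y is 1:length(x) in order:
# fetch the values through the flattened indices before ordering
idx  <- unlist(Y)
vals <- x[idx]
res_general <- vals[order(grp, vals)]
```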

Some benchmarks for illustration:

set.seed(123)
N <- 1e5
x <- sample(N) 
Y <- split(1:N, rep(1 : (N/5), each = 5))


microbenchmark::microbenchmark("Gregor" = unlist(lapply(lapply(Y, function(i) x[i]), sort)),
                               "Frank1" = ave(x, stack(setNames(Y, seq_along(Y)))$ind, FUN = sort),
                               "Frank2" = x[order(stack(setNames(Y, seq_along(Y)))$ind, x)],
                               "Jilber" =  unlist(lapply(Y, function(z) sort(x[z]))),
                               "David" = x[order(rep(seq_along(Y), lengths(Y)), x)])


# Unit: milliseconds
#   expr        min         lq       mean     median         uq         max neval cld
# Gregor 904.277546 937.652137 958.911977 949.311012 961.324917 1164.024555   100   c
# Frank1 884.306262 922.496558 956.408754 941.394433 962.098976 1140.254656   100   c
# Frank2  27.384839  28.587845  30.806481  29.542219  31.684239   46.947814   100  b 
# Jilber 923.135901 949.318532 967.981792 962.090176 976.574574 1137.115863   100   c
# David   2.184901   2.326817   2.622338   2.492732   2.524091    8.586322   100 a

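The vectorized solution is roughly 380 times faster than the loop-based ones (comparing median times: about 2.5 ms versus about 950 ms).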
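
Before relying on a timing comparison, it is worth confirming that the approaches return the same values. Below is a minimal check on a smaller sample; N = 1000 is an arbitrary choice, and the names loop_res and vec_res are illustrative. use.names = FALSE is passed because split() returns a named list and unlist() would otherwise attach names to the result.

```r
set.seed(123)
N <- 1000
x <- sample(N)
Y <- split(1:N, rep(1:(N/5), each = 5))

# Loop-based approach: sort each group separately, then flatten
loop_res <- unlist(lapply(Y, function(z) sort(x[z])), use.names = FALSE)

# Vectorized approach: a single order() call keyed on group, then value
vec_res <- x[order(rep(seq_along(Y), lengths(Y)), x)]

identical(loop_res, vec_res)
```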

How to sort a vector based on a list in R
