
Asked: 2011-04-20 00:51:06

Tags: r swap


"dam"   "piglet"   "fdate"   "ssire"

"piglet"   "ssire"   "dam"   "tdate"




7 answers:

Answer 0 (score: 31)

dfrm <- dfrm[c("piglet", "ssire", "dam", "tdate")]


dfrm <- dfrm[ , c("piglet", "ssire", "dam", "tdate")]

Answer 1 (score: 13)

d <- data.frame(a=1:3, b=11:13, c=21:23)
#  a  b  c
#1 1 11 21
#2 2 12 22
#3 3 13 23
d2 <- d[,c("b", "c", "a")]
#   b  c a
#1 11 21 1
#2 12 22 2
#3 13 23 3


d3 <- d[,c(2, 3, 1)]
#   b  c a
#1 11 21 1
#2 12 22 2
#3 13 23 3

Answer 2 (score: 8)



dfr <- data.frame(
  dam    = 1:5,
  piglet = runif(5),
  fdate  = letters[1:5],
  ssire  = rnorm(5)
)

dfr[, c(2, 4, 1, 3)]


dfr[, c("piglet", "ssire", "dam", "fdate")]

As the answers from DWin and Gavin show, data frames let you omit the row argument when specifying an index.

dfr[c(2, 4, 1, 3)]
dfr[c("piglet", "ssire", "dam", "fdate")]


subset(dfr, select = c(2, 4, 1, 3))
subset(dfr, select = c("piglet", "ssire", "dam", "fdate"))
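As a small check of the point about omitting the row argument, the sketch below builds a hypothetical four-column data frame (the values are made up; only the column names come from the question) and compares the column order produced by the three indexing styles.

```r
# Hypothetical example data; only the column names follow the question
dfr <- data.frame(
  dam    = 1:3,
  piglet = c(0.1, 0.2, 0.3),
  fdate  = letters[1:3],
  ssire  = c(1.5, 2.5, 3.5)
)

new_order <- c("piglet", "ssire", "dam", "fdate")

by_list   <- dfr[new_order]                    # list-style: no row argument at all
by_matrix <- dfr[, new_order]                  # matrix-style: empty row argument
by_subset <- subset(dfr, select = new_order)   # subset() with a character vector

# All three give the same column order
print(names(by_list))
print(identical(names(by_list), names(by_matrix)))
print(identical(names(by_list), names(by_subset)))
```

The list-style form is the shortest, and because it never passes a row argument it cannot accidentally drop a single selected column to a vector.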

Answer 3 (score: 6)


#Assume df contains "dam" "piglet" "fdate" "ssire"

newdf <- subset(df, select = c("piglet", "ssire", "dam", "fdate"))

Answer 4 (score: 2)


# Install (once) and load the dplyr package
install.packages("dplyr")
library(dplyr)

# Override the existing data frame with the desired column order
df <- select(df, piglet, ssire, dam, tdate)


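  1. Because select() does not require variable names to be quoted, you have less to type.
  2. If your data frame has more than four variables, you can use the select helper functions (such as starts_with(), ends_with(), etc.) to pick several columns at once without naming each one, which makes rearranging them easy.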
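The second point can be sketched as follows. This assumes dplyr is installed (the guard only keeps the sketch from failing if it is not), and the data frame is a made-up stand-in that uses the question's column names.

```r
# Assumes the dplyr package is available; the data below are hypothetical
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  df <- data.frame(dam = 1:3, piglet = 4:6, fdate = 7:9, ssire = 10:12)

  # Move every column whose name ends in "e" to the front,
  # then keep all remaining columns in their existing order
  reordered <- select(df, ends_with("e"), everything())

  print(names(reordered))
}
```

Here ends_with("e") matches fdate and ssire, and everything() appends the columns not yet selected, so none of the remaining columns has to be named.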

Answer 5 (score: 1)


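TL;DR: A one-liner for numeric indices is given here, and at the end a function that swaps exactly two columns given by name or by numeric index. Neither uses imports, and both correctly swap any two columns in a data frame of any size. A function that reassigns an arbitrary number of columns is also provided; it can cause superfluous swaps if used carelessly (see the summary at the end of this answer for details and for the functions).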


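Suppose you have a huge (or not so huge) data frame DF, and you only know the indices of the two columns you want to swap, say 1 < n < m < length(DF). (It also matters that the columns are not adjacent, i.e. |n-m| > 1. That is very likely in our "huge" data frame, but not necessarily in smaller ones; workarounds for all degenerate cases are given at the end.) Because it is huge, it has many columns and you do not want to list every other column by hand; or it is not huge and you are simply lazy, or a person of fine taste in coding. Either way, this one-liner does the job: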

    DF <- DF[ c( 1:(n-1), m, (n+1):(m-1), n, (m+1):length(DF) ) ]


    1:(n-1)           # This keeps every column before column `n` in place

    m                 # This places column `m` where column `n` was

    (n+1):(m-1)       # This keeps every column between the two in place

    n                 # This places column `n` where column `m` was

    (m+1):length(DF)  # This keeps every column after column `m` in place



    > 10:0
      [1] 10  9  8  7  6  5  4  3  2  1  0
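The countdown behaviour of `:` shown above is what makes the edges of the one-liner delicate: when a segment should be empty, `:` produces a descending range instead. A minimal sketch (the variable names are illustrative, and `seq_len()` is offered as one common way to get a truly empty range, not as part of the original answer):

```r
n <- 1                           # pretend the earlier column is the first one

before_naive <- 1:(n - 1)        # counts down: 1 0, not "no columns"
before_safe  <- seq_len(n - 1)   # integer(0): nothing comes before column 1

print(before_naive)
print(before_safe)

# The same happens after the last column of a hypothetical 4-column frame
ncols <- 4
m     <- 4                       # pretend the later column is the last one

after_naive <- (m + 1):ncols            # counts down: 5 4
after_safe  <- seq_len(ncols - m) + m   # empty: nothing comes after column 4

print(after_naive)
print(length(after_safe))
```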

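We have to be careful about the choice and placement of n and m so that our earlier restrictions hold. For example, requiring n < m costs us no generality (if the two columns are different, one of them must come before the other), but it does mean we must be careful about which one goes where in our line of code. We can avoid having to check this with the following modification: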

    DF <- DF[ c( 1:(min(n,m)-1), max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m), (max(n,m)+1):length(DF) ) ]

We replaced each instance of n and m with min(n,m) and max(n,m) respectively, which means the correct ordering in our code is preserved even when n > m.

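For the cases min(n,m) == 1, max(n,m) == length(DF) (or both at once), and |n-m| == 1, we will make some less readable, less aesthetic modifications involving if/else so that we do not have to check which case we are in. If you know which of these cases applies (i.e. you are always swapping some interior column with the first column, some interior column with the last column, the first column with the last column, or two adjacent columns), you can express these actions more concisely, because they usually only require omitting parts of our restricted case: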

    # Swapping not the last column with the first column
    # We just got rid of 1:(min(n,m)-1) because it would be invalid and not what we meant
    # since min(n,m) == 1
    # Now we just stick the other column right at the front
    DF <- DF[ c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m), (max(n,m)+1):length(DF) ) ]
    # Also equivalent since we know min(n,m) == 1, for the leftover index i
    DF <- DF[ c( i, 2:(i-1), 1, (i+1):length(DF) ) ]

    # Swapping not the first column with the last column
    # Similarly, we just got rid of (max(n,m)+1):length(DF) because it would also be invalid 
    # and not what we meant since max(n,m) == length(DF)
    # Now we just stick the other column right at the end
    DF <- DF[ c( 1:(min(n,m)-1), max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m) ) ]
    # Also equivalent since we know max(n,m) == length(DF), for the leftover index, say i
    DF <- DF[ c( 1:(i-1), length(DF), (i+1):(length(DF)-1), i ) ]

    # Swapping the first column with the last column
    DF <- DF[ c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m) ) ]
    # Also equivalent (for if you don't actually know the length beforehand, as assumed 
    # elsewhere)
    DF <- DF[ c( length(DF), 2:(length(DF)-1), 1 ) ]

    # Swapping two interior adjacent columns
    # Here we drop the explicit swap on either side of our middle column segment
    # This is actually enough because then the middle segment becomes a backwards range
    # because we know that `min(n,m) + 1 = max(n,m)`
    # The range is just an ordering of the two adjacent indices from largest to smallest
    DF <- DF[ c( 1:(min(n,m)-1), (min(n,m)+1):(max(n,m)-1), (max(n,m)+1):length(DF) )]




    DF <- DF[ if (n==m) 1:length(DF) else c( (if (min(n,m)==1) c() else 1:(min(n,m)-1) ), (if (min(n,m)+1 == max(n,m)) (min(n,m)+1):(max(n,m)-1) else c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m))), (if (max(n,m)==length(DF)) c() else (max(n,m)+1):length(DF) ) ) ]


# A function that swaps the `n` column and `m` column in the data frame DF
swap <- function(DF, n, m)
  return (DF[ if (n==m) 1:length(DF) else c( (if (min(n,m)==1) c() else 1:(min(n,m)-1) ), (if (min(n,m)+1 == max(n,m)) (min(n,m)+1):(max(n,m)-1) else c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m))), (if (max(n,m)==length(DF)) c() else (max(n,m)+1):length(DF) ) ) ])
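For comparison, the same two-column swap can be written by permuting an index vector instead of assembling the ranges by hand. This is a different technique from the one-liner above, not the answer's own method; `swap_cols` is a hypothetical name and the data frame is made up.

```r
# Hypothetical helper: swap columns n and m (numeric indices) of DF
swap_cols <- function(DF, n, m) {
  idx <- seq_along(DF)           # 1, 2, ..., number of columns
  idx[c(n, m)] <- idx[c(m, n)]   # exchange the two positions
  DF[idx]                        # reorder the columns accordingly
}

DF <- data.frame(dam = 1:2, piglet = 3:4, fdate = 5:6, ssire = 7:8)

print(names(swap_cols(DF, 1, 4)))  # first and last column
print(names(swap_cols(DF, 2, 3)))  # two adjacent interior columns
print(names(swap_cols(DF, 3, 3)))  # same column twice: order unchanged
```

Because the index vector always has one entry per column, the first, last, adjacent, and n == m cases need no special handling in this variant.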


# Returns data frame object with columns `n` and `m` swapped
# `n` and `m` can be column names, numerical indices, or a heterogeneous pair of both
swap <- function(DF, n, m) {

  # If either n or m is a column name, we want to get its index
  # We assume that if they aren't column names, they are indices (integers)
  n <- if (is.character(n) && is.na(suppressWarnings(as.integer(n)))) which(colnames(DF)==n) else as.integer(n)
  m <- if (is.character(m) && is.na(suppressWarnings(as.integer(m)))) which(colnames(DF)==m) else as.integer(m)

  # Make sure each index is actually valid
  # (an unknown column name yields a zero-length index, which is rejected here too)
  if (length(n)!=1 || !(1<=n & n<=length(DF))) stop( "`n` represents invalid index!" )
  if (length(m)!=1 || !(1<=m & m<=length(DF))) stop( "`m` represents invalid index!" )

  # If n and m refer to the same column, we don't need to do anything
  # This is checked after the lookup so that a name and its own index are also caught
  if (n==m) return(DF)

  # Also, for readability, let's go ahead and set which column is earlier, and which is later
  earlier <- min(n,m)
  later <- max(n,m)

  # This constructs the first third of the indices
  # These are the columns that, if any, come before the earlier column you are swapping
  firstThird <- if ( earlier==1 ) c() else 1:(earlier-1)

  # This constructs the last third of the indices
  # These are the columns, if any, that come after the later column you are swapping
  lastThird <- if ( later==length(DF) ) c() else (later+1):length(DF)

  # This checks if the columns to be swapped are adjacent and then constructs the
  # secondThird accordingly
  if ( earlier+1 == later ) {
    # Here, the second third is the two columns ordered from later to earlier
    # (a backwards range, because earlier+1 == later)
    secondThird <- (earlier+1):(later-1)
  } else {
    # Here, the second third is
    # the later column you want to swap,
    # the columns in between,
    # and then the earlier column you want to swap
    secondThird <- c( later, (earlier+1):(later-1), earlier)
  }

  # Now we assemble our indices and return our permutation of DF
  return (DF[ c( firstThird, secondThird, lastThird ) ])
}


swap <- function(DF, n, m) {
  n <- if (class(n)=="character" & is.na(suppressWarnings(as.integer(n)))) which(colnames(DF)==n) else as.integer(n)
  m <- if (class(m)=="character" & is.na(suppressWarnings(as.integer(m)))) which(colnames(DF)==m) else as.integer(m)

  if (!(1<=n & n<=length(DF))) stop( "`n` represents invalid index!" )
  if (!(1<=m & m<=length(DF))) stop( "`m` represents invalid index!" )

  return (DF[ if (n==m) 1:length(DF) else c( (if (min(n,m)==1) c() else 1:(min(n,m)-1) ), (if (min(n,m)+1 == max(n,m)) (min(n,m)+1):(max(n,m)-1) else c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m))), (if (max(n,m)==length(DF)) c() else (max(n,m)+1):length(DF) ) ) ])
}



mapping <- data.frame( "piglet" = 1, "ssire" = 2, "dam" = 3, "tdate" = 4)


# A function that takes two data frames, one with actual data: DF, and the other with a 
# rearrangement of the columns: R
# R must be structured so that colnames(R) is a subset of colnames(DF)
# Alternatively, R can be structured so that 1 <= as.integer(colnames(R)) <= length(DF)
# Further, 1 <= R$column <= length(DF), and length(R$column) == 1
# These structural requirements on R are not checked
# This is for brevity and because most likely R has been created specifically for use with
# this function
rearrange <- function(DF, R) {
  # Swap each named column into its requested position, one column at a time
  for (col in colnames(R))
    DF <- swap(DF, col, R[[col]])

  return (DF)
}

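Wait, that's it? Yes. This swaps every named column into its proper place. The power of such a simple function comes from swap accepting heterogeneous arguments: we can name the column we want to move and give the position to send it to, and as long as we only ever try to put one column in each position (as we should), a column stays put once it has been placed where it belongs. So even though it looks as if later swaps could undo earlier placements, the heterogeneous arguments guarantee that this cannot happen, and therefore the order of the columns in the mapping does not matter either. This is a very nice property, because it means we are not pushing the whole "organizing the data" problem further down the road. You only need to decide which position each column should be sent to.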
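The claim that the order of the mapping does not matter can be checked on a small case. The sketch below uses simplified stand-ins for the answer's functions (`swap_either` and `rearrange_cols` are hypothetical names, and the swap itself is done with an index vector rather than the answer's range construction) on a made-up four-column data frame.

```r
# Hypothetical stand-in for swap(): n and m may each be a column name or an index
swap_either <- function(DF, n, m) {
  if (is.character(n)) n <- match(n, names(DF))
  if (is.character(m)) m <- match(m, names(DF))
  idx <- seq_along(DF)
  idx[c(n, m)] <- idx[c(m, n)]   # exchange the two positions
  DF[idx]
}

# Hypothetical stand-in for rearrange(): send each named column to its position
rearrange_cols <- function(DF, R) {
  for (col in names(R)) DF <- swap_either(DF, col, R[[col]])
  DF
}

DF <- data.frame(dam = 1:2, piglet = 3:4, tdate = 5:6, ssire = 7:8)

mapping  <- data.frame(piglet = 1, ssire = 2, dam = 3, tdate = 4)
shuffled <- mapping[c("tdate", "dam", "ssire", "piglet")]  # same mapping, other order

print(names(rearrange_cols(DF, mapping)))
print(names(rearrange_cols(DF, shuffled)))
```

Both calls end with the columns in the order piglet, ssire, dam, tdate, because a column that has reached its target position is only ever touched again by a swap with itself.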



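You can use the swap function we built to swap exactly two columns, or use the rearrange function with a "rearrangement" data frame that specifies where each column name you want to move should be sent. For the rearrange function, if any of the positions chosen for the column names is not already occupied by one of the specified columns (i.e. one of those in colnames(R)), then superfluous swaps can, and very likely will, occur. The only case in which they do not is when every superfluous swap has a partner superfluous swap that undoes it before the end. As noted, this is extremely unlikely to happen by accident, but a mapping can be constructed to achieve this result in practice.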

swap <- function(DF, n, m) {
  n <- if (class(n)=="character" & is.na(suppressWarnings(as.integer(n)))) which(colnames(DF)==n) else as.integer(n)
  m <- if (class(m)=="character" & is.na(suppressWarnings(as.integer(m)))) which(colnames(DF)==m) else as.integer(m)

  if (!(1<=n & n<=length(DF))) stop( "`n` represents invalid index!" )
  if (!(1<=m & m<=length(DF))) stop( "`m` represents invalid index!" )

  return (DF[ if (n==m) 1:length(DF) else c( (if (min(n,m)==1) c() else 1:(min(n,m)-1) ), (if (min(n,m)+1 == max(n,m)) (min(n,m)+1):(max(n,m)-1) else c( max(n,m), (min(n,m)+1):(max(n,m)-1), min(n,m))), (if (max(n,m)==length(DF)) c() else (max(n,m)+1):length(DF) ) ) ])
}

rearrange <- function(DF, R) {
  for (col in colnames(R))
    DF <- swap(DF, col, R[[col]])

  return (DF)
}
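The superfluous swaps described in the summary show up as soon as a mapping sends a column to a position held by a column the mapping does not name. A minimal sketch, using a hypothetical helper (`move_to`, not part of the answer) and made-up data:

```r
# Hypothetical helper: put the column called `name` at position `pos`
# by swapping it with whatever column currently sits there
move_to <- function(DF, name, pos) {
  idx <- seq_along(DF)
  from <- match(name, names(DF))
  idx[c(from, pos)] <- idx[c(pos, from)]
  DF[idx]
}

DF <- data.frame(dam = 1:2, piglet = 3:4, tdate = 5:6, ssire = 7:8)

# Mapping that names only ssire: dam is displaced although nobody asked for it
partial <- move_to(DF, "ssire", 1)
print(names(partial))

# Mapping that also says where dam should go: no unnamed column moves
complete <- move_to(move_to(DF, "ssire", 1), "dam", 4)
print(names(complete))
```

In the first call dam ends up in the last position as a side effect; naming a target for every column that will be displaced, as in the second call, keeps the unnamed columns (piglet and tdate) where they were.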

Answer 6 (score: 0)


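swappy = function(v, a, b) {  # where v is a data frame, a and b are the column indexes to swap

  # Remember the name of the variable passed in, so the result can be assigned back to it
  name = deparse(substitute(v))

  # Swap the contents of the two columns
  helpy = v[,a]
  v[,a] = v[,b]
  v[,b] = helpy

  # Swap the two column names to match
  name1 = colnames(v)[a]
  name2 = colnames(v)[b]

  colnames(v)[a] = name2
  colnames(v)[b] = name1

  # Overwrite the original variable in the global environment
  assign(name, value = v, envir = .GlobalEnv)
}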