Converting grouped data from long format to wide format

Time: 2019-09-13 17:35:33

Tags: r tidyr

I currently have data in long format:

# Input
library(dplyr)
library(tidyr)

tibble(
    x = c(1,1,2,2), 
    y = c("A", "B", "C", "D")
) 

I would like to widen the data so that it looks like this:

# Desired Output
tibble(
    x = c(1,2), 
    x_1 = c("A", "C"), 
    x_2 = c("B", "D")
) 

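But this is not a typical tidyr::spread() case, because the new column names are not taken from cell values. So although this looks simple, I am stumped.

3 answers:

Answer 0: (score: 3)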

library(data.table)

dcast(df, x ~ paste0('x_', rowid(x)))
#   x x_1 x_2
# 1 1   A   B
# 2 2   C   D

Answer 1: (score: 2)

One option is to create a sequence column by group and then call spread:

library(dplyr)
library(tidyr)
library(stringr)

# tbl1 is the tibble from the question
tbl1 %>%
    group_by(x) %>%
    # number the rows within each group of x: x_1, x_2, ...
    mutate(x1 = str_c('x_', row_number())) %>%
    # or using paste0 from base R (as @d.b commented)
    # mutate(x1 = paste0('x_', row_number())) %>%
    ungroup() %>%
    spread(x1, y)
# A tibble: 2 x 3
#      x x_1   x_2
#  <dbl> <chr> <chr>
#1     1 A     B
#2     2 C     D

Or in base R
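A minimal base R sketch of the same idea follows. It is not necessarily the code this answer had in mind: it assumes the question's data is held in a plain data frame called df, and the helper column name idx is made up for illustration. ave() builds the within-group position and reshape() does the widening.

```r
# Assumption: the question's data as a plain data frame named df
df <- data.frame(
  x = c(1, 1, 2, 2),
  y = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# idx (hypothetical name): position of each row within its group of x
df$idx <- ave(seq_along(df$x), df$x, FUN = seq_along)

# One row per x, one column per idx; columns come out as y_1, y_2
wide <- reshape(df, idvar = "x", timevar = "idx",
                direction = "wide", sep = "_")

# Rename y_1, y_2 to the x_1, x_2 names requested in the question
names(wide) <- sub("^y_", "x_", names(wide))
rownames(wide) <- NULL
wide
```

The index column plays the role that the key column plays in spread(): it supplies the suffix of each new column name.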

Answer 2: (score: 1)

With tidyr 1.0.0, you can do the following:

library(tidyr)

df <- tibble(
  x = c(1,1,2,2), 
  y = c("A", "B", "C", "D")
) 

chop(df,y) %>% unnest_wider(y)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 2 x 3
#>       x ...1  ...2 
#>   <dbl> <chr> <chr>
#> 1     1 A     B    
#> 2     2 C     D

Created on 2019-09-14 by the reprex package (v0.3.0)

Add the argument names_repair = ~sub("...", "x_", ., fixed=TRUE) to the unnest_wider call to get the names you gave in your question.
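For comparison, the same result can be sketched with pivot_wider(), which was also introduced in tidyr 1.0.0. This is a different technique from the chop()/unnest_wider() call above, not a rewrite of it. The helper column name idx is hypothetical; names_prefix supplies the x_ prefix directly, so no name repair is needed.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  x = c(1, 1, 2, 2),
  y = c("A", "B", "C", "D")
)

wide <- df %>%
  group_by(x) %>%
  # idx (hypothetical name): position of each row within its group of x
  mutate(idx = row_number()) %>%
  ungroup() %>%
  # one column per idx value, named x_1, x_2, ...
  pivot_wider(names_from = idx, values_from = y, names_prefix = "x_")

wide
```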