Question

I have data in this format (2 columns):

data ............ index
city1 ........... 1
PERSON1 ......... 2
telephone1 ...... 3
city2 ........... 1
city3 ........... 1
telephone3 ...... 3

I added the second column so I know which kind of data each row holds (1 = city, 2 = person, 3 = telephone).

What I need is this (with 1, 2, 3 becoming the column names):

1 ............ 2 ............ 3
city1 ........ PERSON1 ...... telephone1
city2 ........ NULL ......... NULL
city3 ........ NULL ......... telephone3

How can I do this in R?

Answer 1

Here is one possible solution:

#sample data
dd<-data.frame(
    data= c("city1","person1","telephone1","city2","city3","telephone3"),
    index=c(1,2,3,1,1,3),
    stringsAsFactors=FALSE
)

#assign new row when index stays the same or decreases
row<-cumsum(c(1,diff(dd$index))<1)+1

#create empty matrix to hold result
mm<-matrix(character(), nrow=max(row), ncol=max(dd$index))

#put values where they belong
mm[cbind(row, dd$index)]<-dd$data

This returns:

#      [,1]    [,2]      [,3]        
# [1,] "city1" "person1" "telephone1"
# [2,] "city2" NA        NA          
# [3,] "city3" NA        "telephone3"
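To see why the row vector works, and to get the 1, 2, 3 column names the question asks for, the intermediate values can be inspected and the matrix converted to a data frame. This is only a minimal sketch: the names `step` and `res` and the final conversion are additions for illustration, not part of the original answer.

```r
# Sample data from the answer above
dd <- data.frame(
    data  = c("city1", "person1", "telephone1", "city2", "city3", "telephone3"),
    index = c(1, 2, 3, 1, 1, 3),
    stringsAsFactors = FALSE
)

# A new row starts whenever index stays the same or decreases
step <- c(1, diff(dd$index))   # 1 1 1 -2 0 2
row  <- cumsum(step < 1) + 1   # 1 1 1 2 3 3

# Fill a character matrix using (row, column) pairs as the index
mm <- matrix(NA_character_, nrow = max(row), ncol = max(dd$index))
mm[cbind(row, dd$index)] <- dd$data

# Illustrative extra step: a data frame with "1", "2", "3" as column names
res <- as.data.frame(mm, stringsAsFactors = FALSE)
names(res) <- as.character(seq_len(ncol(mm)))
res
```

Cells that never receive a value stay `NA`, which is R's counterpart to the NULL entries in the desired output.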

Answer 2

Use the number embedded in the data column as the row index, with MrFlick's data:

dd<-data.frame(
    data= c("city1","person1","telephone1","city2","city3","telephone3"),
    index=c(1,2,3,1,1,3),
    stringsAsFactors=FALSE
)

r <- as.numeric(gsub("\\D*", "", dd$data))
m <- matrix(,nrow=max(r), ncol=max(dd$index))
m[cbind(r, dd$index)] <- dd$data
m

     [,1]    [,2]      [,3]        
[1,] "city1" "person1" "telephone1"
[2,] "city2" NA        NA          
[3,] "city3" NA        "telephone3"
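This approach depends on every label carrying its row number as digits. A short sketch of what the extraction step produces may help; the vector `no_digit` is a made-up example, not data from the question:

```r
dd <- data.frame(
    data  = c("city1", "person1", "telephone1", "city2", "city3", "telephone3"),
    index = c(1, 2, 3, 1, 1, 3),
    stringsAsFactors = FALSE
)

# Strip every non-digit character, leaving the embedded row number
r <- as.numeric(gsub("\\D+", "", dd$data))
r   # 1 1 1 2 3 3

# Hypothetical labels: one without any digits gives NA, so it has no usable row
no_digit <- as.numeric(gsub("\\D+", "", c("Paris", "city2")))
no_digit   # NA 2
```

If the real values do not contain such numbers, the index-based grouping from Answer 1 is probably the safer choice.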

Answer 3

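Here is another possibility: create a "time" variable and use either base R's reshape or dcast from "reshape2".

Using @MrFlick's sample data, here is the "time" variable. Note that this assumes your data are already in the correct order: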

dd$time <- cumsum(dd$index == 1) ## Similar to MrFlick's approach...
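The two grouping rules differ when a record has no index 1 entry. The sketch below uses a hypothetical index vector `idx` (not part of the original data) in which the second record starts at 2:

```r
# Hypothetical input: second record has no "city" (index 1) row
idx <- c(1, 2, 3, 2, 3)

# Rule from this answer: a new record starts at every index == 1
by_first <- cumsum(idx == 1)
by_first   # 1 1 1 1 1

# Rule from Answer 1: a new record starts when index stays the same or drops
by_drop <- cumsum(c(1, diff(idx)) < 1) + 1
by_drop    # 1 1 1 2 2
```

With `cumsum(idx == 1)` the last two rows fall into the first record, so this rule fits only when every record begins with an index 1 row.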

Here is the reshape approach:

reshape(dd, direction = "wide", idvar="time", timevar="index")
#   time data.1  data.2     data.3
# 1    1  city1 person1 telephone1
# 4    2  city2    <NA>       <NA>
# 5    3  city3    <NA> telephone3

Here is the dcast approach:

library(reshape2)
dcast(dd, time ~ index, value.var="data")
#   time     1       2          3
# 1    1 city1 person1 telephone1
# 2    2 city2    <NA>       <NA>
# 3    3 city3    <NA> telephone3

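(FYI, MrFlick's answer is most likely the fastest because it uses matrix indexing. reshape is quite efficient on smaller datasets but becomes slow on larger ones. If speed is an issue, take a look at dcast.)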
