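I have a vector with some missing data that I want to convert into a data frame with 4 columns.
I have two questions: 1. How do I split one column into multiple columns? 2. How do I account for the missing data?
Data: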
# Create similar data
a <- c("building #1 Addr 01 Zip 99999","20 sq ft","23","-33 rev",
"building #2 Addr 02 Zip 99999","30 sq ft","23",
"building #3 Addr 03 Zip 99999","40 sq ft",
"building #4 Addr 04 Zip 99999","50 sq ft","23","-33 rev",
"building #5 Addr 05 Zip 99999","-33 rev",
"building #6 Addr 06 Zip 99999","70 sq ft","23","-33 rev",
"building #7 Addr 07 Zip 99999","80 sq ft",
"building #8 Addr 08 Zip 99999","90 sq ft","23","-33 rev",
"building #9 Addr 09 Zip 99999","00 sq ft")
I want to create a table like the following:
# Desired output
building_id <- c("building #1 Addr 01 Zip 99999",
"building #2 Addr 02 Zip 99999",
"building #3 Addr 03 Zip 99999",
"building #4 Addr 04 Zip 99999",
"building #5 Addr 05 Zip 99999",
"building #6 Addr 06 Zip 99999",
"building #7 Addr 07 Zip 99999",
"building #8 Addr 08 Zip 99999",
"building #9 Addr 09 Zip 99999")
sqft <- c("20 sq ft","30 sq ft","40 sq ft","50 sq ft","","70 sq ft",
"80 sq ft","90 sq ft","00 sq ft")
employees <- c("23","23","","23","","23","","23","")
revenue <- c("-33 rev","","","-33 rev","","-33 rev","","-33 rev","")
df <- data.frame(building_id,sqft,employees,revenue)
building_id                    sqft      employees  revenue
building #1 Addr 01 Zip 99999  20 sq ft  23         -33 rev
building #2 Addr 02 Zip 99999  30 sq ft  23
building #3 Addr 03 Zip 99999  40 sq ft
building #4 Addr 04 Zip 99999  50 sq ft  23         -33 rev
building #5 Addr 05 Zip 99999
building #6 Addr 06 Zip 99999  70 sq ft  23         -33 rev
building #7 Addr 07 Zip 99999  80 sq ft
building #8 Addr 08 Zip 99999  90 sq ft  23         -33 rev
building #9 Addr 09 Zip 99999  00 sq ft
Answer 0 (score: 2)
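We can split the vector ("a") into a list ("lst"), using a grouping variable created by finding "building" at the start of each element (grepl('^building', ..) wrapped in cumsum). Then, looping over the list elements, we grep each field pattern in turn with sapply ('building', 'sq ft', etc.). If the match has length 0 (the field is absent) we assign NA, otherwise we keep the grep value. Finally we unlist each result and rbind them to create the dataset d1.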
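To make the grouping step concrete, here is a minimal sketch on a shortened vector. The vector `x` and the names `is_start`, `grp`, and `lst_demo` are made up for illustration and are not part of the answer's code.

```r
# Small illustrative vector (hypothetical, same shape as `a` but shorter)
x <- c("building #1", "20 sq ft", "23",
       "building #2", "-33 rev",
       "building #3")

# TRUE wherever a new building record starts
is_start <- grepl("^building", x)

# Running count of starts: each element gets the index of its building
grp <- cumsum(is_start)
grp
# 1 1 1 2 2 3

# One list element per building, holding that building's fields
lst_demo <- split(x, grp)
lengths(lst_demo)
```

Because the counter only increases on a "building" line, records with missing fields still end up in their own group, whatever their length.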
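# Split `a` into one chunk per building; cumsum() increments at each "building" line
lst <- split(a, cumsum(grepl('^building', a)))
# For each chunk, pick the element matching each field pattern (NA if absent),
# then bind the resulting 4-element vectors into rows
d1 <- do.call(rbind.data.frame, lapply(lst, function(x)
  unlist(sapply(c('building', 'sq ft', '^\\d+$', 'rev'), function(y) {
    x1 <- grep(y, x, value=TRUE)
    if(!length(x1)) NA else x1}))))
colnames(d1) <- c("building_id","sqft","employees","revenue")
d1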
# building_id sqft employees revenue
#1 building #1 Addr 01 Zip 99999 20 sq ft 23 -33 rev
#2 building #2 Addr 02 Zip 99999 30 sq ft 23 <NA>
#3 building #3 Addr 03 Zip 99999 40 sq ft <NA> <NA>
#4 building #4 Addr 04 Zip 99999 50 sq ft 23 -33 rev
#5 building #5 Addr 05 Zip 99999 <NA> <NA> -33 rev
#6 building #6 Addr 06 Zip 99999 70 sq ft 23 -33 rev
#7 building #7 Addr 07 Zip 99999 80 sq ft <NA> <NA>
#8 building #8 Addr 08 Zip 99999 90 sq ft 23 -33 rev
#9 building #9 Addr 09 Zip 99999 00 sq ft <NA> <NA>
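The result above marks missing fields as NA, while the desired output in the question uses empty strings. If empty strings are really needed, one possible way to convert is sketched below. `d1_demo` is a hand-built two-row stand-in for `d1`, used only so the sketch runs on its own; the same `lapply` call should apply to `d1` itself.

```r
# Hypothetical stand-in for the answer's `d1` (two rows, built by hand)
d1_demo <- data.frame(
  building_id = c("building #3 Addr 03 Zip 99999",
                  "building #5 Addr 05 Zip 99999"),
  sqft        = c("40 sq ft", NA_character_),
  employees   = c(NA_character_, NA_character_),
  revenue     = c(NA_character_, "-33 rev"),
  stringsAsFactors = FALSE
)

# Replace NA with "" column by column; `d1_demo[] <-` keeps the data frame shape
d1_demo[] <- lapply(d1_demo, function(col) {
  col <- as.character(col)   # also covers factor columns (older R defaults)
  col[is.na(col)] <- ""
  col
})
d1_demo
```

Keeping NA is usually more convenient for later analysis, so this conversion is best left until the table is displayed or exported.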