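I am currently working with data files that contain thousands of records, and I have to use R to format them.
This is what my data frame looks like now: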
ROWID CAT SERIALNO SUB PRODUCTNAME HOMENUM Start.X Start.Y End.X End.Y
1 111111111 CATA 10 43 PRODUCT A1 NA NA NA NA NA
2 1 NA NA NA NA NA NA NA
3 2 3 NA NA NA NA NA NA NA
4 4 5 NA NA NA NA NA NA NA
5 555555555 CATB 13 76 PRODUCT A2 NA NA NA NA NA
6 6 NA NA NA NA NA NA NA
7 7 8 NA NA NA NA NA NA NA
8 9 10 NA NA NA NA NA NA NA
This is the format I want:
ROWID CAT SERIALNO SUB PRODUCTNAME HOMENUM Start.X Start.Y End.X End.Y
1 111111111 CATA 10 43 PRODUCT A1 1 2 3 4 5
2 555555555 CATB 13 76 PRODUCT A2 6 7 8 9 10
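As the first listing above shows, the values for the last five columns are spread over rows 2, 3, 4 and rows 6, 7, 8 respectively.
I tried the t() function, but it does not seem to produce what I need, and arranging the data manually with fix() is not feasible because I am dealing with large data files.
Is there a way to achieve the desired format in R?
Edit: the result of dput():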
structure(list(V1 = structure(c(9L, 2L, 1L, 3L, 4L, 5L, 6L, 7L,
8L), .Label = c("1", "111111111", "2", "4", "555555555", "6",
"7", "9", "ROWID"), class = "factor"), V2 = structure(c(6L, 7L,
1L, 3L, 4L, 8L, 1L, 5L, 2L), .Label = c("", "10", "3", "5", "8",
"CAT", "CATA", "CATB"), class = "factor"), V3 = structure(c(4L,
2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L), .Label = c("", "10", "13", "SERIALNO"
), class = "factor"), V4 = structure(c(4L, 2L, 1L, 1L, 1L, 3L,
1L, 1L, 1L), .Label = c("", "43", "76", "SUB"), class = "factor"),
V5 = structure(c(4L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L), .Label = c("",
"PRODUCT A1", "PRODUCT A2", "PRODUCTNAME"), class = "factor"),
V6 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"HOMENUM"), class = "factor"), V7 = structure(c(2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Start X"), class = "factor"),
V8 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"Start Y"), class = "factor"), V9 = structure(c(2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "End X"), class = "factor"),
V10 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"End Y"), class = "factor")), .Names = c("V1", "V2", "V3",
"V4", "V5", "V6", "V7", "V8", "V9", "V10"), class = "data.frame", row.names = c(NA,
-9L))
Answer 0: (score: 1)
I suspect you should fix the data import first. Your import did not use header = TRUE, so we first have to fix the column names:
names(DF) <- as.character(unlist(DF[1,]))
DF <- DF[-1,]
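If the file can be read again, the header can also be handled at import time. The following is a minimal sketch, assuming a comma-separated file; the inline text stands in for the real file, whose name and separator are not given in the question, and DF_demo is an illustrative name. colClasses = "character" keeps every column as text, so empty cells stay "" as in the data shown above.

```r
# Hypothetical CSV content standing in for the real file (format is assumed)
csv_text <- "ROWID,CAT,SERIALNO
111111111,CATA,10
1,,
2,3,
555555555,CATB,13"

# header = TRUE takes the first line as column names;
# colClasses = "character" reads every column as text, so blank cells become ""
DF_demo <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

names(DF_demo)
DF_demo$CAT
```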
Then we can select every fourth row, starting with the first:
DF1 <- DF[seq_len(nrow(DF)) %% 4 == 1L,]
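To see why the modulo test picks the first row of each four-row block, here is a small sketch on bare row positions (idx and starts are illustrative names, not part of the answer):

```r
# Row positions for two records of four rows each
idx <- seq_len(8)

# TRUE where the position leaves remainder 1 when divided by 4
is_first <- idx %% 4 == 1L

# Positions of the rows that start a record
starts <- which(is_first)
starts
```

This only works when every record occupies exactly four rows, as in the sample data.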
Now we can select the first two columns of all the other rows and transpose them:
temp <- t(DF[seq_len(nrow(DF)) %% 4 != 1L, 1:2])
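We remove the empty cells from the resulting character matrix, reshape the resulting character vector into a five-column matrix, and assign that matrix to the last five columns of the new data.frame: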
DF1[, 6:10] <- matrix(temp[temp != ""], ncol = 5, byrow = TRUE)
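The transpose matters because logical indexing reads a matrix column by column; after t(), each column is one original row, so the values come out in row order. A toy sketch of that step for a single record (temp_demo and vals are illustrative names):

```r
# Toy character matrix shaped like the transposed continuation rows of one record
temp_demo <- t(data.frame(ROWID = c("1", "2", "4"),
                          CAT   = c("",  "3", "5"),
                          stringsAsFactors = FALSE))

# Indexing with a logical matrix walks column by column,
# which here means original row by original row
vals <- temp_demo[temp_demo != ""]

# One record per matrix row, five values each
matrix(vals, ncol = 5, byrow = TRUE)
```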
Finally, we convert the column types so that the numbers are actually numeric rather than character:
DF1[] <- lapply(DF1, function(x) type.convert(as.character(x), as.is = TRUE))
print(DF1)
# ROWID CAT SERIALNO SUB PRODUCTNAME HOMENUM Start X Start Y End X End Y
#2 111111111 CATA 10 43 PRODUCT A1 1 2 3 4 5
#6 555555555 CATB 13 76 PRODUCT A2 6 7 8 9 10
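The steps above assume that every record spans exactly four rows. If that is not guaranteed, one possible variation is to detect record starts from a column that is only filled on the first row of a record. The sketch below is not part of the original answer: collapse_records is a hypothetical helper, and it assumes that SERIALNO is non-empty exactly on record rows, that empty cells are "" rather than NA, and that each record carries exactly n_vals continuation values in its first two columns.

```r
# Hypothetical helper (assumption, not from the answer): rebuild one row per
# record when the number of continuation rows per record may vary.
collapse_records <- function(DF, n_vals = 5, key = "SERIALNO") {
  DF[] <- lapply(DF, as.character)        # factors -> character
  is_start <- DF[[key]] != ""             # TRUE on the first row of each record
  group <- cumsum(is_start)               # record id for every row
  out <- DF[is_start, , drop = FALSE]     # one row per record
  rest <- DF[!is_start, 1:2]              # continuation rows, first two columns
  vals <- lapply(split(rest, group[!is_start]), function(d) {
    v <- as.vector(t(as.matrix(d)))       # values in row-wise order
    v[v != ""]                            # drop empty cells
  })
  # Guard the assumptions stated above
  stopifnot(length(vals) == nrow(out), all(lengths(vals) == n_vals))
  last_cols <- (ncol(out) - n_vals + 1):ncol(out)
  out[, last_cols] <- do.call(rbind, vals)
  out[] <- lapply(out, type.convert, as.is = TRUE)
  rownames(out) <- NULL
  out
}

# Small demo data with the same layout as the question (fewer columns)
DF_demo <- data.frame(
  ROWID    = c("111111111", "1", "2", "4", "555555555", "6", "7", "9"),
  CAT      = c("CATA", "", "3", "5", "CATB", "", "8", "10"),
  SERIALNO = c("10", "", "", "", "13", "", "", ""),
  HOMENUM  = "", Start.X = "", Start.Y = "", End.X = "", End.Y = "",
  stringsAsFactors = FALSE
)

res <- collapse_records(DF_demo)
print(res)
```

Grouping with cumsum() trades the fixed block size for a dependency on the key column, so it is only useful when that column reliably marks the start of a record.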