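I am trying to read a table from a PDF file and convert it to a data frame, but each wrapped line inside a cell is being treated as a separate row, like this
As you can see in the image above, every new line is treated as its own row. I want to merge the rows until the next value appears in the first column, so that my data frame looks like this
Is there any way to do this?
Here is the sample data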
structure(list(V1 = c(1L, NA, NA, 2L, NA, NA), V2 = c("Chawla Associates",
"Architects, Interior", "Designers", "J Square", "Designers &",
"Engineering"), V3 = c("B-102, Sanik Nagar,", "Uttam Nagar, New",
"Delhi-110059", "H-office: H No.1031,", "Sec-67, Mohali (PB)",
"431-432, Sec-8,"), V4 = c("253336493", "M-", "9.51242E+11",
"M-9872815438", "M-98722-22676", NA), V5 = c("-", NA, NA, "Telefax-",
"0172-", "2574602"), V6 = c("Abhi2874@yahoo.co.in", NA, NA, "vincaljaidka@hotmail.co",
"m", NA), V7 = c("CA/99/24551", NA, NA, "CA/96/20742", NA, NA
)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Thanks,
DOMNICK
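Answer 0 (score: 1)
It would probably be better to solve this when reading the file, but if there is no other option, here is a hacky function that cleans up this particular data frame. It treats every row with a value in V1 as the start of a new record and pastes the rows that follow into it, column by column.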
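combineRows <- function(df){
  newDf <- NULL  # accumulates the merged records
  row <- NULL    # the record currently being assembled
  for (r in seq_len(nrow(df))){
    if(!is.na(df[r,"V1"])){
      # A value in V1 starts a new record: store the finished one first
      if (!is.null(row)){
        if (is.null(newDf)){
          newDf <- row
        } else{
          newDf <- rbind(newDf,row)
        }
      }
      row <- df[r,]
    } else {
      # Continuation row: paste it onto the current record, column by column,
      # treating NA as an empty string
      rows <- rbind(row,df[r,])
      row <- apply(rows,2,function(x)paste(ifelse(is.na(x),'',x),collapse=" "))
    }
  }
  # Store the last record
  newDf <- rbind(newDf,row)
  # Cleanup: strip the padding left behind by the empty strings
  newDf <- apply(newDf, 2,trimws)
  rownames(newDf) <- 1:nrow(newDf)
  return(newDf)  # note: a character matrix, not a data frame
}
# df is the sample data frame from the question
newDf <- combineRows(df)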
> newDf
  V1  V2                                                 V3
1 "1" "Chawla Associates Architects, Interior Designers" "B-102, Sanik Nagar, Uttam Nagar, New Delhi-110059"
2 "2" "J Square Designers & Engineering"                 "H-office: H No.1031, Sec-67, Mohali (PB) 431-432, Sec-8,"
  V4                           V5                       V6                          V7
1 "253336493 M- 9.51242E+11"   "-"                      "Abhi2874@yahoo.co.in"      "CA/99/24551"
2 "M-9872815438 M-98722-22676" "Telefax- 0172- 2574602" "vincaljaidka@hotmail.co m" "CA/96/20742"
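
The same merge can also be written without an explicit loop. The following is only a sketch in base R, not part of the original answer: the helper name `merge_wrapped_rows` and its `key` and `sep` parameters are made up for illustration, and it assumes the first row of the table has a value in the key column. It labels each row with a record number via `cumsum(!is.na(...))`, then pastes the non-missing values of every column within each record.

```r
# Sample data from the question, rebuilt as a plain data frame.
df <- data.frame(
  V1 = c(1L, NA, NA, 2L, NA, NA),
  V2 = c("Chawla Associates", "Architects, Interior", "Designers",
         "J Square", "Designers &", "Engineering"),
  V3 = c("B-102, Sanik Nagar,", "Uttam Nagar, New", "Delhi-110059",
         "H-office: H No.1031,", "Sec-67, Mohali (PB)", "431-432, Sec-8,"),
  V4 = c("253336493", "M-", "9.51242E+11",
         "M-9872815438", "M-98722-22676", NA),
  V5 = c("-", NA, NA, "Telefax-", "0172-", "2574602"),
  V6 = c("Abhi2874@yahoo.co.in", NA, NA,
         "vincaljaidka@hotmail.co", "m", NA),
  V7 = c("CA/99/24551", NA, NA, "CA/96/20742", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): collapse continuation
# rows into the record that precedes them.
merge_wrapped_rows <- function(df, key = "V1", sep = " ") {
  # A record starts wherever the key column is filled in, so the running
  # count of non-NA keys gives every row its record number.
  group <- cumsum(!is.na(df[[key]]))

  # Join the non-missing pieces of one cell; keep NA if the cell is empty.
  collapse <- function(x) {
    x <- as.character(x[!is.na(x)])
    if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
  }

  # Apply the join to each column, one record at a time.
  merged <- lapply(df, function(col) {
    vapply(split(col, group), collapse, character(1), USE.NAMES = FALSE)
  })
  as.data.frame(merged, stringsAsFactors = FALSE)
}

result <- merge_wrapped_rows(df)
print(result)
```

Unlike `combineRows`, this sketch returns a data frame rather than a character matrix, and because missing pieces are dropped before pasting there is no padding to trim afterwards. All columns come back as character, including V1. Joining with a space still leaves the wrapped e-mail address as "vincaljaidka@hotmail.co m", exactly as in the output above.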