Similar to "Merge rows in one data.frame" and "Merge two rows in one dataframe, when the rows are disjoint and contain nulls", I am facing the following problem, for which the posts above might be helpful.
My data looks like this:
| Date | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00 | A | | |
| 03-07-17 | | A | 09:00 | B |
| 03-07-17 | 17:00 | B | | |
| 03-07-17 | | B | 18:00 | A |
| 04-07-17 | 08:00 | A | | |
| 04-07-17 | | A | 09:00 | B |
| 04-07-17 | 17:00 | B | | |
| 04-07-17 | | B | 18:00 | A |
Now I want to aggregate it into 4 rows, like this:
| Date | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00 | A | 09:00 | B |
| 03-07-17 | 17:00 | B | 18:00 | A |
| 04-07-17 | 08:00 | A | 09:00 | B |
| 04-07-17 | 17:00 | B | 18:00 | A |
Any ideas? Thanks!
Answer 0 (score: 2)
Via dplyr:
library(dplyr)
df %>%
  group_by(Date, Origin) %>%
  summarise_all(funs(trimws(paste(., collapse = ''))))
# A tibble: 4 x 5
# Groups:   Date [?]
  Date     Origin Checkin Checkout Destination
  <chr>    <chr>  <chr>   <chr>    <chr>
1 03-07-17 A      08:00   09:00    B
2 03-07-17 B      17:00   18:00    A
3 04-07-17 A      08:00   09:00    B
4 04-07-17 B      17:00   18:00    A
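In newer dplyr releases `funs()` is deprecated and `summarise_all()` is superseded by `across()`. A minimal sketch of the same group-and-paste idea in the newer style follows; the small `df` is a stand-in for the question's data (with blanks as empty strings), and `result` is just an illustrative name.

```r
library(dplyr)

# Stand-in for the question's data; blank cells are empty strings here.
df <- data.frame(
  Date        = rep(c("03-07-17", "04-07-17"), each = 4),
  Checkin     = rep(c("08:00", "", "17:00", ""), times = 2),
  Origin      = rep(c("A", "A", "B", "B"), times = 2),
  Checkout    = rep(c("", "09:00", "", "18:00"), times = 2),
  Destination = rep(c("", "B", "", "A"), times = 2),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Date, Origin) %>%
  # across(everything()) covers every non-grouping column;
  # pasting a value with a blank and trimming leaves just the value.
  summarise(across(everything(), ~ trimws(paste(.x, collapse = ""))),
            .groups = "drop")

result
```

As in the original answer, this relies on each (Date, Origin) group holding exactly one non-blank value per column; if a group had two check-ins, they would be pasted together.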
Data
dput(df)
structure(list(Date = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ",
" 03-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 "
), Checkin = c(" 08:00 ", " ", " 17:00 ", " ",
" 08:00 ", " ", " 17:00 ", " "), Origin = c(" A ",
" A ", " B ", " B ", " A ", " A ", " B ",
" B "), Checkout = c(" ", " 09:00 ", " ",
" 18:00 ", " ", " 09:00 ", " ", " 18:00 "
), Destination = c(" ", " B ", " ",
" A ", " ", " B ", " ",
" A ")), .Names = c("Date", "Checkin", "Origin", "Checkout",
"Destination"), row.names = c(NA, -8L), class = "data.frame")
Answer 1 (score: 1)
If your data has exactly the structure shown above, and you are highly confident of that, you can use the following in base R.
cbind(dat[c(TRUE,FALSE), 1:3], dat[c(FALSE, TRUE), 4:5])
Date Checkin Origin Checkout Destination
1 03-07-17 08:00 A 09:00 B
3 03-07-17 17:00 B 18:00 A
5 04-07-17 08:00 A 09:00 B
7 04-07-17 17:00 B 18:00 A
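The idea is to take the odd rows (1, 3, 5, ...) for columns 1 through 3 and attach the even rows (2, 4, 6, ...) for columns 4 and 5.
This will not work if any rows are out of order or lack a partner.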
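Since the odd/even trick silently produces wrong pairs when that assumption fails, it may be worth checking it first. The sketch below wraps the same `cbind()` call in a hypothetical helper, `pair_rows()` (the name and the checks are illustrative additions, not part of the original answer), which stops with an error if rows are unpaired or partners disagree on Date or Origin.

```r
# Hypothetical wrapper around the odd/even idea; assumes the column layout
# from the question (Date, Checkin, Origin, Checkout, Destination).
pair_rows <- function(dat) {
  # Every row needs a partner
  stopifnot(nrow(dat) > 0, nrow(dat) %% 2 == 0)
  odd  <- seq(1, nrow(dat), by = 2)   # check-in rows
  even <- seq(2, nrow(dat), by = 2)   # check-out rows
  stopifnot(
    all(dat$Date[odd] == dat$Date[even]),      # partners share a date
    all(dat$Origin[odd] == dat$Origin[even])   # ... and an origin
  )
  out <- cbind(dat[odd, 1:3], dat[even, 4:5])
  rownames(out) <- NULL
  out
}

# Small stand-in for the question's first day of data
dat <- data.frame(
  Date        = rep("03-07-17", 4),
  Checkin     = c("08:00", "", "17:00", ""),
  Origin      = c("A", "A", "B", "B"),
  Checkout    = c("", "09:00", "", "18:00"),
  Destination = c("", "B", "", "A"),
  stringsAsFactors = FALSE
)

paired <- pair_rows(dat)
paired
```

Calling `pair_rows(dat[c(1, 3, 2, 4), ])` on the shuffled rows fails the Origin check instead of returning mismatched pairs.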
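Answer 2 (score: 0)
This does not need dplyr, though it is a more roundabout way of doing it. I was not sure what the classes of your columns were, so based on your example I pasted the table into Excel, saved it as a .csv, and just went with what that gave me. In any case, if you make sure the "empty" entries are actually missing (NA), then you can use complete.cases.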
The code:
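# Set this to the folder containing the CSV (placeholder path)
setwd("path/to/your/working/directory")
data <- read.csv("exampledata.csv")
# Date format in the CSV that Excel produced; adjust to match your file
data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
# Turn blank check-in/check-out cells into NA
data$Checkin <- as.character(data$Checkin)
data$Checkin[data$Checkin == ""] <- NA
data$Checkout <- as.character(data$Checkout)
data$Checkout[data$Checkout == ""] <- NA
# Split into check-in rows and check-out rows
checkIns <- data[complete.cases(data$Checkin), ]
checkIns$Destination[checkIns$Destination == ""] <- NA
checkOuts <- data[complete.cases(data$Checkout), ]
# Pair each check-in with the check-out sharing its Date and Origin
data2 <- merge(checkIns, checkOuts, by = c("Date", "Origin"))
# Drop the columns that are entirely NA
data2 <- data2[, colSums(is.na(data2)) < nrow(data2)]
# merge() puts the key columns first, so name the columns in that order
colnames(data2) <- c("Date", "Origin", "Checkin", "Checkout", "Destination")
# Restore the original column order
data2 <- data2[, colnames(data)]
data2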
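Once the blanks are NA, a shorter alternative to splitting and merging is to keep the first non-missing value per (Date, Origin) group. This is a different technique from the answer's merge, sketched here in base R with `aggregate()`; `first_non_na` and `collapsed` are illustrative names, and the inline `dat` is a stand-in with padded and blank cells like the dput() output earlier in the thread.

```r
# Stand-in data with padded and blank cells
dat <- data.frame(
  Date        = rep(" 03-07-17 ", 4),
  Checkin     = c(" 08:00 ", " ", " 17:00 ", " "),
  Origin      = c(" A ", " A ", " B ", " B "),
  Checkout    = c(" ", " 09:00 ", " ", " 18:00 "),
  Destination = c(" ", " B ", " ", " A "),
  stringsAsFactors = FALSE
)

# Trim padding, then mark blank cells as NA
dat[] <- lapply(dat, trimws)
dat[dat == ""] <- NA

# Hypothetical helper: first non-missing value of a vector (NA if none)
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each (Date, Origin) group to one row
collapsed <- aggregate(
  dat[c("Checkin", "Checkout", "Destination")],
  by  = dat[c("Date", "Origin")],
  FUN = first_non_na
)

# Sort the groups and restore the original column order
collapsed <- collapsed[order(collapsed$Date, collapsed$Origin), names(dat)]
collapsed
```

Like the paste-based answer, this assumes each group contains at most one non-missing value per column; extra values in a group would be dropped silently.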