Merging rows in a data.frame

Time: 2017-07-25 09:20:09

Tags: r

My problem is similar to the posts "Merge rows in one data.frame" and "Merge two rows in one dataframe, when the rows are disjoint and contain nulls", which may be relevant to it.

My data looks like this:

| Date     | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00   | A      |          |             |
| 03-07-17 |         | A      | 09:00    | B           |
| 03-07-17 | 17:00   | B      |          |             |
| 03-07-17 |         | B      | 18:00    | A           |
| 04-07-17 | 08:00   | A      |          |             |
| 04-07-17 |         | A      | 09:00    | B           |
| 04-07-17 | 17:00   | B      |          |             |
| 04-07-17 |         | B      | 18:00    | A           |

Now I want to aggregate it into 4 rows, like this:

| Date     | Checkin | Origin | Checkout | Destination |
| 03-07-17 | 08:00   | A      | 09:00    | B           |
| 03-07-17 | 17:00   | B      | 18:00    | A           |
| 04-07-17 | 08:00   | A      | 09:00    | B           |
| 04-07-17 | 17:00   | B      | 18:00    | A           |

Any ideas? Thanks!

3 answers:

Answer 0 (score: 2)

An idea via dplyr:

library(dplyr)

df %>% 
 group_by(Date, Origin) %>% 
 summarise_all(funs(trimws(paste(., collapse = ''))))
# A tibble: 4 x 5
# Groups:   Date [?]
        Date   Origin Checkin Checkout Destination
       <chr>    <chr>   <chr>    <chr>       <chr>
1  03-07-17   A         08:00    09:00           B
2  03-07-17   B         17:00    18:00           A
3  04-07-17   A         08:00    09:00           B
4  04-07-17   B         17:00    18:00           A

Data

dput(df)
structure(list(Date = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ", 
" 03-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 "
), Checkin = c(" 08:00   ", "         ", " 17:00   ", "         ", 
" 08:00   ", "         ", " 17:00   ", "         "), Origin = c(" A      ", 
" A      ", " B      ", " B      ", " A      ", " A      ", " B      ", 
" B      "), Checkout = c("          ", " 09:00    ", "          ", 
" 18:00    ", "          ", " 09:00    ", "          ", " 18:00    "
), Destination = c("             ", " B           ", "             ", 
" A           ", "             ", " B           ", "             ", 
" A           ")), .Names = c("Date", "Checkin", "Origin", "Checkout", 
"Destination"), row.names = c(NA, -8L), class = "data.frame")

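Note that funs() was deprecated in dplyr 0.8.0, and the summarise_all() family was superseded by across() in dplyr 1.0.0. The following is a minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later. The small df is an illustrative stand-in for the padded data above (first date only), and result is a name chosen here. Unlike the original, it trims the whitespace before grouping, so the group keys come out clean as well.

```r
library(dplyr)

# Illustrative stand-in for the padded data above (first date only)
df <- data.frame(
  Date        = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ", " 03-07-17 "),
  Checkin     = c(" 08:00   ", "         ", " 17:00   ", "         "),
  Origin      = c(" A      ", " A      ", " B      ", " B      "),
  Checkout    = c("          ", " 09:00    ", "          ", " 18:00    "),
  Destination = c("             ", " B           ", "             ", " A           "),
  stringsAsFactors = FALSE
)

result <- df %>%
  # strip the padding from every column first
  mutate(across(everything(), trimws)) %>%
  group_by(Date, Origin) %>%
  # within each (Date, Origin) pair, glue the blank and non-blank cells together
  summarise(across(everything(), ~ paste(.x, collapse = "")), .groups = "drop")

result
```

As in the original answer, this relies on each (Date, Origin) group holding at most one non-blank value per column; otherwise the values would be concatenated.
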
Answer 1 (score: 1)

If your data has exactly the structure shown above, and you are highly confident of that, you can use the following in base R.

cbind(dat[c(TRUE,FALSE), 1:3], dat[c(FALSE, TRUE), 4:5])
        Date   Checkin   Origin   Checkout   Destination
1  03-07-17   08:00     A        09:00      B           
3  03-07-17   17:00     B        18:00      A           
5  04-07-17   08:00     A        09:00      B           
7  04-07-17   17:00     B        18:00      A 

The idea is to take the odd rows (1, 3, 5, ...) for columns 1 to 3 and append the even rows (2, 4, 6, ...) for columns 4 and 5.

This will not work if any rows are out of order or lack a partner.
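
Since this approach depends entirely on the alternating layout, it may be worth checking that assumption before combining the rows. The following is a minimal sketch in base R. The dat built here is illustrative data in the same layout as the question, with empty strings for the missing cells. The checks are only one possible set of sanity tests, not part of the original answer.

```r
# Illustrative data in the same layout as the question ("" marks a missing cell)
dat <- data.frame(
  Date        = rep(c("03-07-17", "04-07-17"), each = 4),
  Checkin     = rep(c("08:00", "", "17:00", ""), times = 2),
  Origin      = rep(c("A", "A", "B", "B"), times = 2),
  Checkout    = rep(c("", "09:00", "", "18:00"), times = 2),
  Destination = rep(c("", "B", "", "A"), times = 2),
  stringsAsFactors = FALSE
)

odd  <- dat[c(TRUE, FALSE), ]  # rows 1, 3, 5, ...: expected to hold the check-ins
even <- dat[c(FALSE, TRUE), ]  # rows 2, 4, 6, ...: expected to hold the check-outs

# Stop early if the alternating check-in / check-out layout does not hold
stopifnot(
  nrow(dat) %% 2 == 0,
  all(odd$Checkin != ""), all(odd$Checkout == ""),
  all(even$Checkin == ""), all(even$Checkout != ""),
  all(odd$Date == even$Date), all(odd$Origin == even$Origin)
)

merged <- cbind(odd[, 1:3], even[, 4:5])
merged
```

If any check fails, stopifnot() raises an error instead of silently pairing the wrong rows.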

Answer 2 (score: 0)

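Using dplyr isn't strictly necessary, although this is more of a roundabout way. I wasn't sure what the classes of your columns were, so based on your example I pasted the table into Excel, saved it as a .csv, and just went with what that gave me. Anyway, if you are sure the "empty" entries are actually empty, then you can use complete.cases.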

In R:

# setwd("path/to/your/working/directory")  # point this at your own working directory
data = read.csv("exampledata.csv")

data$Date<-as.Date(data$Date,format='%m/%d/%Y')
data$Checkin<-as.character(data$Checkin)
data$Checkin[data$Checkin==""]<-NA

data$Checkout<-as.character(data$Checkout)
data$Checkout[data$Checkout==""]<-NA

checkIns<-data[complete.cases(data$Checkin),]
checkIns$Destination[checkIns$Destination==""]<-NA

checkOuts<-data[complete.cases(data$Checkout),]

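data2<-merge(checkIns,checkOuts,by=c("Date","Origin"))
# drop the columns that are entirely NA (Checkout.x, Destination.x, Checkin.y)
data2 <- data2[,colSums(is.na(data2))<nrow(data2)]
# merge() puts the key columns (Date, Origin) first, so name the columns in that order,
# then restore the original column order
colnames(data2)<-c("Date","Origin","Checkin","Checkout","Destination")
data2<-data2[,colnames(data)]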

data2
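
The steps above can also be tried without a CSV file. The following is a minimal, self-contained sketch of the same complete.cases() and merge() idea. It assumes the blank cells have already been turned into NA, and the inline data is an illustrative copy of the question's table. Selecting only the needed columns before merging is a simplification made here, which avoids having to clean up the .x/.y columns afterwards.

```r
# Illustrative copy of the question's table, with NA for the blank cells
data <- data.frame(
  Date        = rep(c("03-07-17", "04-07-17"), each = 4),
  Checkin     = rep(c("08:00", NA, "17:00", NA), times = 2),
  Origin      = rep(c("A", "A", "B", "B"), times = 2),
  Checkout    = rep(c(NA, "09:00", NA, "18:00"), times = 2),
  Destination = rep(c(NA, "B", NA, "A"), times = 2),
  stringsAsFactors = FALSE
)

# Rows with a check-in time, keeping only the columns they actually fill
checkIns  <- data[complete.cases(data$Checkin), c("Date", "Origin", "Checkin")]
# Rows with a check-out time, likewise
checkOuts <- data[complete.cases(data$Checkout), c("Date", "Origin", "Checkout", "Destination")]

# Pair each check-in with the check-out sharing its Date and Origin
data2 <- merge(checkIns, checkOuts, by = c("Date", "Origin"))
# Restore the original column order
data2 <- data2[, colnames(data)]
data2
```

Like the original, this assumes each (Date, Origin) pair occurs once among the check-ins and once among the check-outs; repeated pairs would be cross-joined by merge().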