从tidyr的输出中删除不完整的案例 - gather() - r

时间:2014-07-30 19:27:13

标签: r dplyr reshape2 tidyr

我在数据框中有不整齐的数据,看起来像这样。

在这里你可以在'团队'中看到一些足球队的名字。 Name1-3是变量,列出了第一列中用于引用这些团队的不同名称。

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town                        

我的目标是使用team-name1,team-name2,team-name3配对将数据分成2列。我只想保留那些配对,其中有name1,name2或name3中的数据。

要做到这一点,我正在尝试tidyr's- gather()

temp <- dat %>% gather(key, value, 2:4) 
temp$key<-NULL
temp

这给出了以下输出:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town                  

我试图删除不完整的案例(例如第20,21,23,24行但不是22),使用:

temp[complete.cases(temp),]

这不起作用,因为看似空的值观察包含一个字符“” - 我想这是gather()返回缺失数据的方式?我尝试将temp$value转换为一个因素,但这也不起作用。

我很想听听如何摆脱不完整的案件。

示例数据......

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

3 个答案:

答案 0 :(得分:5)

您还可以添加filter(为了删除空白)和select(以便删除key列)从dplyr包中添加所有内容

temp <- dat %>% 
  gather(key, value, 2:4) %>% 
  filter(value != "") %>%
  select(-key)

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United

答案 1 :(得分:1)

您在寻找:temp[temp$value!='',]吗? gather不应该归咎于空字符串,你的初始数据也是如此。您可以先替换它们,然后使用na.rm中的gather参数:

dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE) 
temp$key<-NULL
tempA

答案 2 :(得分:1)

类似的方法,但使用na.omit:

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)