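I have untidy data in a data frame that looks like this.
The 'team' column holds the names of some football teams. name1 to name3 are variables listing the different names used to refer to the teams in the first column.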
  team               name1              name2         name3
1 Loughborough       Loughborough
2 Luton Town         Luton Town         Luton
3 Macclesfield       Macclesfield
4 Maidstone United   Maidstone United
5 Manchester City    Manchester City    Man City
6 Manchester United  Manchester United  Newton Heath  Man United
7 Mansfield Town     Mansfield Town     Mansfield
8 Merthyr Town       Merthyr Town
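My goal is to reshape the data into 2 columns using the team-name1, team-name2 and team-name3 pairs. I only want to keep the pairs that actually have data in name1, name2 or name3.
To do this, I am trying tidyr's gather():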
temp <- dat %>% gather(key, value, 2:4)
temp$key<-NULL
temp
This gives the following output:
   team               value
1  Loughborough       Loughborough
2  Luton Town         Luton Town
3  Macclesfield       Macclesfield
4  Maidstone United   Maidstone United
5  Manchester City    Manchester City
6  Manchester United  Manchester United
7  Mansfield Town     Mansfield Town
8  Merthyr Town       Merthyr Town
9  Loughborough
10 Luton Town         Luton
11 Macclesfield
12 Maidstone United
13 Manchester City    Man City
14 Manchester United  Newton Heath
15 Mansfield Town     Mansfield
16 Merthyr Town
17 Loughborough
18 Luton Town
19 Macclesfield
20 Maidstone United
21 Manchester City
22 Manchester United  Man United
23 Mansfield Town
24 Merthyr Town
I tried to remove the incomplete cases (e.g. rows 20, 21, 23 and 24, but not 22) using:
temp[complete.cases(temp),]
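This does not work, because the seemingly empty observations actually contain the empty string "". I guess this is how gather() returns missing data? I also tried converting temp$value to a factor, but that did not work either.
I would love to hear how to get rid of the incomplete cases.
Sample data: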
dat<-structure(list(team = structure(1:8, .Label = c("Loughborough",
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City",
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"),
name1 = structure(1:8, .Label = c("Loughborough", "Luton Town",
"Macclesfield", "Maidstone United", "Manchester City", "Manchester United",
"Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L,
2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City",
"Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team",
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")
Answer 0 (score: 5)
You can also add filter (to remove the blanks) and select (to drop the key column) from the dplyr package, and do everything in one pipeline:
temp <- dat %>%
  gather(key, value, 2:4) %>%
  filter(value != "") %>%
  select(-key)
# team value
# 1 Loughborough Loughborough
# 2 Luton Town Luton Town
# 3 Macclesfield Macclesfield
# 4 Maidstone United Maidstone United
# 5 Manchester City Manchester City
# 6 Manchester United Manchester United
# 7 Mansfield Town Mansfield Town
# 8 Merthyr Town Merthyr Town
# 9 Luton Town Luton
# 10 Manchester City Man City
# 11 Manchester United Newton Heath
# 12 Mansfield Town Mansfield
# 13 Manchester United Man United
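As a hedged aside, the same filter-then-select idea can be sketched with pivot_longer(), which newer tidyr releases (1.0.0 and later) offer as the successor to gather(). This is not what the answer above uses; the toy data frame and its values are made up for illustration, and the columns are plain character vectors rather than factors. Note that pivot_longer() returns rows grouped by original row rather than by column, so the row order differs from the gather() output.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same shape as `dat`, using character columns
toy <- data.frame(
  team  = c("Luton Town", "Merthyr Town"),
  name1 = c("Luton Town", "Merthyr Town"),
  name2 = c("Luton", ""),
  stringsAsFactors = FALSE
)

long <- toy %>%
  # Stack every column except `team` into key/value pairs
  pivot_longer(cols = -team, names_to = "key", values_to = "value") %>%
  # Drop pairs whose value is the empty string
  filter(value != "") %>%
  # The key column is no longer needed
  select(-key)

long
```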
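Answer 1 (score: 1)
Are you looking for temp[temp$value != '', ]? gather is not to blame for the empty strings; they are already present in your initial data. You could replace them with NA first and then use the na.rm argument of gather: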
dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE)
temp$key<-NULL
temp
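To see why complete.cases() keeps the blank rows, here is a minimal base R sketch on a made-up long-format data frame (the name toy and its values are hypothetical). It only illustrates that "" is an ordinary string rather than NA, and that recoding it to NA makes complete.cases() behave as the question expected.

```r
# Hypothetical long-format data, similar in shape to `temp`
toy <- data.frame(
  team  = c("Luton Town", "Luton Town", "Merthyr Town"),
  value = c("Luton Town", "Luton", ""),
  stringsAsFactors = FALSE
)

# "" is a valid (empty) string, so nothing is flagged as missing
na_flags <- is.na(toy$value)
complete_before <- complete.cases(toy)

# Recode empty strings to NA; now complete.cases() can drop those rows
toy$value[toy$value == ""] <- NA
toy_clean <- toy[complete.cases(toy), ]

toy_clean
```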
Answer 2 (score: 1)
A similar approach, but using na.omit:
dat %>%
  gather(key, value, -team) %>%
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)