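I tried searching and found answers about replacing blank row values with values from other columns, but none that do it conditionally. Let me explain.
I have a data frame that looks like this: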
Name   Grade Test1 Test2    Test3
John   A     none  none
Jane         B ok  none
David        none  C barely
Sam    B     none
Thomas                      D fail
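I want to replace the missing grades in the Grade column with the letter grade from the other columns (dropping the comment that follows it). There will never be more than one letter grade across the Test1 / Test2 / Test3 columns. So my desired result is: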
Name   Grade Test1 Test2    Test3
John   A     none  none
Jane   B     B ok  none
David  C     none  C barely
Sam    B     none
Thomas D                    D fail
Any help would be greatly appreciated!
Answer 0: (score: 1)
I've shamelessly borrowed @akrun's data to show another approach, one that fits the split-apply-combine paradigm.
# define data
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
# load up libraries
library(dplyr)
library(tidyr)
# add a primary key
df1 <- df1 %>%
  mutate(PK = seq_len(nrow(df1)))
# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result
test_result <- df1 %>%
  select(PK, Test1:Test3) %>%
  gather(Variable, Value, -PK) %>%
  mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
  # drop all the unnecessary rows:
  filter(Value != "")
# join back to the main data, fill in the test score when needed
df1 %>%
  select(PK, Name, Grade) %>%
  left_join(test_result, by = "PK") %>%
  mutate(
    Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
    Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
  select(-Value, -PK, -Variable)
This gives you a tidy dataset that should be better suited to future analysis and reuse:
    Name Grade Source
1   John     A  Grade
2   Jane     B  Test1
3  David     C  Test2
4    Sam     B  Grade
5 Thomas     D  Test3
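As a side note, `gather()` is marked as superseded in recent tidyr releases in favour of `pivot_longer()`. The reshaping step might be written as below. This is only a sketch, assuming tidyr 1.0.0 or later is installed; the data frame is rebuilt with `data.frame()` purely so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer, rebuilt so this snippet is self-contained
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

test_result <- df1 %>%
  mutate(PK = row_number()) %>%            # primary key, as in the answer
  select(PK, Test1:Test3) %>%
  # long format: one row per (person, test) pair
  pivot_longer(cols = Test1:Test3,
               names_to = "Variable", values_to = "Value") %>%
  # keep only the leading letter; treat "none" as empty
  mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
  filter(Value != "")

test_result
```

The rest of the pipeline (the `left_join()` back onto the main data) should work unchanged with this `test_result`.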
Answer 1: (score: 0)
Assuming the columns are of class `character`, we get a logical index ('i1') of the elements of 'Grade' that are blank:
i1 <- df1$Grade==''
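We loop over the 'Test' columns, i.e. columns 3 to 5, using `vapply`; use `grep` to subset the elements in each column that have a non-space character (`\\S`) followed by a space (`\\s`); use `sub` to remove the space and the characters after it; and assign the output to the blank elements of 'Grade'.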
df1$Grade[i1] <- vapply(df1[i1,3:5], function(x)
sub('\\s+.*$', '', grep('^\\S\\s', x, value=TRUE)), character(1))
df1
#    Name Grade Test1    Test2  Test3
#1   John     A  none     none
#2   Jane     B  B ok     none
#3  David     C  none C barely
#4    Sam     B  none
#5 Thomas     D                D fail
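To make the two regular-expression steps easier to follow, here they are applied to a small stand-alone vector. The vector `x` is made up for illustration and is not part of the original answer.

```r
# Hypothetical sample values, mixing graded entries, "none" and a blank
x <- c("B ok", "none", "C barely", "")

# Keep only strings that start with one non-space character followed by whitespace
hits <- grep("^\\S\\s", x, value = TRUE)

# Drop the first run of whitespace and everything after it, leaving the letter
grades <- sub("\\s+.*$", "", hits)

hits
grades
```

Note that the `grep` pattern requires whitespace after the first character, so a bare letter with no comment (e.g. `"D"`) would not be picked up by it.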
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
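One thing to be aware of: the `vapply` call above iterates over columns, so it relies on each test column holding exactly one letter grade among the blank-'Grade' rows, in the same order as those rows. That holds for this sample data. If your real data does not guarantee it, a row-wise variant may be safer. The following is a sketch of that alternative, not the answer's original method; the pattern `^[A-Z](\\s|$)` and the name `test_cols` are my own assumptions.

```r
# Same sample data, rebuilt so this snippet is self-contained
df1 <- data.frame(
  Name  = c("John", "Jane", "David", "Sam", "Thomas"),
  Grade = c("A", "", "", "B", ""),
  Test1 = c("none", "B ok", "none", "none", ""),
  Test2 = c("none", "none", "C barely", "", ""),
  Test3 = c("", "", "", "", "D fail"),
  stringsAsFactors = FALSE
)

test_cols <- c("Test1", "Test2", "Test3")  # assumed names of the test columns
i1 <- df1$Grade == ""                      # rows whose Grade is blank

# For each blank-Grade row, search that row's own test columns
df1$Grade[i1] <- apply(df1[i1, test_cols, drop = FALSE], 1, function(x) {
  # an upper-case letter followed by whitespace or end of string
  hit <- grep("^[A-Z](\\s|$)", x, value = TRUE)
  # take the letter only when exactly one test column carries a grade
  if (length(hit) == 1) substr(hit, 1, 1) else NA_character_
})

df1
```

Rows with no letter grade in any test column (or with more than one) get `NA` here rather than a misaligned value.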
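Answer 2: (score: 0)
When I tried this on your `data`, I first removed the "none" values from the data frame, then took the grade part of each string as a substring, and finally pasted all the columns together into one and built the final table: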
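# assumes `data` holds the question's table with lower-case column names
data[data=="none"]=""                    # blank out the "none" placeholders
a=function(x) substring(x,1,1)           # keep only the first character (the letter grade)
data1=data.frame(data[1],apply(data[2:5],2,a))
# at most one of these columns is expected to be non-empty per row
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
final.data=data.frame(data1[1:2],data[3:5])
final.data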
name   grade test1 test2    test3
john   A
jane   B     B ok
david  C           C barely
sam    B
thomas D                    D fail