I have a dataset as follows:
dat1 <- read.table(header=TRUE, text="
ID Age Align Weat
8645 15-24 A 1
6228 15-24 B 1
5830 15-24 A 3
1844 25-34 B 1
4461 35-44 B 2
2119 35-44 C 2
2115 45-54 A 1
")
dat1
ID Age Align Weat
1 8645 15-24 A 1
2 6228 15-24 B 1
3 5830 15-24 A 3
4 1844 25-34 B 1
5 4461 35-44 B 2
6 2119 35-44 C 2
7 2115 45-54 A 1
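I would like to replace each code in dat1 with its description. My current approach is inefficient for large datasets with 500 columns and a code table covering those columns. The code data frame, dat2, describes the codes used in the Age, Align, and Weat columns: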
dat2 <- read.table(header=TRUE, text="
Code Desc Column
15-24 Young Age
25-34 Young Age
35-44 Middle Age
45-54 Middle Age
A Straight Align
B Curve Align
C Hill Align
1 Clear Weat
2 Cloudy Weat
3 Rain Weat
")
dat2
Code Desc Column
1 15-24 Young Age
2 25-34 Young Age
3 35-44 Middle Age
4 45-54 Middle Age
5 A Straight Align
6 B Curve Align
7 C Hill Align
8 1 Clear Weat
9 2 Cloudy Weat
10 3 Rain Weat
Answer 0 (score: 1)
Try a simple for loop:
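varnames <- unique(dat2$Column)
dat3 <- dat1
for (i in varnames) {
  # every column except the one being recoded in this pass
  startvars <- names(dat3)[!names(dat3) %in% i]
  # join the codes for column i, keeping only the description
  dat3 <- merge(dat3, subset(dat2, Column == i),
                by.x = i, by.y = "Code")[, c(startvars, "Desc")]
  # give the description column the original column name
  colnames(dat3)[names(dat3) %in% "Desc"] <- i
}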
Result:
ID Age Align Weat
1 8645 Young Straight Clear
2 2115 Middle Straight Clear
3 6228 Young Curve Clear
4 1844 Young Curve Clear
5 4461 Middle Curve Cloudy
6 2119 Middle Hill Cloudy
7 5830 Young Straight Rain
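Note that merge() sorts its result on the key column by default, which is why the rows above are no longer in the order of dat1. A minimal sketch of restoring the original order after the loop, assuming ID is unique within dat1 (the stringsAsFactors = FALSE argument is added here only so the sketch behaves the same across R versions):

```r
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Age Align Weat
8645 15-24 A 1
6228 15-24 B 1
5830 15-24 A 3
1844 25-34 B 1
4461 35-44 B 2
2119 35-44 C 2
2115 45-54 A 1
")
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Code Desc Column
15-24 Young Age
25-34 Young Age
35-44 Middle Age
45-54 Middle Age
A Straight Align
B Curve Align
C Hill Align
1 Clear Weat
2 Cloudy Weat
3 Rain Weat
")

# Same merge loop as in the answer
dat3 <- dat1
for (i in unique(dat2$Column)) {
  startvars <- names(dat3)[!names(dat3) %in% i]
  dat3 <- merge(dat3, subset(dat2, Column == i),
                by.x = i, by.y = "Code")[, c(startvars, "Desc")]
  colnames(dat3)[names(dat3) %in% "Desc"] <- i
}

# Put the rows back in the order of dat1 (assumes ID is unique)
dat3 <- dat3[match(dat1$ID, dat3$ID), ]
rownames(dat3) <- NULL
```

If ID is not unique, a temporary row-number column added before the loop and sorted on afterwards would be a safer key.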
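This is clearly not very efficient. A data.table solution with some dcast might be in order, but I'll leave that for someone else to work out.
PS: this was done with stringsAsFactors=F, colClasses=rep("character",4) added to read.table.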
Answer 1 (score: 1)
You can recode dat1 in a for loop:
# 'intersect' is needed to recode only those columns which have description
for (each_column in intersect(colnames(dat1), dat2$Column)){
curr_dict = dat2$Column %in% each_column
code = dat2$Code[curr_dict]
descr = dat2$Desc[curr_dict]
dat1[[each_column]] = descr[match(dat1[[each_column]], code)]
}
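The same match-based lookup can also be wrapped in a function that returns a recoded copy and leaves dat1 untouched. This is only a sketch: recode_columns is a hypothetical helper name, not part of any package, and it assumes the dictionary has the Code, Desc, and Column columns shown in the question. Codes with no entry in the dictionary become NA.

```r
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Age Align Weat
8645 15-24 A 1
6228 15-24 B 1
5830 15-24 A 3
1844 25-34 B 1
4461 35-44 B 2
2119 35-44 C 2
2115 45-54 A 1
")
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Code Desc Column
15-24 Young Age
25-34 Young Age
35-44 Middle Age
45-54 Middle Age
A Straight Align
B Curve Align
C Hill Align
1 Clear Weat
2 Cloudy Weat
3 Rain Weat
")

# Hypothetical helper: recode every column of `data` that has entries in `dict`
recode_columns <- function(data, dict) {
  cols <- intersect(names(data), unique(as.character(dict$Column)))
  data[cols] <- lapply(cols, function(col) {
    rows <- as.character(dict$Column) == col
    # named vector: names are the codes, values are the descriptions
    lookup <- setNames(as.character(dict$Desc[rows]),
                       as.character(dict$Code[rows]))
    unname(lookup[as.character(data[[col]])])
  })
  data
}

recoded <- recode_columns(dat1, dat2)
```

Because each column needs only one vectorised lookup, the cost grows with the number of coded columns rather than with repeated merges, and the row order of dat1 is preserved.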