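The scenario is drag racing. Sometimes a driver races against a competitor, and sometimes they race alone. The drivers and their skill levels are completely random. A race ends after lap 12, and one race was held every day for 10 years. There are hundreds of drivers. Independent observers recorded data during each race, including the driver's speed, but only for one driver per row! So the competitor's data is missing. Here are the first 6 rows of the data: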
df <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_level = c("A", "C", "D", "A", "B", "B"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Competitor   = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", "NA", "NA"),
  Comp_level   = c("B", "B", "D", "A", "NA", "NA"),
  Comp_speed   = c("???", "???", "???", "???", "NA", "NA"),
  Race_day     = c(165, 165, 72, 72, 92, 65),
  Lap_number   = c(9, 9, 12, 12, 8, 4),
  Humidity     = c(33, 33, 88, 88, 12, 55),
  Temperature  = c(28, 28, 12, 12, 20, 28)
)
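Each row is for a different driver, but I need to fill in the data so that I know the competitor's speed! I'll enter the speeds by hand to show what needs to happen for the rest of the dataset: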
df_1 <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_level = c("A", "C", "D", "A", "B", "B"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Competitor   = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", "NA", "NA"),
  Comp_level   = c("B", "B", "D", "A", "NA", "NA"),
  Comp_speed   = c(91, 96, 94, 89, NA, NA),  # real NA keeps the column numeric
  Race_day     = c(165, 165, 72, 72, 92, 65),
  Lap_number   = c(9, 9, 12, 12, 8, 4),
  Humidity     = c(33, 33, 88, 88, 12, 55),
  Temperature  = c(28, 28, 12, 12, 20, 28)
)
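For comparison, the lookup described above can also be sketched in base R with match(), which returns, for each Comp_name, the row position of the driver with that name (NA when there is none). This is a minimal sketch trimmed to the relevant columns; it assumes each driver name appears only once, and it uses a real NA instead of the string "NA" so that Comp_speed stays numeric. Row 3 comes out NA because "Johnny" in Comp_name does not exactly match "Johny" in Driver_name.

```r
# Minimal base R sketch: look up each competitor's speed by name.
# Assumes each driver name appears only once in the data.
df <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", NA, NA)
)

# match() gives the row of each Comp_name within Driver_name (NA if absent)
idx <- match(df$Comp_name, df$Driver_name)
df$Comp_speed <- df$Driver_speed[idx]

print(df)
```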
Answer 0 (score: 0)
left_join is ideal for this.
Your data:
df <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_level = c("A", "C", "D", "A", "B", "B"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Competitor   = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", "NA", "NA"),
  Comp_level   = c("B", "B", "D", "A", "NA", "NA"),
  Comp_speed   = c("???", "???", "???", "???", "NA", "NA"),
  Race_day     = c(165, 165, 72, 72, 92, 65),
  Lap_number   = c(9, 9, 12, 12, 8, 4),
  Humidity     = c(33, 33, 88, 88, 12, 55),
  Temperature  = c(28, 28, 12, 12, 20, 28)
)
We load the dplyr package:
#install.packages("dplyr") #if you don't have it
library(dplyr)
Let's drop the Comp_speed column, which currently holds "???" values.
df <- df %>% select(-Comp_speed)
Let's create a second data frame containing only the names and speeds, renaming Driver_speed to Comp_speed on the fly.
df2 <- df %>%
  select(Driver_name, Comp_speed = Driver_speed)
Now we can left_join df2 onto df, matching Comp_name in df against Driver_name in df2:
df_updated <- df %>%
  left_join(df2, by = c("Comp_name" = "Driver_name"))
#> Warning: Column `Comp_name`/`Driver_name` joining factors with different
#> levels, coercing to character vector
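This warning only appears in R versions before 4.0, where data.frame() converted strings to factors by default; the join coerces the keys to character and the result is unaffected. Note also that row 3 gets NA for Comp_speed because its Comp_name is "Johnny" while that driver is recorded as "Johny"; names must match exactly. Here is the resulting data frame df_updated: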
df_updated
#>   Driver_name Driver_level Driver_speed Competitor Comp_name Comp_level
#> 1        Rick            A           96        Yes     Julie          B
#> 2       Julie            C           91        Yes      Rick          B
#> 3      Denver            D           89        Yes    Johnny          D
#> 4       Johny            A           94        Yes    Denver          A
#> 5   Cassandra            B           88         No        NA         NA
#> 6     Phillip            B           99         No        NA         NA
#>   Race_day Lap_number Humidity Temperature Comp_speed
#> 1      165          9       33          28         91
#> 2      165          9       33          28         96
#> 3       72         12       88          12         NA
#> 4       72         12       88          12         89
#> 5       92          8       12          20         NA
#> 6       65          4       55          28         NA
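To catch spelling mismatches like "Johnny" vs "Johny" across the full dataset before joining, one option is to list the competitor names that have no matching driver. A minimal base R sketch, assuming the same column names as above and trimmed to the columns it needs:

```r
# Check which competitor names have no matching driver row
df <- data.frame(
  Driver_name = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Competitor  = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  Comp_name   = c("Julie", "Rick", "Johnny", "Denver", NA, NA)
)

# Only rows that actually had a competitor are relevant
comp_names <- df$Comp_name[df$Competitor == "Yes"]

# setdiff() keeps names present in comp_names but absent from Driver_name
unmatched <- setdiff(comp_names, df$Driver_name)
print(unmatched)
```

Any name this reports will end up with an NA Comp_speed after the join.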
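As the OP pointed out, this isn't robust for drivers who race more than once (my oversight).
Assuming (from the data) that Race_day and Lap_number together are enough to identify each head-to-head race, we can simply keep them in the df2 data frame and add those column names to the by argument of left_join. Here's what that looks like: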
df2 <- df %>%
  select(Driver_name, Comp_speed = Driver_speed, Race_day, Lap_number)
df_updated <- df %>%
  left_join(df2, by = c("Comp_name" = "Driver_name", "Race_day", "Lap_number"))
#> Warning: Column `Comp_name`/`Driver_name` joining factors with different
#> levels, coercing to character vector
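To see why the extra keys matter, here is a small base R sketch with made-up rows (hypothetical, not from the question) in which Rick races twice: against Julie on day 165 and against Phillip on day 166. Joining on the name alone gives each of Rick's opponents two candidate speeds; adding Race_day and Lap_number to the keys picks the right one. Here merge() with all.x = TRUE stands in for left_join so the sketch needs no packages.

```r
# Hypothetical data: Rick races on two different days
races <- data.frame(
  Driver_name  = c("Rick", "Julie", "Rick", "Phillip"),
  Driver_speed = c(96, 91, 90, 99),
  Comp_name    = c("Julie", "Rick", "Phillip", "Rick"),
  Race_day     = c(165, 165, 166, 166),
  Lap_number   = c(9, 9, 5, 5)
)

# Lookup table: each driver's speed in each race
speeds <- races[, c("Driver_name", "Driver_speed", "Race_day", "Lap_number")]
names(speeds)[names(speeds) == "Driver_speed"] <- "Comp_speed"

# Name-only join: each match on "Rick" pulls in both of his races
naive <- merge(races, speeds, by.x = "Comp_name", by.y = "Driver_name",
               all.x = TRUE)
nrow(naive)  # 6 rows instead of 4

# Left join on competitor name AND the race identifiers
joined <- merge(
  races, speeds,
  by.x = c("Comp_name", "Race_day", "Lap_number"),
  by.y = c("Driver_name", "Race_day", "Lap_number"),
  all.x = TRUE
)
print(joined[order(joined$Race_day, joined$Driver_name), ])
```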
Answer 1 (score: 0)
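We need to left-join df with itself.
!names(df) %in% c("Comp_speed") drops the variable Comp_speed from the first data frame, x.
df[, c("Driver_name", "Driver_speed")] keeps only the variables Driver_name and Driver_speed in the second data frame, y.
Altogether, Comp_name in x is matched to Driver_name in y, and Driver_speed from y is reported as Driver_speed.y (the .y suffix appears because df already has a Driver_speed column, which becomes Driver_speed.x after the merge):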
df <- merge(
  x = df[, !names(df) %in% c("Comp_speed")],
  y = df[, c("Driver_name", "Driver_speed")],
  by.x = "Comp_name", by.y = "Driver_name",
  all.x = TRUE
)
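One side effect worth knowing: merge() sorts the result by the key column and moves Comp_name to the front, so the rows no longer follow the original order. A minimal sketch of one way to restore it, using a helper column row_id that is an assumption of this example rather than part of the answer:

```r
df <- data.frame(
  Driver_name  = c("Rick", "Julie", "Denver", "Johny", "Cassandra", "Phillip"),
  Driver_speed = c(96, 91, 89, 94, 88, 99),
  Comp_name    = c("Julie", "Rick", "Johnny", "Denver", NA, NA)
)
df$row_id <- seq_len(nrow(df))  # hypothetical helper column to remember order

out <- merge(
  x = df,
  y = df[, c("Driver_name", "Driver_speed")],
  by.x = "Comp_name", by.y = "Driver_name",
  all.x = TRUE
)

# Put rows back in their original order, then drop the helper column
out <- out[order(out$row_id), ]
out$row_id <- NULL
print(out)
```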
Now we just need to rename "Driver_speed.x" and "Driver_speed.y" to "Driver_speed" and "Comp_speed":
library("data.table")
setnames(df,c("Driver_speed.x","Driver_speed.y"),c("Driver_speed","Comp_speed"))
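If you would rather not load data.table just for setnames(), the same renaming can be done in base R by assigning to names(). A minimal sketch on a toy data frame with the merge-style column names (the values are hypothetical):

```r
# Toy data frame with the suffixed names that merge() produces
df <- data.frame(
  Comp_name      = c("Julie", "Rick"),
  Driver_name    = c("Rick", "Julie"),
  Driver_speed.x = c(96, 91),
  Driver_speed.y = c(91, 96)
)

old <- c("Driver_speed.x", "Driver_speed.y")
new <- c("Driver_speed", "Comp_speed")

# match() finds the position of each old name; assign the new names there
names(df)[match(old, names(df))] <- new
print(names(df))
```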
Answer 2 (score: 0)
I think
<form>
<input name="a.one">
<input name="a.two">
<input name="a.three">
<input name="b.one">
<input name="b.two">
<input name="b.three">
<input name="c.one">
<input name="c.two">
<input name="c.three">
</form>
would do what you need.