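I have a data frame with two columns. I want to create a new column based on the second column. It should use the second column's value when it is not NA. When the second column is NA, or when its value repeats an earlier row, it should use the value from the first column instead.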
df:
200610-1 rs28619217
200610-10 NA
200610-100 rs367572771
200610-102 rs144402189
200610-105 rs375896687
200610-107 NA
200610-108 NA
200610-109 NA
200610-110 rs199838004
200610-111 rs374875201
200610-112 NA
200610-113 rs377546596
200610-114 NA
200610-115 NA
200610-116 NA
200610-117 rs67858721
200610-118 rs67858721
200610-119 rs9876735
200610-120 rs9876735
desired output:
200610-1 rs28619217 rs28619217
200610-10 NA 200610-10
200610-100 rs367572771 rs367572771
200610-102 rs144402189 rs144402189
200610-105 rs375896687 rs375896687
200610-107 NA 200610-107
200610-108 NA 200610-108
200610-109 NA 200610-109
200610-110 rs199838004 rs199838004
200610-111 rs374875201 rs374875201
200610-112 NA 200610-112
200610-113 rs377546596 rs377546596
200610-114 NA 200610-114
200610-115 NA 200610-115
200610-116 NA 200610-116
200610-117 rs67858721 rs67858721
200610-118 rs67858721 200610-118
200610-119 rs9876735 rs9876735
200610-120 rs9876735 200610-120
What should I do, step by step? I was thinking of using the apply function.
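For reference, the sample data can be rebuilt in R and the rule applied in three steps. This is a sketch that assumes the columns are named Col1 and Col2, because the question does not name them. apply is not needed here. is.na(), duplicated() and ifelse() all work on whole columns at once. A row-wise apply() only sees one row at a time, so on its own it cannot tell whether a value already appeared in an earlier row.

```r
# Sample data from the question; the column names Col1/Col2 are assumed
df1 <- data.frame(
  Col1 = c("200610-1", "200610-10", "200610-100", "200610-102", "200610-105",
           "200610-107", "200610-108", "200610-109", "200610-110", "200610-111",
           "200610-112", "200610-113", "200610-114", "200610-115", "200610-116",
           "200610-117", "200610-118", "200610-119", "200610-120"),
  Col2 = c("rs28619217", NA, "rs367572771", "rs144402189", "rs375896687",
           NA, NA, NA, "rs199838004", "rs374875201",
           NA, "rs377546596", NA, NA, NA,
           "rs67858721", "rs67858721", "rs9876735", "rs9876735"),
  stringsAsFactors = FALSE  # keep plain character columns (the default since R 4.0.0)
)

# Step 1: flag rows where Col2 is missing
missing_id <- is.na(df1$Col2)

# Step 2: flag values that repeat an earlier Col2 value (first occurrence stays FALSE)
repeated_id <- duplicated(df1$Col2)

# Step 3: take Col1 where either flag is set, otherwise keep Col2
df1$Col3 <- ifelse(missing_id | repeated_id, df1$Col1, df1$Col2)

print(df1[16:19, ])
```

Rows 200610-118 and 200610-120 get their Col1 value because their Col2 values repeat the rows directly above them. This matches the desired output.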
Answer 1 (score: 1)
We can use ifelse:
df1$Col3 <- with(df1, ifelse(is.na(Col2), Col1, Col2))
df1$Col3
#[1] "rs28619217" "200610-10" "rs367572771" "rs144402189" "rs375896687"
#[6] "200610-107" "200610-108" "200610-109" "rs199838004" "rs374875201"
#[11] "200610-112" "rs377546596" "200610-114" "200610-115" "200610-116"
If there are also duplicates, as @Sotos mentioned in the comments, we can add the logical vector from duplicated to the ifelse condition:
with(df1, ifelse(is.na(Col2)|duplicated(Col2), Col1, Col2))
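The ifelse approach has one caveat, sketched here on a hypothetical two-row frame. If Col1 and Col2 are factors, ifelse() returns the factors' integer codes rather than their labels. Factor columns were the data.frame() default before R 4.0.0, and they also result from setting stringsAsFactors = TRUE. Converting with as.character() first avoids the problem.

```r
# Hypothetical two-row example with factor columns, as data.frame()
# produced by default before R 4.0.0
df_f <- data.frame(Col1 = c("200610-1", "200610-10"),
                   Col2 = c("rs28619217", NA),
                   stringsAsFactors = TRUE)

# On factors, ifelse() returns the underlying integer codes, not the labels
codes <- ifelse(is.na(df_f$Col2), df_f$Col1, df_f$Col2)
print(codes)

# Converting to character first gives the intended strings
df_f$Col3 <- with(df_f, ifelse(is.na(Col2) | duplicated(Col2),
                               as.character(Col1), as.character(Col2)))
print(df_f$Col3)
```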
Answer 2 (score: 1)
A mutate call with an ifelse statement will do the job:
library(readr)  # read_table()
library(dplyr)  # mutate() and the %>% pipe
df <- read_table("200610-1 rs28619217
200610-10 NA
200610-100 rs367572771
200610-102 rs144402189
200610-105 rs375896687
200610-107 NA
200610-108 NA
200610-109 NA
200610-110 rs199838004
200610-111 rs374875201
200610-112 NA
200610-113 rs377546596
200610-114 NA
200610-115 NA
200610-116 NA", col_names = c("col1", "col2"), col_types = "cc")
df %>%
mutate(fill = ifelse(is.na(col2), col1, col2))
# A tibble: 15 × 3
col1 col2 fill
<chr> <chr> <chr>
1 200610-1 rs28619217 rs28619217
2 200610-10 <NA> 200610-10
3 200610-100 rs367572771 rs367572771
4 200610-102 rs144402189 rs144402189
5 200610-105 rs375896687 rs375896687
6 200610-107 <NA> 200610-107
7 200610-108 <NA> 200610-108
8 200610-109 <NA> 200610-109
9 200610-110 rs199838004 rs199838004
10 200610-111 rs374875201 rs374875201
11 200610-112 <NA> 200610-112
12 200610-113 rs377546596 rs377546596
13 200610-114 <NA> 200610-114
14 200610-115 <NA> 200610-115
15 200610-116 <NA> 200610-116
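The dplyr pipeline above only handles NA. The sketch below also covers duplicates in the same style. It uses a hypothetical stand-in, df_sub, built from the question's last five rows. It also swaps in dplyr's type-stable if_else() for ifelse().

```r
library(dplyr)

# Hypothetical stand-in: the last five rows of the question's data
df_sub <- tibble(
  col1 = c("200610-116", "200610-117", "200610-118", "200610-119", "200610-120"),
  col2 = c(NA, "rs67858721", "rs67858721", "rs9876735", "rs9876735")
)

df_sub <- df_sub %>%
  # use col1 when col2 is missing or repeats a value from an earlier row
  mutate(fill = if_else(is.na(col2) | duplicated(col2), col1, col2))

print(df_sub)
```

Rows 200610-118 and 200610-120 take col1 because their col2 values already occurred one row earlier.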
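Answer 3 (score: 1)
df = df[!is.na(df[, 2]), ]          # keep only rows where column 2 is not NA
df[, 3] = paste0(df[, 1], df[, 2])  # combine columns 1 and 2 into one value
df = df[!duplicated(df[, 3]), ]     # drop rows whose combined value repeats
df = df[, 3]                        # keep only the combined column
Does this work?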
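This approach first filters out the rows where the second column is NA, so those rows cannot appear in the result. The desired output keeps all of them. The base-R sketch below keeps every row and only overwrites the ones that need the first column. It assumes columns named col1 and col2 and uses a hypothetical five-row stand-in, df_keep, taken from the question's data.

```r
# Hypothetical stand-in: the last five rows of the question's data
df_keep <- data.frame(
  col1 = c("200610-116", "200610-117", "200610-118", "200610-119", "200610-120"),
  col2 = c(NA, "rs67858721", "rs67858721", "rs9876735", "rs9876735"),
  stringsAsFactors = FALSE
)

# Start from col2, then overwrite only the rows that should use col1
df_keep$fill <- df_keep$col2
use_col1 <- is.na(df_keep$col2) | duplicated(df_keep$col2)
df_keep$fill[use_col1] <- df_keep$col1[use_col1]

print(df_keep)
```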
df <- df %>%
  mutate(fill = ifelse(is.na(col2), col1, col2)) %>%
  distinct(col1, .keep_all = TRUE)  # keep the first row for each col1 value
Does this work?