Combining information from two columns into a new column

Time: 2017-03-03 01:56:44

Tags: r if-statement dataframe duplicates apply

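I have a data frame with two columns of information. I want to create a new column based on the second column: take its value when it is not NA, and fall back to the first column when the value is NA or is a duplicate of an earlier value.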

df:
200610-1    rs28619217
200610-10   NA
200610-100  rs367572771
200610-102  rs144402189
200610-105  rs375896687
200610-107  NA
200610-108  NA
200610-109  NA
200610-110  rs199838004
200610-111  rs374875201
200610-112  NA
200610-113  rs377546596
200610-114  NA
200610-115  NA
200610-116  NA
200610-117  rs67858721
200610-118  rs67858721
200610-119  rs9876735
200610-120  rs9876735

desired output:
200610-1    rs28619217  rs28619217
200610-10   NA          200610-10
200610-100  rs367572771 rs367572771
200610-102  rs144402189 rs144402189
200610-105  rs375896687 rs375896687
200610-107  NA          200610-107
200610-108  NA          200610-108
200610-109  NA          200610-109
200610-110  rs199838004 rs199838004
200610-111  rs374875201 rs374875201
200610-112  NA          200610-112
200610-113  rs377546596 rs377546596
200610-114  NA          200610-114
200610-115  NA          200610-115
200610-116  NA          200610-116
200610-117  rs67858721  rs67858721
200610-118  rs67858721  200610-118
200610-119  rs9876735   rs9876735
200610-120  rs9876735   200610-120

What steps should I take? I was thinking of using the apply function.

4 answers:

Answer 0 (score: 1)

Consider the variant below...

pinMode(motor_l_u, OUTPUT);
pinMode(motor_l_v, OUTPUT);
pinMode(motor_r_u, OUTPUT);
pinMode(motor_r_v, OUTPUT);
digitalWrite(motor_l_u, LOW); // at start turn off the GPIO
digitalWrite(motor_l_v, LOW); // at start turn off the GPIO
digitalWrite(motor_r_u, LOW); // at start turn off the GPIO
digitalWrite(motor_r_v, LOW); // at start turn off the GPIO
pinMode(motor_l_u, PWM_OUTPUT);
pinMode(motor_l_v, PWM_OUTPUT);
pinMode(motor_r_u, PWM_OUTPUT);
pinMode(motor_r_v, PWM_OUTPUT);

Answer 1 (score: 1)

We can use ifelse:

df1$Col3 <- with(df1, ifelse(is.na(Col2), Col1, Col2))
df1$Col3
#[1] "rs28619217"  "200610-10"   "rs367572771" "rs144402189" "rs375896687"
#[6] "200610-107"  "200610-108"  "200610-109"  "rs199838004" "rs374875201"
#[11] "200610-112"  "rs377546596" "200610-114"  "200610-115"  "200610-116" 

Update

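If there are duplicates, as @Sotos mentioned in the comments, we can add duplicated to the logical vector inside ifelse, so that repeated values also fall back to the first column:
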
with(df1, ifelse(is.na(Col2)|duplicated(Col2), Col1, Col2))
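A minimal, self-contained sketch of the updated approach on the full data from the question, including the duplicated rows. The column names Col1, Col2 and Col3 are assumptions carried over from this answer, since the question does not name its columns.

```r
# Rebuild the 19 rows from the question (column names are assumed).
df1 <- data.frame(
  Col1 = paste0("200610-", c(1, 10, 100, 102, 105, 107:120)),
  Col2 = c("rs28619217", NA, "rs367572771", "rs144402189", "rs375896687",
           NA, NA, NA, "rs199838004", "rs374875201", NA, "rs377546596",
           NA, NA, NA, "rs67858721", "rs67858721", "rs9876735", "rs9876735"),
  stringsAsFactors = FALSE
)

# Use Col1 wherever Col2 is NA or repeats an earlier value; otherwise keep Col2.
df1$Col3 <- with(df1, ifelse(is.na(Col2) | duplicated(Col2), Col1, Col2))

# The last four rows contain the duplicated IDs.
tail(df1, 4)
```

duplicated() marks only the second and later occurrences, so the first row carrying each ID keeps it. If the columns are factors rather than character vectors, ifelse would return the underlying integer codes, so converting with as.character first is likely needed.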

Answer 2 (score: 1)

A mutate with an ifelse statement will do the job:

library(readr)
library(dplyr)

df <- read_table("200610-1    rs28619217
200610-10   NA
200610-100  rs367572771
200610-102  rs144402189
200610-105  rs375896687
200610-107  NA
200610-108  NA
200610-109  NA
200610-110  rs199838004
200610-111  rs374875201
200610-112  NA
200610-113  rs377546596
200610-114  NA
200610-115  NA
200610-116  NA", col_names = c("col1", "col2"), col_types = "cc")

df %>% 
  mutate(fill = ifelse(is.na(col2), col1, col2))

# A tibble: 15 × 3
         col1        col2        fill
        <chr>       <chr>       <chr>
1    200610-1  rs28619217  rs28619217
2   200610-10        <NA>   200610-10
3  200610-100 rs367572771 rs367572771
4  200610-102 rs144402189 rs144402189
5  200610-105 rs375896687 rs375896687
6  200610-107        <NA>  200610-107
7  200610-108        <NA>  200610-108
8  200610-109        <NA>  200610-109
9  200610-110 rs199838004 rs199838004
10 200610-111 rs374875201 rs374875201
11 200610-112        <NA>  200610-112
12 200610-113 rs377546596 rs377546596
13 200610-114        <NA>  200610-114
14 200610-115        <NA>  200610-115
15 200610-116        <NA>  200610-116
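The data in this answer stops before the duplicated IDs, so the result above does not exercise that part of the question. A sketch of how the same mutate could be extended, assuming the col1/col2 names used above and taking only the last five rows of the question's data for illustration:

```r
library(dplyr)

# Subset of the question's rows that includes the duplicated IDs.
df <- tibble(
  col1 = c("200610-116", "200610-117", "200610-118", "200610-119", "200610-120"),
  col2 = c(NA, "rs67858721", "rs67858721", "rs9876735", "rs9876735")
)

# Fall back to col1 when col2 is missing or repeats an earlier value.
res <- df %>%
  mutate(fill = if_else(is.na(col2) | duplicated(col2), col1, col2))

res
```

if_else is used here instead of ifelse because it checks that both branches have the same type; with two character columns either function should give the same result.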

Answer 3 (score: 1)

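# Drop rows whose second column is NA (the trailing comma subsets rows)
df <- df[!is.na(df[, 2]), ]
# Paste the two columns together into a third column
df[, 3] <- paste0(df[, 1], df[, 2])
# Keep only the first occurrence of each combined value
df <- df[!duplicated(df[, 3]), ]
df <- df[, 3]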

Does this work?

df <- df %>%
  mutate(fill = ifelse(is.na(col2), col1, col2)) %>%
  distinct(col1, .keep_all = TRUE)

Does this work?