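I have the client dataset below, containing client_id, birth_number and district_id. The birth number is in the format YYMMDD, but there is a twist: the value has the form YYMMDD for men and YY(+50MM)DD for women, that is, 50 is added to the month. I would like help developing a script in R that splits YYMMDD and applies a condition: if MM > 12, the row belongs to a woman and the actual month is the value minus 50; otherwise the row belongs to a man and the birth number is left unchanged.
Please help.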
"client_id";"birth_number";"district_id"
1;"706213";18
2;"450204";1
3;"406009";1
4;"561201";5
5;"605703";5
6;"190922";12
7;"290125";15
8;"385221";51
9;"351016";60
10;"430501";57
11;"505822";57
12;"810220";40
13;"745529";54
14;"425622";76
15;"185828";21
16;"190225";21
17;"341013";76
18;"315405";76
19;"421228";47
20;"790104";46
21;"526029";12
22;"696011";1
23;"730529";1
24;"395729";43
25;"395423";21
26;"695420";74
27;"665326";54
28;"450929";1
29;"515911";30
30;"576009";74
31;"620209";68
32;"800728";52
33;"486204";73
Answer 0 (score: 2)
One option is to use substring together with ifelse:
# Get the 3rd and 4th characters of "birth_number". If the value is > 12,
# the row is for a female client, otherwise male.
df$Gender <- ifelse(as.numeric(substring(df$birth_number, 3, 4)) > 12, "Female", "Male")
# Now correct "birth_number": subtract 50 from the middle two digits
# (subtracting 5000 from the whole number lowers the month by 50).
# Updated based on feedback from @RuiBarradas to use df$Gender == "Female"
# to subtract 50 from the month number.
df$birth_number <- ifelse(df$Gender == "Female",
                          as.character(as.numeric(df$birth_number) - 5000),
                          df$birth_number)
df
#    client_id birth_number district_id Gender
# 1          1       701213          18 Female
# 2          2       450204           1   Male
# 3          3       401009           1 Female
# 4          4       561201           5   Male
# 5          5       600703           5 Female
# 6          6       190922          12   Male
# ... and so on
Data:
df <- read.table(text =
'"client_id";"birth_number";"district_id"
1;"706213";18
2;"450204";1
3;"406009";1
4;"561201";5
5;"605703";5
6;"190922";12
7;"290125";15
8;"385221";51
9;"351016";60
10;"430501";57
11;"505822";57
12;"810220";40
13;"745529";54
14;"425622";76
15;"185828";21
16;"190225";21
17;"341013";76
18;"315405";76
19;"421228";47
20;"790104";46
21;"526029";12
22;"696011";1
23;"730529";1
24;"395729";43
25;"395423";21
26;"695420";74
27;"665326";54
28;"450929";1
29;"515911";30
30;"576009";74
31;"620209";68
32;"800728";52
33;"486204";73',
header = TRUE, stringsAsFactors = FALSE, sep = ";")
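The question also asks to split YYMMDD into its parts, which the code above does not do. A minimal sketch of one way to do that with integer arithmetic follows; `decode_birth_number` is a hypothetical helper name, not something from the answer, and it assumes every value can be converted to an integer.

```r
# Hypothetical helper (not from the answer above): decode a birth number
# of the form YYMMDD (men) or YY(+50MM)DD (women) into its parts.
decode_birth_number <- function(birth_number) {
  n  <- as.integer(birth_number)  # accepts character or numeric input
  yy <- n %/% 10000L              # first two digits
  mm <- (n %/% 100L) %% 100L      # middle two digits (may include the +50)
  dd <- n %% 100L                 # last two digits
  female <- mm > 12L              # a month above 12 marks a female client
  data.frame(
    gender = ifelse(female, "Female", "Male"),
    yy = yy,
    mm = ifelse(female, mm - 50L, mm),  # remove the +50 offset for women
    dd = dd,
    stringsAsFactors = FALSE
  )
}

# First three birth numbers from the question's data
parts <- decode_birth_number(c("706213", "450204", "406009"))
parts
```

Because the parts come from division and remainder rather than character positions, the result does not depend on the value being exactly six characters long.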
Answer 1 (score: 0)
Using the same commands as @MKR, but I prefer the tidyverse approach.
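library(tidyverse)
df %>%
  # Convert explicitly to numeric so the comparison and the subtraction
  # do not depend on how birth_number was read in
  mutate(Gender = ifelse(as.numeric(substr(birth_number, 3, 4)) > 12,
                         "Female", "Male"),
         birth_number = ifelse(Gender == "Female",
                               as.numeric(birth_number) - 5000,
                               as.numeric(birth_number)))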
   client_id birth_number district_id Gender
1          1       701213          18 Female
2          2       450204           1   Male
3          3       401009           1 Female
4          4       561201           5   Male
5          5       600703           5 Female
6          6       190922          12   Male
7          7       290125          15   Male
8          8       380221          51 Female
9          9       351016          60   Male
10        10       430501          57   Male
11        11       500822          57 Female
12        12       810220          40   Male
13        13       740529          54 Female
14        14       420622          76 Female
15        15       180828          21 Female
16        16       190225          21   Male
17        17       341013          76   Male
18        18       310405          76 Female
19        19       421228          47   Male
20        20       790104          46   Male
21        21       521029          12 Female
22        22       691011           1 Female
23        23       730529           1   Male
24        24       390729          43 Female
25        25       390423          21 Female
26        26       690420          74 Female
27        27       660326          54 Female
28        28       450929           1   Male
29        29       510911          30 Female
30        30       571009          74 Female
31        31       620209          68   Male
32        32       800728          52   Male
33        33       481204          73 Female
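A quick sanity check on the corrected values can catch rows where the offset was not removed. The sketch below is an addition, not part of either answer; `is_valid_birth_number` is a hypothetical name, and the check only tests that the month lies in 1 to 12 and the day in 1 to 31, not that the date actually exists.

```r
# Hypothetical check (not part of either answer): after correction, the
# middle two digits should be a month (1-12) and the last two a day (1-31).
is_valid_birth_number <- function(corrected) {
  n  <- as.integer(corrected)
  mm <- (n %/% 100L) %% 100L  # middle two digits
  dd <- n %% 100L             # last two digits
  mm >= 1L & mm <= 12L & dd >= 1L & dd <= 31L
}

# First five corrected values from the output above
corrected <- c(701213, 450204, 401009, 561201, 600703)
is_valid_birth_number(corrected)

# An uncorrected female value still carries the +50 month and fails the check
is_valid_birth_number(706213)
```

Applied to the full corrected column, `all(is_valid_birth_number(...))` would give a single TRUE or FALSE for the whole dataset.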