How do I split specific column data from a dataset in R

Time: 2018-05-27 16:16:54

Tags: r

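I have the client dataset below, which contains client_id, birth_number and district_id. The birth number is in the form YYMMDD, with a twist: the value is YYMMDD for men and YY(MM+50)DD for women. I would like help writing an R script that splits YYMMDD and applies a condition: if MM > 12, the row belongs to a woman and the actual month is MM minus 50; rows for men keep their birth number unchanged. Please help.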

The value is in the form YYMMDD (for men). The value is in the form YY(MM+50)DD (for women).

"client_id";"birth_number";"district_id"
1;"706213";18
2;"450204";1
3;"406009";1
4;"561201";5
5;"605703";5
6;"190922";12
7;"290125";15
8;"385221";51
9;"351016";60
10;"430501";57
11;"505822";57
12;"810220";40
13;"745529";54
14;"425622";76
15;"185828";21
16;"190225";21
17;"341013";76
18;"315405";76
19;"421228";47
20;"790104";46
21;"526029";12
22;"696011";1
23;"730529";1
24;"395729";43
25;"395423";21
26;"695420";74
27;"665326";54
28;"450929";1
29;"515911";30
30;"576009";74
31;"620209";68
32;"800728";52
33;"486204";73
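The split the question asks for can be sketched in base R with integer arithmetic instead of string handling. This is a minimal illustration using the first few birth numbers from the data above; the names `yy`, `mm`, `dd` and `month` are illustrative, not from the question.

```r
# Minimal sketch: split a YYMMDD birth number into its parts with integer
# arithmetic, then apply the rule "MM > 12 means female, real month = MM - 50".
birth_number <- c(706213L, 450204L, 406009L, 190922L)  # first rows of the question's data

yy <- birth_number %/% 10000L          # first two digits (year)
mm <- (birth_number %/% 100L) %% 100L  # middle two digits (month, +50 for women)
dd <- birth_number %% 100L             # last two digits (day)

gender <- ifelse(mm > 12L, "Female", "Male")
month  <- ifelse(gender == "Female", mm - 50L, mm)  # undo the +50 offset for women

print(data.frame(birth_number, yy, mm, dd, gender, month))
```

Working on integers sidesteps the question of whether `birth_number` was read as text or as a number, as long as it holds whole numbers.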

2 answers:

Answer 0 (score: 2)

One option is to use substring together with ifelse:

# Get the 3rd and 4th character from "birth_number". If it is > 12
# that row is for Female, otherwise Male

df$Gender <- ifelse(as.numeric(substring(df$birth_number,3,4)) > 12, "Female", "Male")

# Now correct the "birth_number": subtract 50 from the middle two digits
# (i.e. subtract 5000 from the whole number).
# Updated based on feedback from @RuiBarradas to use df$Gender == "Female" 
# to subtract 50 from month number

df$birth_number <- ifelse(df$Gender == "Female", 
                          as.character(as.numeric(df$birth_number)-5000), df$birth_number)

df

#    client_id birth_number district_id Gender
# 1          1       701213          18 Female
# 2          2       450204           1   Male
# 3          3       401009           1 Female
# 4          4       561201           5   Male
# 5          5       600703           5 Female
# 6          6       190922          12   Male
# ... and so on
#

Data:

df <- read.table(text = 
'"client_id";"birth_number";"district_id"
1;"706213";18
2;"450204";1
3;"406009";1
4;"561201";5
5;"605703";5
6;"190922";12
7;"290125";15
8;"385221";51
9;"351016";60
10;"430501";57
11;"505822";57
12;"810220";40
13;"745529";54
14;"425622";76
15;"185828";21
16;"190225";21
17;"341013";76
18;"315405";76
19;"421228";47
20;"790104";46
21;"526029";12
22;"696011";1
23;"730529";1
24;"395729";43
25;"395423";21
26;"695420";74
27;"665326";54
28;"450929";1
29;"515911";30
30;"576009";74
31;"620209";68
32;"800728";52
33;"486204";73',
header = TRUE, stringsAsFactors = FALSE, sep = ";")
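One caveat with taking substrings: `read.table()` converts the quoted values to integers, so a birth number with a leading zero (someone born in a year ending 00 to 09) loses that zero, and `substring(55212, 3, 4)` returns `"21"` instead of the month digits `"52"`. None of the 33 rows above are affected. The sketch below pads back to six characters with `sprintf("%06d", ...)` before splitting; the value `55212L` is a hypothetical stand-in for `"055212"`, not part of the question's data.

```r
# Sketch: pad birth numbers back to six characters before taking substrings,
# because read.table() turns "055212" into the integer 55212.
birth_number <- c(706213L, 450204L, 55212L)  # 55212L: hypothetical "055212"

bn <- sprintf("%06d", birth_number)      # zero-padded text: "706213" "450204" "055212"
mm <- as.integer(substring(bn, 3, 4))    # month digits: 62 2 52

gender <- ifelse(mm > 12L, "Female", "Male")
# Subtract 5000 (50 in the month position) only where the row is female;
# the logical vector is coerced to 0/1 in the multiplication.
fixed <- sprintf("%06d", birth_number - 5000L * (gender == "Female"))

print(data.frame(bn, gender, fixed))
```

Keeping the corrected value as zero-padded text also keeps later `substring()` calls on it consistent.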

Answer 1 (score: 0)

Using the same commands as @MKR, I prefer a tidyverse approach.

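require(tidyverse)

# Convert the month digits to a number before comparing; comparing the
# substring itself with 12 would be a string comparison.
# Assign the result (df <- df %>% ...) if you want to keep it.
df %>% 
  mutate(Gender = ifelse(as.numeric(substr(birth_number, 3, 4)) > 12, 
                         "Female", "Male"), 
         birth_number = ifelse(Gender == "Female", 
                               birth_number - 5000, 
                               birth_number)) 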


   client_id birth_number district_id Gender
1          1       701213          18 Female
2          2       450204           1   Male
3          3       401009           1 Female
4          4       561201           5   Male
5          5       600703           5 Female
6          6       190922          12   Male
7          7       290125          15   Male
8          8       380221          51 Female
9          9       351016          60   Male
10        10       430501          57   Male
11        11       500822          57 Female
12        12       810220          40   Male
13        13       740529          54 Female
14        14       420622          76 Female
15        15       180828          21 Female
16        16       190225          21   Male
17        17       341013          76   Male
18        18       310405          76 Female
19        19       421228          47   Male
20        20       790104          46   Male
21        21       521029          12 Female
22        22       691011           1 Female
23        23       730529           1   Male
24        24       390729          43 Female
25        25       390423          21 Female
26        26       690420          74 Female
27        27       660326          54 Female
28        28       450929           1   Male
29        29       510911          30 Female
30        30       571009          74 Female
31        31       620209          68   Male
32        32       800728          52   Male
33        33       481204          73 Female
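If an actual date is wanted after the correction, the six digits can be parsed with `as.Date()`. The question does not say which century the years belong to, so this sketch assumes every `YY` means `19YY`; it prepends `"19"` explicitly rather than using `%y`, because R's `%y` maps 00 to 68 to the 2000s. The three inputs are corrected birth numbers taken from the output above.

```r
# Sketch: turn corrected YYMMDD birth numbers into Date values.
# ASSUMPTION: every YY means 19YY; the question does not state the century.
birth_number <- c("701213", "450204", "401009")  # corrected values from the output above

birth_date <- as.Date(paste0("19", birth_number), format = "%Y%m%d")
print(birth_date)
```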