Convert numeric values in a column of one dataset to the corresponding text values from a different dataset

Time: 2019-05-18 00:11:40

Tags: r merge dplyr left-join

I have two datasets: one contains only numeric codes, and the other contains the conversions from those codes to text:

Here is an example of the real dataset. It has more than 2 columns, but in this case 191 is USA.

Example of Dataset

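Here is an example of the conversion dataset. Note that the columns sometimes differ in length. For example, country has 200 elements, while ethnicity has 6.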

Example of Conversion dataset

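How can I write code that replaces the country-of-birth code with the country name, so that 191 becomes USA? The original dataset is called my_data, and the conversion dataset is called Conversion.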

I have tried left_join:
    my_data <- left_join(my_data, select(Conversion, c("StateGrewUpIn", "State")), by = "StateGrewUpIn")

I have tried merge:
    my_data <- merge(x = my_data, y = Conversion[ , c("StateGrewUpIn", "State")], by = "StateGrewUpIn", all.x=TRUE)

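Nothing seems to work: the result always has more rows than the original dataset. In other words, instead of behaving like a clean vlookup, the join duplicates rows or pulls in extra rows from the Conversion dataset.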
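
For illustration, a vlookup-style replacement can be sketched in base R with match(), which returns at most one position per input value and therefore cannot add rows. This is only a minimal sketch under stated assumptions: the two small data frames are toy stand-ins for my_data and Conversion, "Country A" is a hypothetical placeholder label for code 144, and the NA rows imitate the padding visible in the shorter Conversion columns.

```r
# Toy stand-ins for the real data; only "191 is USA" comes from the question.
my_data <- data.frame(
  CountryofBirth = c("144", "191", "191", NA),
  stringsAsFactors = FALSE
)
Conversion <- data.frame(
  CountryofBirth = c(144L, 191L, NA, NA),
  Country = c("Country A", "USA", NA, NA),  # "Country A" is a placeholder
  stringsAsFactors = FALSE
)

# Keep only the two lookup columns and drop the NA padding rows.
lookup <- Conversion[!is.na(Conversion$CountryofBirth),
                     c("CountryofBirth", "Country")]

# match() gives the position of the first matching code (or NA),
# so the result has exactly one element per row of my_data.
idx <- match(as.integer(my_data$CountryofBirth), lookup$CountryofBirth)
my_data$Country <- lookup$Country[idx]

nrow(my_data)    # row count is unchanged
my_data$Country  # text labels, NA where no code matched
```

Because the lookup is done by position rather than by joining, missing codes in my_data simply stay NA instead of matching the NA padding rows of Conversion.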

Conversion Dataset
> str(Conversion)
'data.frame':   200 obs. of  20 variables:
 $ Religion                : int  1 2 3 4 5 6 7 8 NA NA ...
 $ Rel                     : chr  "Protestant" "Catholic" "Islam" "Judaism" ...
 $ PoliticalView           : int  1 2 3 4 5 6 7 NA NA NA ...
 $ Political_views         : chr  "Very Progressive/Left-wing" "Progressive/Left-wing" "Somewhat Progressive/Left-wing" "Moderate/Centrist" ...
 $ CountryofBirth          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Country                 : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
 $ Citizenship             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Citizen                 : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
 $ StateGrewUpIn           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ State                   : chr  "Alaska (AK)" "American Samoa (AS)" "Arizona (AZ)" "Arkansas (AR)" ...
 $ Ethnicity               : int  1 2 3 4 5 6 NA NA NA NA ...
 $ Ethnic                  : chr  "White" "Asian" "Latino" "Black" ...
 $ Education               : int  1 2 3 4 5 6 NA NA NA NA ...
 $ Education_level         : chr  "Some high school/secondary school" "High school degree/completed secondary school" "Some university" "University degree" ...
 $ YearlyIncome            : int  1 2 3 4 5 6 7 NA NA NA ...
 $ Income                  : chr  "Less than $10,000 USD a year" "USD $10,000-$20,000" "USD $20,000-$40,000" "USD USD $40,000-$60,000" ...
 $ HighestEducationPar     : int  1 2 3 4 5 6 NA NA NA NA ...
 $ Parent_Highest_Education: chr  "Some high school/secondary school" "High school degree/completed secondary school" "Some university" "University degree" ...
 $ Atten_check_ans_1       : int  1 2 3 4 5 NA NA NA NA NA ...
 $ Attention_Check         : chr  "strongly disagree" "moderately disagree" "neither disagree nor agree" "moderately agree" ...



Example of the large dataset (note that it has >200 columns, so I only show the columns from the example mentioned above).
> str(my_data)
'data.frame':   35 obs. of  228 variables:
 $ Citizenship                     : chr  "144" "191" "191" "191" ...
 $ CountryofBirth                  : chr  "144" "191" "191" "191" ...
 $ StartDate                       : chr  "2019-05-17 13:49:35" "2019-05-17 12:54:30" "2019-05-17 12:54:40" "2019-05-17 12:54:20" ...
 $ EndDate                         : chr  "2019-05-17 14:00:12" "2019-05-17 13:00:21" "2019-05-17 13:02:02" "2019-05-17 13:04:25" ...
 $ Status                          : chr  "0" "0" "0" "0" ...
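
The two str() outputs show that the codes are stored as chr in my_data but as int in Conversion, and that the shorter Conversion columns are padded with NA. The following is a hedged sketch, not a confirmed fix, of a left_join that aligns the key type and removes the NA padding first. It assumes dplyr is installed, uses toy data in place of the real tables, and "Country A" is again a hypothetical placeholder for code 144.

```r
library(dplyr)

# Toy stand-ins for the real data; only "191 is USA" comes from the question.
my_data <- data.frame(
  Citizenship    = c("144", "191", "191"),
  CountryofBirth = c("144", "191", "191"),
  stringsAsFactors = FALSE
)
Conversion <- data.frame(
  CountryofBirth = c(144L, 191L, NA),
  Country = c("Country A", "USA", NA),  # "Country A" is a placeholder
  stringsAsFactors = FALSE
)

# Build a lookup with one row per code: no NA keys, no duplicate keys.
lookup <- Conversion %>%
  select(CountryofBirth, Country) %>%
  filter(!is.na(CountryofBirth)) %>%
  distinct(CountryofBirth, .keep_all = TRUE)

# Convert the key to integer so both sides of the join have the same type.
result <- my_data %>%
  mutate(CountryofBirth = as.integer(CountryofBirth)) %>%
  left_join(lookup, by = "CountryofBirth")

nrow(result) == nrow(my_data)  # the join should not add rows
```

With a lookup that has unique, non-missing keys, each row of my_data can match at most one lookup row, which is the property a vlookup relies on.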

0 answers:

No answers