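I have two datasets: one holds all the numeric codes, and the other holds the conversions from codes to labels.
Here is an example of the real dataset. It has more than 2 columns, but in this case 191 is USA.
Here is an example of the conversion dataset. Note that the columns sometimes have different lengths. For example, country has 200 elements while ethnicity has 6.
How do I write code that replaces the country-of-birth codes with their labels, so that 191 becomes USA? The original dataset is called my_data and the conversion dataset is called Conversion.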
I have tried left_join:
my_data <- left_join(my_data, select(Conversion, c("StateGrewUpIn", "State")), by = "StateGrewUpIn")
I have tried merge:
my_data <- merge(x = my_data, y = Conversion[ , c("StateGrewUpIn", "State")], by = "StateGrewUpIn", all.x=TRUE)
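Nothing seems to work: the result always ends up with more rows than the original dataset. In other words, instead of behaving like a clean vlookup, it duplicates rows or pulls in extra rows from the Conversion dataset.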
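For reference, here is a minimal sketch of the vlookup-style behaviour I am after, written in base R with match(). The two tables are made-up toy stand-ins, not my real data, and lookup_label is a hypothetical helper name. One possible cause of the extra rows, judging from the str output, is the NA padding in the shorter Conversion columns: both merge and left_join match NA keys to each other by default, so a single NA in my_data could pair with every NA padding row. The codes in my_data are also chr while the Conversion keys are int, so the sketch converts them first.

```r
# Toy stand-ins for the real tables (made-up rows, not the actual survey data).
# Shorter lookup columns are padded with NA, as in str(Conversion).
Conversion <- data.frame(
  CountryofBirth = c(1L, 2L, 3L),
  Country        = c("Afghanistan", "Albania", "Algeria"),
  Ethnicity      = c(1L, 2L, NA),
  Ethnic         = c("White", "Asian", NA),
  stringsAsFactors = FALSE
)
my_data <- data.frame(
  CountryofBirth = c("2", "1", "2", NA),
  Ethnicity      = c("1", NA, NA, "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate codes to labels using one key/label column pair.
lookup_label <- function(codes, key, label) {
  keep <- !is.na(key)  # ignore the NA padding rows of the lookup column
  # match() returns the position of each code in the key vector (NA if absent),
  # so the result always has exactly one element per input code.
  label[keep][match(as.integer(codes), key[keep])]
}

my_data$Country <- lookup_label(my_data$CountryofBirth,
                                Conversion$CountryofBirth,
                                Conversion$Country)
my_data$Ethnic  <- lookup_label(my_data$Ethnicity,
                                Conversion$Ethnicity,
                                Conversion$Ethnic)

nrow(my_data)  # row count is unchanged by the lookup
```

Because match() returns one position per input element, this cannot add rows, and codes with no match simply become NA. Assigning the result back to my_data$CountryofBirth instead of a new column would overwrite the codes in place.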
Conversion Dataset
> str(Conversion)
'data.frame': 200 obs. of 20 variables:
$ Religion : int 1 2 3 4 5 6 7 8 NA NA ...
$ Rel : chr "Protestant" "Catholic" "Islam" "Judaism" ...
$ PoliticalView : int 1 2 3 4 5 6 7 NA NA NA ...
$ Political_views : chr "Very Progressive/Left-wing" "Progressive/Left-wing" "Somewhat Progressive/Left-wing" "Moderate/Centrist" ...
$ CountryofBirth : int 1 2 3 4 5 6 7 8 9 10 ...
$ Country : chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ Citizenship : int 1 2 3 4 5 6 7 8 9 10 ...
$ Citizen : chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ StateGrewUpIn : int 1 2 3 4 5 6 7 8 9 10 ...
$ State : chr "Alaska (AK)" "American Samoa (AS)" "Arizona (AZ)" "Arkansas (AR)" ...
$ Ethnicity : int 1 2 3 4 5 6 NA NA NA NA ...
$ Ethnic : chr "White" "Asian" "Latino" "Black" ...
$ Education : int 1 2 3 4 5 6 NA NA NA NA ...
$ Education_level : chr "Some high school/secondary school" "High school degree/completed secondary school" "Some university" "University degree" ...
$ YearlyIncome : int 1 2 3 4 5 6 7 NA NA NA ...
$ Income : chr "Less than $10,000 USD a year" "USD $10,000-$20,000" "USD $20,000-$40,000" "USD USD $40,000-$60,000" ...
$ HighestEducationPar : int 1 2 3 4 5 6 NA NA NA NA ...
$ Parent_Highest_Education: chr "Some high school/secondary school" "High school degree/completed secondary school" "Some university" "University degree" ...
$ Atten_check_ans_1 : int 1 2 3 4 5 NA NA NA NA NA ...
$ Attention_Check : chr "strongly disagree" "moderately disagree" "neither disagree nor agree" "moderately agree" ...
Example of the large dataset (note it has more than 200 columns, so I only show the first few, including the ones mentioned above).
> str(my_data)
'data.frame': 35 obs. of 228 variables:
$ Citizenship : chr "144" "191" "191" "191" ...
$ CountryofBirth : chr "144" "191" "191" "191" ...
$ StartDate : chr "2019-05-17 13:49:35" "2019-05-17 12:54:30" "2019-05-17 12:54:40" "2019-05-17 12:54:20" ...
$ EndDate : chr "2019-05-17 14:00:12" "2019-05-17 13:00:21" "2019-05-17 13:02:02" "2019-05-17 13:04:25" ...
$ Status : chr "0" "0" "0" "0" ...