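Conditionally replacing categorical values in R

Time: 2016-06-19 00:20:50

Tags: r replace categorical-data

I am new to R and have been consulting this forum for answers to some of my earlier R questions. However, I can't seem to find an answer to my current one.

I have a large data set with multiple columns. I want to replace certain values in one column based on the values in another column. Here is a sample: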

organization                                    organization_type                
[1,] "Human Relief Foundation"                  "NGO"                            
[2,] "Management Systems International"         "Other"                          
[3,] "World Vision"                             "NGO"                            
[4,] "European Disaster Volunteers"             "NGO"                            
[5,] "Management Systems International"         "Other"                          
[6,] "International Committee of the Red Cross" "Red Cross/Red Crescent Movement"
[7,] "International Committee of the Red Cross" "Red Cross/Red Crescent Movement"
[8,] "Development Alternatives"                 "Consultancy"                    

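The data set above shows "Other" under organization_type for the organization "Management Systems International". I want to replace "Other" with "Consultancy". How can I do this?

I tried the following, as suggested on another forum, but it keeps only the filtered rows: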

library(dplyr)

data_df <- data_df %>% filter(organization == "Management Systems International" 
           & organization_type == "Other") %>%  
           mutate(organization_type = "Consultancy")

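Is there a way to "filter" the data in R and still keep the original entries alongside the filtered ones? Excel can do this, but it is difficult to work with large data sets in Excel.

Thanks!

3 answers:

Answer 0: (score: 2)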

Using the data.table package, you can do this as follows:

# Install if necessary
if (!require("data.table")) install.packages("data.table")
# Load the data.table package
library(data.table)

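# Convert data_df to a data.table
data_dt <- as.data.table(data_df)

# Where organization is 'Management Systems International' and organization_type
# equals 'Other', set organization_type to 'Consultancy' (updated by reference)
data_dt[organization == "Management Systems International" &
          organization_type == "Other",
        organization_type := "Consultancy"]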

# Print result
print(data_dt)
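
For reference, here is a minimal self-contained sketch of the update-by-reference idea on a small sample reconstructed from the question. The object name sample_dt is made up for illustration, and the condition on organization is added here to match what the question asks for.

```r
library(data.table)

# Small sample reconstructed from the question; sample_dt is a hypothetical name
sample_dt <- data.table(
  organization = c("Human Relief Foundation",
                   "Management Systems International",
                   "World Vision",
                   "Development Alternatives"),
  organization_type = c("NGO", "Other", "NGO", "Consultancy")
)

# := changes only the rows selected by the condition, in place;
# all other rows stay in the table untouched
sample_dt[organization == "Management Systems International" &
            organization_type == "Other",
          organization_type := "Consultancy"]

print(sample_dt)
```

Unlike filter(), subsetting inside [ ] together with := does not drop the non-matching rows, so the full table is kept.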

Answer 1: (score: 1)

Using dplyr:

data_df %>% mutate(organization_type = ifelse(
      organization == "Management Systems International",
      "Consultancy",
      organization_type))

                              organization               organization_type
1                  Human Relief Foundation                             NGO
2         Management Systems International                     Consultancy
3                             World Vision                             NGO
4             European Disaster Volunteers                             NGO
5         Management Systems International                     Consultancy
6 International Committee of the Red Cross Red Cross/Red Crescent Movement
7 International Committee of the Red Cross Red Cross/Red Crescent Movement
8                 Development Alternatives                     Consultancy
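
A possible variant, sketched under the assumption that both columns are character vectors: it also checks organization_type == "Other", as in the question, and uses dplyr's if_else(), which is stricter about types than base ifelse(). The names sample_df and result are made up for illustration. If the column is stored as a factor, converting it with as.character() first is the safer route.

```r
library(dplyr)

# Small sample reconstructed from the question; sample_df is a hypothetical name
sample_df <- data.frame(
  organization = c("Human Relief Foundation",
                   "Management Systems International",
                   "World Vision",
                   "Development Alternatives"),
  organization_type = c("NGO", "Other", "NGO", "Consultancy"),
  stringsAsFactors = FALSE
)

# mutate() keeps every row; only rows meeting both conditions get the new value
result <- sample_df %>%
  mutate(organization_type = if_else(
    organization == "Management Systems International" &
      organization_type == "Other",
    "Consultancy",
    organization_type))

print(result)
```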

Answer 2: (score: 0)

You can use the stringr package together with ifelse.

Your data (a subset, for illustration purposes).

a <- c("Human Relief Foundation", "Management Systems International", "World Vision", "World Vision")   
b <- c("NGO", "Other", "NGO", "Other")
df <- as.data.frame(cbind(a,b))
df
#                                 a     b
#1          Human Relief Foundation   NGO
#2 Management Systems International Other
#3                     World Vision   NGO
#4                     World Vision Other

Then replace the specific part of the data.

library(stringr)
df$b <- ifelse(df$a=="Management Systems International", 
   str_replace(as.character(df$b), "Other", "Consultancy"), as.character(df$b))
df
#                                 a           b
#1          Human Relief Foundation         NGO
#2 Management Systems International Consultancy
#3                     World Vision         NGO
#4                     World Vision       Other
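
The same replacement can also be sketched in base R with logical indexing, with no extra packages. This is an alternative to the stringr approach, not part of it; the names df_base and rows are made up for illustration, and the columns are assumed to be character rather than factor.

```r
# Same illustrative subset, built with character columns; df_base is a hypothetical name
df_base <- data.frame(
  a = c("Human Relief Foundation", "Management Systems International",
        "World Vision", "World Vision"),
  b = c("NGO", "Other", "NGO", "Other"),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows where both conditions hold
rows <- df_base$a == "Management Systems International" & df_base$b == "Other"

# Assign the new value only to those rows; all other rows are unchanged
df_base$b[rows] <- "Consultancy"

df_base
```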