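I am new to R and have been consulting this forum for answers to some of my earlier R questions. However, I can't seem to find an answer to my current one.
I have a large dataset with multiple columns. I want to replace certain values in one column based on the values in another column. Here is an example: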
organization organization_type
[1,] "Human Relief Foundation" "NGO"
[2,] "Management Systems International" "Other"
[3,] "World Vision" "NGO"
[4,] "European Disaster Volunteers" "NGO"
[5,] "Management Systems International" "Other"
[6,] "International Committee of the Red Cross" "Red Cross/Red Crescent Movement"
[7,] "International Committee of the Red Cross" "Red Cross/Red Crescent Movement"
[8,] "Development Alternatives" "Consultancy"
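The dataset above shows "Other" under organization_type for the organization "Management Systems International". I want to replace "Other" with "Consultancy". How can I do this?
I tried the following, as suggested on another forum, but it keeps only the filtered rows: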
library(dplyr)
data_df <- data_df %>%
  filter(organization == "Management Systems International" &
           organization_type == "Other") %>%
  mutate(organization_type = "Consultancy")
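Is there a way to "filter" data in R so that I keep the original entries as well as the filtered ones? Excel can do this, but large datasets are hard to handle in Excel.
Thanks!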
Answer 0 (score: 2)
Using the data.table package, you can do this as follows:
# Install data.table if it is not already available
if (!requireNamespace("data.table", quietly = TRUE)) install.packages("data.table")
# Load the data.table package
library(data.table)
# Convert data_df to a data.table
data_dt <- as.data.table(data_df)
# Where organization_type equals "Other", set organization_type to "Consultancy"
# (updates in place; every row is kept, and every "Other" is replaced)
data_dt[organization_type == "Other", organization_type := "Consultancy"]
# Print result
print(data_dt)
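Note that the data.table call above matches on organization_type alone, so it would also change an "Other" that belongs to a different organization. If the change should apply only to "Management Systems International", as the question asks, both conditions can be combined. Here is a minimal sketch in base R, with no packages needed. The data_df below is my reconstruction of the sample from the question, and to_fix is just a name I picked for illustration:

```r
# Reconstruction of the sample data from the question (an assumption,
# the real data_df is larger and has more columns)
data_df <- data.frame(
  organization = c(
    "Human Relief Foundation",
    "Management Systems International",
    "World Vision",
    "European Disaster Volunteers",
    "Management Systems International",
    "International Committee of the Red Cross",
    "International Committee of the Red Cross",
    "Development Alternatives"
  ),
  organization_type = c(
    "NGO", "Other", "NGO", "NGO", "Other",
    "Red Cross/Red Crescent Movement",
    "Red Cross/Red Crescent Movement",
    "Consultancy"
  ),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows to change; all other rows are left alone
to_fix <- data_df$organization == "Management Systems International" &
  data_df$organization_type == "Other"

# Assign into the matching rows only; the data frame keeps all of its rows
data_df$organization_type[to_fix] <- "Consultancy"

print(data_df)
```

The logical vector does the "filtering", but because it is used only to index the assignment, nothing is dropped from data_df.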
Answer 1 (score: 1)
Using dplyr:
library(dplyr)
data_df %>% mutate(organization_type = ifelse(
  organization == "Management Systems International",
  "Consultancy",
  organization_type))
organization organization_type
1 Human Relief Foundation NGO
2 Management Systems International Consultancy
3 World Vision NGO
4 European Disaster Volunteers NGO
5 Management Systems International Consultancy
6 International Committee of the Red Cross Red Cross/Red Crescent Movement
7 International Committee of the Red Cross Red Cross/Red Crescent Movement
8 Development Alternatives Consultancy
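To show why the attempt in the question lost rows while the mutate() approach does not, here is a small side-by-side sketch. The four-row sample_df is hypothetical (the "Example Org" row is invented so there is an "Other" that should not change), and I use dplyr's if_else() with both conditions from the question rather than the single condition above:

```r
library(dplyr)

# Hypothetical four-row sample; the last row is invented for illustration
sample_df <- data.frame(
  organization = c("Human Relief Foundation",
                   "Management Systems International",
                   "World Vision",
                   "Example Org"),
  organization_type = c("NGO", "Other", "NGO", "Other"),
  stringsAsFactors = FALSE
)

# filter() returns only the matching rows, so assigning its result
# back to the original name discards everything else
filtered <- sample_df %>%
  filter(organization == "Management Systems International",
         organization_type == "Other")

# mutate() with a condition returns every row and changes only the matches
updated <- sample_df %>%
  mutate(organization_type = if_else(
    organization == "Management Systems International" &
      organization_type == "Other",
    "Consultancy",
    organization_type
  ))

nrow(filtered)
nrow(updated)
```

if_else() is stricter than ifelse() about types, so it assumes organization_type is a character column here, not a factor.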
Answer 2 (score: 0)
You can use the stringr package together with ifelse.
Your data (a subset, for illustration purposes):
a <- c("Human Relief Foundation", "Management Systems International", "World Vision", "World Vision")
b <- c("NGO", "Other", "NGO", "Other")
df <- as.data.frame(cbind(a,b))
df
# a b
#1 Human Relief Foundation NGO
#2 Management Systems International Other
#3 World Vision NGO
#4 World Vision Other
Then replace the specific part of the data:
library(stringr)
df$b <- ifelse(df$a=="Management Systems International",
str_replace(as.character(df$b), "Other", "Consultancy"), as.character(df$b))
df
# a b
#1 Human Relief Foundation NGO
#2 Management Systems International Consultancy
#3 World Vision NGO
#4 World Vision Other
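The as.character() calls above matter when the columns are factors, which as.data.frame(cbind(a, b)) produces in R versions before 4.0. Since the whole value is being replaced, stringr is not strictly needed. A possible base-R sketch of the same idea, with the columns built as factors on purpose (b_chr is a helper name of my own):

```r
# Same illustrative subset as above, deliberately stored as factors
df <- data.frame(
  a = c("Human Relief Foundation", "Management Systems International",
        "World Vision", "World Vision"),
  b = c("NGO", "Other", "NGO", "Other"),
  stringsAsFactors = TRUE
)

# Convert the factor to character first so ifelse() works on the labels
# rather than on the factor's underlying integer codes
b_chr <- as.character(df$b)

# Replace "Other" only where the organization matches; keep the rest as is
df$b <- ifelse(df$a == "Management Systems International" & b_chr == "Other",
               "Consultancy",
               b_chr)
df
```

After this, df$b is a plain character column; wrap it in factor() again if the rest of your code expects a factor.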