Question

I'm trying to 're-count' a column in R and having issues by cleaning up the data. I'm working on cleaning data by location and once I change CA to California.

 all_location <- read.csv("all_location.csv", stringsAsFactors = FALSE)
 all_location <- count(all_location, location)
 all_location <- all_location[with(all_location, order(-n)), ]

  all_location

   A tibble: 100 x 2
    location        n
   <chr>       <int>
  1 CA           3216
  2 Alaska       2985
 3 Nevada        949
 4 Washington    253
 5 Hawaii        239
 6 Montana       218
 7 Puerto Rico   149
 8 California    126
 9 Utah           83
10 NA             72

From the above, there's CA and California. Below I'm able to clean grep and replace CA with California. However, my issue is that it's grouping by California but shows two separate instances of California.

  ca1 <- grep("CA",all_location$location)
  all_location$location <- replace(all_location$location,ca1,"California")

 all_location

A tibble: 100 x 2
 location        n
<chr>       <int>
 1 California   3216
 2 Alaska       2985
 3 Nevada        949
 4 Washington    253
 5 Hawaii        239
 6 Montana       218
 7 Puerto Rico   149
 8 California    126
 9 Utah           83
 10 NA             72

My goal would be to combine both to a total under n.

Answer 1

all_location$location[substr(all_location$location, 1, 5) %in% "Calif" ] <- "California"

to make sure everything that starts with "Calif" gets made into "California"

I am assuming that maybe you have a space in the California (e.g. "California ") that is already present which is why this is happening..

Grouping and/or Counting in R

1 个答案: