How can I rename the values of a variable in a data frame's rows based on how often they occur?

Time: 2018-10-23 18:21:45

Tags: r

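I have a variable (distributor, format = factor) in a data.frame of movies. I want to replace the name of every distributor that occurs fewer than 10 times with "small company". I was able to produce a list and count the occurrences using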
import pandas as pd

file = "file.csv"
df = pd.read_csv(file)
pd.options.display.max_columns = len(df.columns)
print(df)

But I am not able to do the replacement in my data.frame.

1 answer:

Answer 0: (score: 1)

Here is a solution using dplyr.

library(dplyr)

## make some dummy data
df <- tribble(
     ~distributor, ~something,
     "dist1", 89,
     "dist2", 92,
     "dist3", 29,
     "dist1", 89
)


df %>% 
     group_by(distributor) %>% 
     ## this counts the number of occurrences of each distributor
     mutate(occurrences = n()) %>% 
     ungroup() %>% 
     ## replace the distributor name when it occurs fewer than 2 times;
     ## as.character() keeps the names intact if distributor is a factor
     mutate(distributor = ifelse(occurrences < 2, "small company", as.character(distributor)))
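
For comparison, the same idea can be sketched in base R, which may be handy when the column is a factor as in the question. This is only a minimal sketch under some assumptions: the data frame name `movies`, the cutoff variable `min_count`, and the dummy values are made up for illustration, and the cutoff is set to 2 so that the small example shows an effect (the question would use 10).

```r
# Hypothetical example data: a movies data frame with a factor column.
movies <- data.frame(
  distributor = factor(c("dist1", "dist2", "dist3", "dist1")),
  something = c(89, 92, 29, 89)
)

# Assumed cutoff; the question uses 10, the dummy data needs a smaller one.
min_count <- 2

# table() counts how often each distributor name occurs.
counts <- table(movies$distributor)

# Names of the distributors that occur fewer than min_count times.
rare <- names(counts)[counts < min_count]

# Work on character values so that factor codes are not returned by accident.
dist_chr <- as.character(movies$distributor)
dist_chr[dist_chr %in% rare] <- "small company"

# Convert back to a factor; levels that no longer occur are dropped.
movies$distributor <- factor(dist_chr)
```

Going through a character vector avoids assigning a value that is not yet a level of the factor, which would otherwise produce `NA` with a warning.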