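Best way to add an incrementing number to a column in a .csv file

Time: 2018-12-03 18:40:32

Tags: r csv tableau

I have a dataset (as a .csv file) with many columns, one of which contains the "genre" (of a TV show). There are several other columns (one for the show title, one for the episode number, one for the synopsis, and so on). I would like to create a new column that numbers each entry of a genre consecutively. So the first instance of Documentary should be followed by "1", the second by "2", and so on. Then, when a new genre starts, the numbering should restart at "1". In case that's unclear, this is what I mean: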

Documentary, 1
Documentary, 2 
Documentary, 3
Documentary, 4
Drama, 1
Drama, 2
Drama, 3
Drama, 4
Drama, 5
Sport, 1
Sport, 2
Sport, 3

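The number of times each genre appears varies, if that makes sense. I also need to apply this to hundreds of .csv files, so adding this data manually is not an option!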

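Can anyone suggest how I might do this? I'm not the most data-savvy person, so simple approaches are welcome! I know a little R and suspect this could be done with a script containing an if/else loop (e.g. if the next field contains the same value as the previous one, add 1, otherwise restart at 1; apologies for the sloppy syntax, but you get the idea!). I'm visualizing this data in Tableau and noticed there is now Tableau Prep, so perhaps it can be done there? Any solution is welcome!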

2 Answers:

Answer 0: (score: 1)

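There are several ways to do this in R. Here is one that uses functions from the tidyverse suite of packages. We first group by genre, then add a column that counts from 1 to the number of rows in that genre. I've provided two options for how the new column could look, depending on what you need.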

library(tidyverse)

# Fake data
set.seed(2)
dat = data.frame(genre = sample(c("Drama", "Comedy", "Sport", "Documentary"), 20, replace=TRUE))

# Add columns to number scripts within each genre
dat = dat %>% 
  group_by(genre) %>% 
  mutate(count = 1:n(),
         count2 = paste0(genre, ", ", 1:n()))

dat
   genre       count count2                
 1 Drama           1 Drama, 1      
 2 Sport           1 Sport, 1      
 3 Sport           2 Sport, 2      
 4 Drama           2 Drama, 2      
 5 Documentary     1 Documentary, 1
 6 Documentary     2 Documentary, 2
 7 Drama           3 Drama, 3      
 8 Documentary     3 Documentary, 3
 9 Comedy          1 Comedy, 1     
10 Sport           3 Sport, 3      
11 Sport           4 Sport, 4      
12 Drama           4 Drama, 4      
13 Documentary     4 Documentary, 4
14 Drama           5 Drama, 5      
15 Comedy          2 Comedy, 2     
16 Documentary     5 Documentary, 5
17 Documentary     6 Documentary, 6
18 Drama           6 Drama, 6      
19 Comedy          3 Comedy, 3     
20 Drama           7 Drama, 7

If you'd like the data sorted, you can do that with, for example:

dat %>% arrange(genre, count)
   genre       count count2             
 1 Comedy          1 Comedy, 1     
 2 Comedy          2 Comedy, 2     
 3 Comedy          3 Comedy, 3     
 4 Documentary     1 Documentary, 1
 5 Documentary     2 Documentary, 2
 6 Documentary     3 Documentary, 3
 7 Documentary     4 Documentary, 4
 8 Documentary     5 Documentary, 5
 9 Documentary     6 Documentary, 6
10 Drama           1 Drama, 1      
11 Drama           2 Drama, 2      
12 Drama           3 Drama, 3      
13 Drama           4 Drama, 4      
14 Drama           5 Drama, 5      
15 Drama           6 Drama, 6      
16 Drama           7 Drama, 7      
17 Sport           1 Sport, 1      
18 Sport           2 Sport, 2      
19 Sport           3 Sport, 3      
20 Sport           4 Sport, 4
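
For comparison, here is a minimal base R sketch of the same idea that needs no packages. It also shows a variant that restarts at 1 whenever the genre changes from one row to the next, which is closer to the if/else loop described in the question; the two only differ when a genre reappears later in the file. The vector `genre` and the names `within_genre` and `within_run` are made up for illustration.

```r
# Illustrative data: "Documentary" reappears at the end
genre <- c("Documentary", "Documentary", "Drama", "Drama", "Drama",
           "Sport", "Documentary")

# Running count within each genre, regardless of where the rows sit
# (matches the group_by() + 1:n() result)
within_genre <- ave(seq_along(genre), genre, FUN = seq_along)

# Running count within each consecutive run of the same genre:
# rle() gives the run lengths, sequence() counts 1..length for each run
within_run <- sequence(rle(genre)$lengths)

data.frame(genre, within_genre, within_run)
```

If the files are already sorted by genre, both columns come out identical.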

Answer 1: (score: 1)

library(dplyr)
library(tidyr)

df <- data.frame(genre = c("Documentary", "Documentary", "Documentary", "Sport", "Sport", "Drama"), rating = c(2,2,4,4,6,6))
df %>% group_by(genre) %>% mutate(id = row_number()) %>% unite(genre_number, c("genre", "id"), sep = " ")

    # A tibble: 6 x 2
  genre_number  rating
  <chr>          <dbl>
1 Documentary 1      2
2 Documentary 2      2
3 Documentary 3      4
4 Sport 1            4
5 Sport 2            6
6 Drama 1            6

Edit: To process a batch of files, you can wrap either approach in a function and apply it to a list of files.

library(dplyr)
library(tidyr)

number_genres <- function(x) {
x %>% 
  group_by(genre) %>%
  mutate(id = row_number()) %>%
  unite(genre_number, c("genre", "id"), sep = " ")
}

dir <- "C:/Documents/test" # location of your .csv files
# full.names = TRUE so read.csv() finds the files regardless of the working directory
filenames <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(filenames, read.csv) # reads your files
names(data_list) <- basename(filenames) # names your list with the respective csv names
numbered <- lapply(data_list, number_genres) # apply your function to your data_list

# writes each data frame to a .csv with the same name in the working directory
# (this overwrites the originals if the working directory is dir)
invisible(lapply(seq_along(numbered), function(i) write.csv(numbered[[i]],
                                                file = names(numbered)[i],
                                                row.names = FALSE)))
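
If overwriting the original files is a concern, one possible variation writes the numbered copies to a separate folder. This is only a sketch in base R: the helper name `number_csv_folder`, the new column name `genre_count`, and the output folder are hypothetical, and it assumes every file has a column named `genre`.

```r
# Hypothetical helper: number the rows within each genre for every .csv in
# in_dir and write the results to out_dir, leaving the originals untouched.
# Assumes every file has a column named "genre".
number_csv_folder <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  for (f in files) {
    dat <- read.csv(f, stringsAsFactors = FALSE)
    # Running count within each genre, in file order
    dat$genre_count <- ave(seq_len(nrow(dat)), dat$genre, FUN = seq_along)
    # Same file name, different folder
    write.csv(dat, file.path(out_dir, basename(f)), row.names = FALSE)
  }
  invisible(files)
}
```

It could then be called as `number_csv_folder("C:/Documents/test", "C:/Documents/test_numbered")`, where the second path is just an example.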