How can I avoid NA values when coercing .tsv columns to numeric with as.numeric?

Asked: 2019-06-05 21:04:40

Tags: r csv formatting

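I have a data frame with several columns read from a .tsv file, and I want to convert one of them to numeric for analysis. However, I keep getting the "NAs introduced by coercion" warning and cannot work out why. The only formatting I have done so far is to strip some unneeded information from the start of another column.

At first I thought the file might contain stray tabs or spaces, so I tried to remove them by passing each column through sub() before calling as.numeric().

I should also mention that the NAs appear even when I skip the replacement and convert the columns as they are: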

library(tidyverse)

data_2018 <- read_tsv('teina230.tsv')
data_1995 <- read_csv('OECD_1995.csv')

#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]

#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    '2018Q1'=as.numeric(sub("", "", '2018Q1', fixed = TRUE)),
    '2018Q2'=as.numeric(sub(" ", "", '2018Q2', fixed = TRUE)),
    '2018Q3'=as.numeric(sub(" ", "", '2018Q3', fixed = TRUE)),
    '2018Q4'=as.numeric(sub(" ", "", '2018Q4', fixed = TRUE))
    )

Is there another way to convert these columns without every value being replaced by NA?

Thanks, everyone :)

1 Answer:

Answer 0: (score: 0)

Thanks for the hint, @divibisan!

Renaming the columns with rename() actually solved the problem. This is the code that finally worked:

library(tidyverse)

data_2018 <- read_tsv('teina230.tsv')

#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo',
         quarter_1 = '2018Q1',
         quarter_2 = '2018Q2',
         quarter_3 = '2018Q3',
         quarter_4 = '2018Q4')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]

#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    quarter_1 = as.numeric(quarter_1),
    quarter_2 = as.numeric(quarter_2),
    quarter_3 = as.numeric(quarter_3),
    quarter_4 = as.numeric(quarter_4)
    )
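
A likely explanation for why renaming helped: inside mutate(), a name in single quotes such as '2018Q1' is an ordinary string, not a reference to the column, so as.numeric() receives the text "2018Q1" and returns NA for every row. After renaming, the columns are referenced as bare names (quarter_1 and so on), which do point at the data. The sketch below reproduces this on a small made-up tibble (the data frame `demo` and its values are hypothetical and not taken from teina230.tsv) and shows that backticks let you keep the original column names. It assumes dplyr 1.0.0 or later for across().

```r
library(dplyr)

# Hypothetical stand-in for the imported data; the values are invented.
demo <- tibble(
  country  = c("AT", "BE"),
  `2018Q1` = c("74.1 ", "103.2 "),
  `2018Q2` = c("73.5 ", "102.8 ")
)

# Quoted name: '2018Q1' is the string "2018Q1", not the column.
# as.numeric("2018Q1") is NA, and that single NA is recycled to every row.
broken <- suppressWarnings(
  demo %>%
    mutate(q1 = as.numeric(sub(" ", "", '2018Q1', fixed = TRUE)))
)

# Backticks refer to the column itself, so the stored values are converted.
# trimws() drops surrounding whitespace before the conversion.
fixed <- demo %>%
  mutate(across(c(`2018Q1`, `2018Q2`), ~ as.numeric(trimws(.x))))
```

With backticks, no renaming is needed, although shorter names such as quarter_1 are still easier to type. If NAs remain after this change, the affected cells probably contain something other than a plain number, and inspecting them with is.na() on the converted column is a reasonable next step.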