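I have a data frame with several columns read from a .tsv file, and I want to convert some of them to numeric type for analysis. However, I keep getting the "NAs introduced by coercion" warning and cannot tell exactly why. Another column had some unnecessary information at the start of each value, and stripping that is pretty much the only formatting I have done.
At first I thought the file might contain some extra tabs or spaces, which is why I tried to remove them by passing the result of sub() to as.numeric().
I should also mention that I get the NAs even when I do not replace anything and convert the columns as they are: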
library(tidyverse)
data_2018 <- read_tsv('teina230.tsv')
data_1995 <- read_csv('OECD_1995.csv')
#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]
#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    '2018Q1'=as.numeric(sub("", "", '2018Q1', fixed = TRUE)),
    '2018Q2'=as.numeric(sub(" ", "", '2018Q2', fixed = TRUE)),
    '2018Q3'=as.numeric(sub(" ", "", '2018Q3', fixed = TRUE)),
    '2018Q4'=as.numeric(sub(" ", "", '2018Q4', fixed = TRUE))
  )
Is there another way to solve this and convert the columns without every value being replaced by NA?
Thanks, everyone :)
Answer 0 (score: 0)
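Thanks for the hint, @divibisan!
Renaming the columns with rename() actually solved the problem. Inside mutate(), a quoted '2018Q1' is a character string rather than a reference to the column, so as.numeric() was trying to convert the text "2018Q1" itself, which gives NA. With syntactic names such as quarter_1, the bare name refers to the column. Here is the code that finally worked: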
library(tidyverse)
data_2018 <- read_tsv('teina230.tsv')
#get rid of long colname & select only columns containing %GDP
clean_data_2018 <- data_2018 %>%
  select('na_item,sector,unit,geo','2018Q1','2018Q2','2018Q3','2018Q4') %>%
  rename(country = 'na_item,sector,unit,geo',
         quarter_1 = '2018Q1',
         quarter_2 = '2018Q2',
         quarter_3 = '2018Q3',
         quarter_4 = '2018Q4')
clean_data_2018 <- clean_data_2018[grep("PC_GDP", clean_data_2018$'country'), ]
#remove unnecessary info
clean_data_2018 <- clean_data_2018 %>%
  mutate(country=gsub('\\GD,S13,PC_GDP,','',country))
clean_data_2018 <- clean_data_2018 %>%
  mutate(
    quarter_1 = as.numeric(quarter_1),
    quarter_2 = as.numeric(quarter_2),
    quarter_3 = as.numeric(quarter_3),
    quarter_4 = as.numeric(quarter_4)
  )
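
For anyone who would rather keep the original column names, renaming should not be strictly necessary. Below is a minimal sketch of two alternatives: referring to the non-syntactic names with backticks, or converting all the quarter columns at once with across(). The toy tibble and its values are made up for illustration and are not taken from teina230.tsv; the sketch also assumes dplyr 1.0 or later for across().

```r
library(dplyr)

# Hypothetical stand-in for the real data; the numbers are invented.
toy <- tibble(
  country  = c("AT", "BE"),
  `2018Q1` = c("77.1 ", "106.3"),
  `2018Q2` = c("76.0", "105.6 ")
)

# A quoted name is only a string, so converting it yields NA
# (the warning is suppressed here to keep the output clean).
quoted <- suppressWarnings(as.numeric("2018Q1"))

# Backticks refer to the column itself, even when the name starts with a digit.
converted <- toy %>%
  mutate(
    `2018Q1` = as.numeric(trimws(`2018Q1`)),
    `2018Q2` = as.numeric(trimws(`2018Q2`))
  )

# across() applies the same conversion to every column whose name starts with "2018".
converted_all <- toy %>%
  mutate(across(starts_with("2018"), ~ as.numeric(trimws(.x))))
```

If NAs still appear after this, the cells most likely contain something other than digits and whitespace, in which case inspecting the offending values before converting is probably the next step.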