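I am working with a csv file that was originally formatted in Excel. I want to convert the Rate column to numeric and remove the "$" sign.
I read the file in with: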
> NImp <- read.csv("National_TV_Spots 6_30_14 to 8_31_14.csv",
sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE, na.strings=c("Not Monitored"))
The data frame looks like this:
HH.IMP..000. ISCI Creative Program Rate
1 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
2 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
3 141 IT14429 Rising Costs30 (Opportunity Scholar - No Nursing) BONUS $0.00
4 476 ITES15443H Matthew Traina (B. EECT/A. CEET) :60 (no loc) Law & Order: SVU $0.00
5 NA IT3896 Rising Costs30 (Opportunity Scholar - No Nursing) NUVO CINEMA $0.00
When I do the conversion, I get a warning message:
> NImp$Rate <- as.numeric(gsub("$","", NImp$Rate))
Warning message:
NAs introduced by coercion
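and all of the values are coerced to NA.
I also tried:
> NImp$Rate <- as.numeric(sub("\\$","", NImp$Rate))
but got the same warning message again. This time, however, not all of the values became NA, only certain ones. I opened the csv in Excel to check and realized that Excel had made the column too narrow, so some cells display as "####". Those are the cells that R coerces to NA.
I tried opening the file in Notepad and reading the Notepad file into R, but I got the same result. The values display correctly both in Notepad and when I read the file into R. But when I convert to numeric, everything that showed as "####" in Excel becomes NA.
What should I do?
Adding the output of str(NImp):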
'data.frame': 9859 obs. of 19 variables:
$ Spot.ID : int 13072903 13072904 13072898 13072793 13072905 13072899 13072397 13072476 13072398 13072681 ...
$ Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Local.Date : chr "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
$ Broadcast.Week : int 1 1 1 1 1 1 1 1 1 1 ...
$ Local.Hour : int 0 0 0 0 0 0 1 1 1 2 ...
$ Local.Time : chr "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
$ Market : chr "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" ...
$ Vendor : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ Station : chr "NUVO" "NUVO" "AFAM" "USA" ...
$ M18.34.IMP..000.: int NA NA 3 88 NA 3 NA 53 NA 37 ...
$ W18.34.IMP..000.: int NA NA 86 66 NA 86 NA 70 NA 60 ...
$ A18.34.IMP..000.: int NA NA 89 154 NA 89 NA 123 NA 97 ...
$ HH.IMP..000. : int NA NA 141 476 NA 141 NA 461 NA 434 ...
$ ISCI : chr "IT3896" "IT3896" "IT14429" "ITES15443H" ...
$ Creative : chr "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Matthew Traina (B. EECT/A. CEET) :60 (no loc)" ...
$ Program : chr "NUVO CINEMA" "NUVO CINEMA" "BONUS" "Law & Order: SVU" ...
$ Rate : chr "$0.00" "$0.00" "$0.00" "$0.00" ...
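Answer 0 (score: 1)
When a column is formatted as "Currency" in Excel, values in the thousands or larger contain commas in addition to the dollar-sign prefix. For example, a value might look like $1,200.00. This is consistent with the "####" cells, which Excel shows when a value is too wide for its column. The problem you are running into is that you removed the dollar sign but not the commas, so when you try to convert to numeric you get NA.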
as.numeric(c("0", "0", "1,200"))
[1] 0 0 NA
Warning message:
NAs introduced by coercion
You can use gsub to remove the dollar signs and the commas in one step. I found an example of how to do this in the comments on this answer.
as.numeric(gsub("[$,]", "", c("$0", "$0", "$1,200")))
[1] 0 0 1200
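It may also help to see why the first attempt in the question turned every value into NA while the second affected only some. In a regular expression an unescaped $ is the end-of-string anchor, so gsub("$", "", x) matches an empty position and removes nothing. The sketch below compares the three patterns; the vector x is made-up sample data, not values taken from your file.

```r
# Made-up sample values in Excel "Currency" style (an assumption for illustration)
x <- c("$0.00", "$15.50", "$1,200.00")

# Unescaped "$" is the end-of-string anchor, so the strings come back unchanged
unanchored <- gsub("$", "", x)

# Escaping the dollar sign removes it but leaves the thousands separator
no_dollar <- gsub("\\$", "", x)

# Inside a character class "$" is literal, so this strips both "$" and ","
cleaned <- gsub("[$,]", "", x)

# Only the fully cleaned strings convert without producing NA
rates <- as.numeric(cleaned)
print(unanchored)
print(no_dollar)
print(rates)
```

Only the last pattern leaves strings that as.numeric can parse for every element.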
So the code that should work for your dataset is
as.numeric(gsub("[$,]", "", NImp$Rate))
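If you want to confirm that nothing else in the column is blocking the conversion, one option is to convert into a temporary vector first and look at the original strings that still come out as NA. This is only a minimal sketch: the data frame df and its values (including the "n/a" entry) are hypothetical stand-ins for NImp.

```r
# Hypothetical stand-in for NImp with one unparseable entry and one missing value
df <- data.frame(Rate = c("$0.00", "$1,200.00", "n/a", NA),
                 stringsAsFactors = FALSE)

# Convert into a temporary vector; the coercion warning is expected here
converted <- suppressWarnings(as.numeric(gsub("[$,]", "", df$Rate)))

# Strings that were present in the data but still failed to convert
bad <- unique(df$Rate[is.na(converted) & !is.na(df$Rate)])
print(bad)

# Once the leftovers look acceptable, overwrite the column
df$Rate <- converted
```

Anything listed in bad points to another formatting quirk that the pattern does not yet handle.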