How to read integers that use "," as a thousands separator from a CSV file

Time: 2015-09-28 07:32:58

Tags: r csv

I have a CSV file like this:

Year,All,Northeast,Midwest,South,West,     CPI 
1987,"85,600","133,300","66,000","80,400","113,200",113.6
1988,"89,300","143,000","68,400","82,200","124,900",118.3
1989,"89,500","127,700","71,800","84,400","127,100",124
1990,"92,000","126,400","75,300","85,100","129,600",130.7
1991,"97,100","129,100","79,500","88,500","135,300",136.2
1992,"99,700","128,900","83,000","91,500","131,500",140.3
1993,"103,100","129,100","86,000","94,300","132,500",144.5

The code looks like this:

> fn <- paste(data.path, p2, "tmp.csv", sep="//")
> d <- read.csv(fn)
> str(d)
'data.frame':   7 obs. of  7 variables:
 $ Year     : int  1987 1988 1989 1990 1991 1992 1993
 $ All      : Factor w/ 7 levels "103,100","85,600",..: 2 3 4 5 6 7 1
 $ Northeast: Factor w/ 6 levels "126,400","127,700",..: 5 6 2 1 4 3 4
 $ Midwest  : Factor w/ 7 levels "66,000","68,400",..: 1 2 3 4 5 6 7
 $ South    : Factor w/ 7 levels "80,400","82,200",..: 1 2 3 4 5 6 7
 $ West     : Factor w/ 7 levels "113,200","124,900",..: 1 2 3 4 7 5 6
 $ CPI      : num  114 118 124 131 136 ...
> d
  Year     All Northeast Midwest  South    West   CPI
1 1987  85,600   133,300  66,000 80,400 113,200 113.6
2 1988  89,300   143,000  68,400 82,200 124,900 118.3
3 1989  89,500   127,700  71,800 84,400 127,100 124.0
4 1990  92,000   126,400  75,300 85,100 129,600 130.7
5 1991  97,100   129,100  79,500 88,500 135,300 136.2
6 1992  99,700   128,900  83,000 91,500 131,500 140.3
7 1993 103,100   129,100  86,000 94,300 132,500 144.5

When I use read.csv, it reads the columns All, Northeast, Midwest, South, and West as strings (factors) instead of integers. Is there a simple way to fix this?

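By the way: this CSV file was generated by Excel. I found that because Excel uses the comma as the field separator in CSV files, it wraps a number in quotes whenever that number uses a comma as its thousands separator. Excel handles this format fine, but it causes some trouble for R.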
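
The quoting behaviour described above can be checked with a small sketch. It assumes one data row copied from the file and uses only base R; the variable names `line` and `fields` are just for illustration.

```r
# One data row from the file, as Excel wrote it
line <- '1987,"85,600","133,300","66,000","80,400","113,200",113.6'

# The quotes protect the inner commas, so the row still splits into 7 fields
fields <- scan(text = line, what = "", sep = ",", quiet = TRUE)
length(fields)

# The quoted values arrive as text such as "85,600"; converting directly
# yields NA, while removing the commas first gives the intended number
direct  <- suppressWarnings(as.numeric(fields[2]))
cleaned <- as.numeric(gsub(",", "", fields[2], fixed = TRUE))
```

So the file is parsed into the right number of columns; the only problem is that the comma-formatted values are not valid numeric literals for R.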

Thanks.

1 answer:

Answer 0 (score: 2)

DF <- read.csv(text = 'Year,All,Northeast,Midwest,South,West,     CPI 
1987,"85,600","133,300","66,000","80,400","113,200",113.6
1988,"89,300","143,000","68,400","82,200","124,900",118.3
1989,"89,500","127,700","71,800","84,400","127,100",124
1990,"92,000","126,400","75,300","85,100","129,600",130.7
1991,"97,100","129,100","79,500","88,500","135,300",136.2
1992,"99,700","128,900","83,000","91,500","131,500",140.3
1993,"103,100","129,100","86,000","94,300","132,500",144.5')

#remove "," and convert
# as.is = TRUE avoids the "'as.is' should be specified" warning in R >= 4.0
DF[, 2:6] <- lapply(DF[, 2:6], function(x) type.convert(gsub(",", "", x, fixed = TRUE), as.is = TRUE))

str(DF)
# 'data.frame':  7 obs. of  7 variables:
# $ Year     : int  1987 1988 1989 1990 1991 1992 1993
# $ All      : int  85600 89300 89500 92000 97100 99700 103100
# $ Northeast: int  133300 143000 127700 126400 129100 128900 129100
# $ Midwest  : int  66000 68400 71800 75300 79500 83000 86000
# $ South    : int  80400 82200 84400 85100 88500 91500 94300
# $ West     : int  113200 124900 127100 129600 135300 131500 132500
# $ CPI      : num  114 118 124 131 136 ...
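
As an alternative, the conversion can be done while reading, rather than afterwards. The sketch below is not part of the answer above. It registers a coercion with `setAs` and passes it to `read.csv` through `colClasses`. The class name `num.with.commas` and the shortened three-column sample are assumptions made for illustration, and the resulting columns are numeric (double) rather than integer.

```r
library(methods)  # provides setClass/setAs; attached by default in interactive R

# Hypothetical class name, used only to tag the comma-formatted columns
setClass("num.with.commas")

# Tell R how to turn the raw text into numbers: drop the commas, then convert
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

# Shortened sample of the data, three columns only
csv_text <- 'Year,All,Northeast
1987,"85,600","133,300"
1988,"89,300","143,000"'

# read.csv applies the registered coercion to the columns named in colClasses
DF2 <- read.csv(text = csv_text,
                colClasses = c("integer", "num.with.commas", "num.with.commas"))
str(DF2)
```

This keeps the cleaning logic in one place, which may be convenient when several files share the same format. The `gsub` approach above is simpler for a one-off file.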