我正在尝试将以下csv文件读入R
我遇到的问题是Property, plant and equipment
,Gross property, pland and equipment
和Net property, plant and equipment
的名称都带有逗号。
怎么可能忽略这三行或忽略文本字段中的逗号?
这是使用text to columns
函数在excel中数据的显示方式。
列已被拆分为两行,数据转移到了2009年,应该是2010年。
编辑:这是记事本数据。
FACEBOOK INC A (FB) CashFlowFlag BALANCE SHEET
Fiscal year ends in December. USD in millions except per share data.,2017-12,2016-12,2015-12,2014-12,2013-12,2012-12,2011-12,2010-12,2009-12,2008-12
Assets
Current assets
Cash
Cash and cash equivalents,8079,8903,4907,4315,3323,2384,1512,1785,,
Short-term investments,33632,20546,13527,6884,8126,7242,2396,,,
Total cash,41711,29449,18434,11199,11449,9626,3908,1785,,
Receivables,5832,3993,2559,1678,1109,719,547,373,,
Prepaid expenses,1020,959,659,793,461,471,149,88,,
Other current assets,,,,,51,451,,,,
Total current assets,48563,34401,21652,13670,13070,11267,4604,2246,,
Non-current assets
"Property, plant and equipment"
"Gross property, plant and equipment",18337,11803,7819,5784,4142,3273,1925,820,,
Accumulated Depreciation,-4616,-3212,-2132,-1817,-1260,-882,-450,-246,,
"Net property, plant and equipment",13721,8591,5687,3967,2882,2391,1475,574,,
Goodwill,18221,18122,18026,17981,839,587,82,37,,
Intangible assets,1884,2535,3246,3929,883,801,80,59,,
Other long-term assets,2135,1312,796,637,221,57,90,74,,
Total non-current assets,35961,30560,27755,26514,4825,3836,1727,744,,
Total assets,84524,64961,49407,40184,17895,15103,6331,2990,,
Liabilities and stockholders' equity
Liabilities
Current liabilities
Short-term debt,,,201,,,,,,,
Capital leases,,,7,114,239,365,279,106,,
Accounts payable,380,302,196,176,87,65,63,29,,
Taxes payable,230,,,,,,,,,
Accrued liabilities,685,331,192,164,87,,296,137,,
Deferred revenues,98,90,56,66,38,30,90,42,,
Other current liabilities,2367,2152,1273,904,649,592,171,75,,
Total current liabilities,3760,2875,1925,1424,1100,1052,899,389,,
Non-current liabilities
Long-term debt,,,,,,1500,,250,,
Capital leases,,,107,119,237,491,398,117,,
Deferred taxes liabilities,,,163,987,,,,,,
Other long-term liabilities,6417,2892,2994,1558,1088,305,135,72,,
Total non-current liabilities,6417,2892,3264,2664,1325,2296,533,439,,
Total liabilities,10177,5767,5189,4088,2425,3348,1432,828,,
Stockholders' equity
Preferred stock,,,,,,,615,615,,
Additional paid-in capital,40584,38227,34886,30225,12297,10094,2684,947,,
Retained earnings,33990,21670,9787,6099,3159,1659,1606,606,,
Accumulated other comprehensive income,-227,-703,-455,-228,14,2,-6,-6,,
Total stockholders' equity,74347,59194,44218,36096,15470,11755,4899,2162,,
Total liabilities and stockholders' equity,84524,64961,49407,40184,17895,15103,6331,2990,,
答案 0 :(得分:2)
所有带有逗号的字符串都用"
正确引用,因此使用read.csv
读取它们没有困难。假设文件存储为~/Downloads/fb_balance_sheet.csv
,将得到:
data <- read.csv("~/Downloads/fb_balance_sheet.csv", header = TRUE, skip = 1, stringsAsFactors = FALSE)
dim(data)
#> [1] 45 11
data[ ,1]
#> [1] "Assets"
#> [2] "Current assets"
#> [3] "Cash"
#> [4] "Cash and cash equivalents"
#> [5] "Short-term investments"
#> [6] "Total cash"
#> [7] "Receivables"
#> [8] "Prepaid expenses"
#> [9] "Other current assets"
#> [10] "Total current assets"
#> [11] "Non-current assets"
#> [12] "Property, plant and equipment"
#> [13] "Gross property, plant and equipment"
#> [14] "Accumulated Depreciation"
#> [15] "Net property, plant and equipment"
#> [16] "Goodwill"
#> [17] "Intangible assets"
#> [18] "Other long-term assets"
#> [19] "Total non-current assets"
#> [20] "Total assets"
#> [21] "Liabilities and stockholders' equity"
#> [22] "Liabilities"
#> [23] "Current liabilities"
#> [24] "Short-term debt"
#> [25] "Capital leases"
#> [26] "Accounts payable"
#> [27] "Taxes payable"
#> [28] "Accrued liabilities"
#> [29] "Deferred revenues"
#> [30] "Other current liabilities"
#> [31] "Total current liabilities"
#> [32] "Non-current liabilities"
#> [33] "Long-term debt"
#> [34] "Capital leases"
#> [35] "Deferred taxes liabilities"
#> [36] "Other long-term liabilities"
#> [37] "Total non-current liabilities"
#> [38] "Total liabilities"
#> [39] "Stockholders' equity"
#> [40] "Preferred stock"
#> [41] "Additional paid-in capital"
#> [42] "Retained earnings"
#> [43] "Accumulated other comprehensive income"
#> [44] "Total stockholders' equity"
#> [45] "Total liabilities and stockholders' equity"
由reprex package(v0.2.0)于2018-09-17创建。
答案 1 :(得分:1)
尽管有些名称带有逗号,但它们用引号引起来,因此它们将被视为一个块。只需读取没有标题的文件并跳过第一行:
fbBalance <- read.csv('fb_balance_sheet.csv', header = FALSE, skip = 1)
> fbBalance[12:16,]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
12 Non-current assets
13 Property, plant and equipment
14 Gross property, plant and equipment 18337 11803 7819 5784 4142 3273 1925 820
15 Accumulated Depreciation -4616 -3212 -2132 -1817 -1260 -882 -450 -246
16 Net property, plant and equipment 13721 8591 5687 3967 2882 2391 1475 574
然后您可以调整数据。我建议先转置,以便将变量作为列:
fbT <- as.data.frame(t(fbBalance[2:ncol(fbBalance)]))
names(fbT) <- fbBalance[,1]
names(fbT)[1] <- 'Fiscal year'
fbT[,1] <- gsub('-12$', '', fbT[,1]) # keep only the year
fbT <- apply(fbT, 2, as.numeric) # convert all columns to numeric
fbT <- fbT[, colSums(is.na(fbT)) != nrow(fbT)] # remove columns that have only NAs
> fbT[,1:4]
Fiscal year Cash and cash equivalents Short-term investments Total cash
[1,] 2017 8079 33632 41711
[2,] 2016 8903 20546 29449
[3,] 2015 4907 13527 18434
[4,] 2014 4315 6884 11199
[5,] 2013 3323 8126 11449
[6,] 2012 2384 7242 9626
[7,] 2011 1512 2396 3908
[8,] 2010 1785 NA 1785
[9,] 2009 NA NA NA
[10,] 2008 NA NA NA