Sum is incorrect

Posted: 2019-07-10 11:36:07

Tags: r

I have read a fairly large file into RStudio and am trying to sum a specific column in it.

Reading the file:

df <- read.csv('Musically-ROW-ADS_M_20180801_20180831_merge (2).csv', header = TRUE, sep = "\t", skip = 2)

I already know what the total for that column should be.

However, I suspect the sum is not covering the whole column, only part of it.

My result: 24402801; actual value: 41412689
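A gap this large usually means rows were lost or merged while reading, not while summing. A minimal sketch for checking that, assuming the layout from the question (two preamble lines, then a tab-separated header row). `count_data_lines` and the sample file are hypothetical, made up for illustration:

```r
# Hypothetical helper (not from the question): count the non-blank data lines
# in a delimited file, given how many preamble lines precede the header row.
count_data_lines <- function(path, skip = 2) {
  lines <- readLines(path)
  # Drop the preamble lines and the header line
  body <- lines[-seq_len(skip + 1)]
  # read.csv skips blank lines by default, so ignore them here too
  sum(nzchar(body))
}

# Small made-up file: two preamble lines, a header, three data rows
path <- tempfile(fileext = ".txt")
writeLines(c("report preamble", "more preamble",
             "Artist\tTitle\tUnits.Sold",
             "A\tSong one\t1",
             "B\tSong two\t2",
             "C\tSong three\t3"), path)

expected_rows <- count_data_lines(path, skip = 2)
df <- read.csv(path, header = TRUE, sep = "\t", skip = 2)

# If these two numbers differ, rows were lost or merged during reading
print(c(file = expected_rows, read = nrow(df)))
```

On the real file, the same comparison would be `count_data_lines()` on the CSV path versus `nrow(df)`. The line count is only a rough check: a value that legitimately contains a line break would be counted twice.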

> str(df)
'data.frame':   551263 obs. of  22 variables:
 $ DSP.Code            : chr  "9998703" "9998703" "9998703" "9998703" ...
 $ Report.Date         : int  9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 ...
 $ Initial.Date        : int  8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 ...
 $ End.Date            : int  8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 ...
 $ Transaction.Type    : chr  "STREAM" "STREAM" "STREAM" "STREAM" ...
 $ Sale.Type           : chr  "OTHER" "OTHER" "OTHER" "OTHER" ...
 $ Distribution.Type   : chr  "WIRELESS" "WIRELESS" "WIRELESS" "WIRELESS" ...
 $ Product.s.Origin.ID : logi  NA NA NA NA NA NA ...
 $ Product.ID          : chr  "634041651299" "893583003434" "ABCD13027823" "ABCD13027825" ...
 $ Artist              : chr  "Icekid" "Anna of the North" "Silk Rabbits" "Silk Rabbits" ...
 $ Title               : chr  "Roll it Ft Jfly" "Lovers" "Hurt" "Careless Whisper" ...
 $ Units.Sold          : num  1 1 1 1 2 1 1 1 1 1 ...
 $ Retailer.Price      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Dealer.Price        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Additional.Revenue  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Warner.Share        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Entity.to.be.Billed : chr  "9998703" "9998703" "9998703" "9998703" ...
 $ E.retailer.name     : chr  "MUSICAL.LY" "MUSICAL.LY" "MUSICAL.LY" "MUSICAL.LY" ...
 $ E.retailer.country  : chr  "US" "US" "US" "US" ...
 $ End.Consumer.Country: chr  "DK" "CA" "ID" "MY" ...
 $ Price.Code          : chr  "STD" "STD" "STD" "STD" ...
 $ Currency            : chr  "USD" "USD" "USD" "USD" ...

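Is this because the file is too large for the whole column to be summed? Do I need to increase the amount of memory R is allowed to use in order to read the entire file?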

In case file size was the problem, I tried the following:

memory.limit()
memory.size()
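Memory is an unlikely cause here: when R cannot allocate enough memory it normally stops with an error rather than returning a shorter column, and `sum()` has no row cap. Note also that `memory.limit()` and `memory.size()` are Windows-only and, as far as I know, are non-functional stubs from R 4.2.0 onward. A small sketch using `object.size()` to see how much memory a data frame of this length takes, with made-up values standing in for `Units.Sold`:

```r
# Stand-in data frame with as many rows as in the question (values are made up)
n <- 551263
units_df <- data.frame(Units.Sold = rep(c(1, 2), length.out = n))

# Memory used by the object itself
print(format(object.size(units_df), units = "MB"))

# sum() adds every element it receives; doubles hold whole numbers
# exactly up to 2^53, so there is no truncation at this size
total <- sum(units_df$Units.Sold)
print(total)
```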

This is the code used to sum the column:

sum(df$Units.Sold, na.rm = T)
[1] 24402801
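Note that `na.rm = T` drops missing values silently, so any rows where parsing left `Units.Sold` empty also shrink the total without a warning. A minimal check that makes those visible; `column_report` is a hypothetical helper and the vector is made-up data. On the real data it would be called as `column_report(df$Units.Sold)`:

```r
# Hypothetical helper: report length, NA count and NA-free sum of a column,
# so values dropped by na.rm = TRUE are visible before trusting the total
column_report <- function(x) {
  c(n = length(x),
    n_na = sum(is.na(x)),
    total = sum(x, na.rm = TRUE))
}

units <- c(1, 1, NA, 2, 1)
print(column_report(units))
```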

Solved by turning off quote handling with quote = "":

> df <- read.csv('Musically-ROW-ADS_M_20180801_20180831_merge (2).csv', header = TRUE, sep = "\t", skip = 2, quote = "")
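A likely explanation for why `quote = ""` restores the total: with the default `quote = "\""`, R's csv-style parser treats a double quote as the start of a quoted field even in the middle of a value, so a stray quote in an Artist or Title makes it read across tabs and line breaks until the next quote (or end of file), folding many rows into a single field. The sketch below reproduces this on a small made-up file; the rows, including the title `12" Mix`, are hypothetical and not taken from the question's data:

```r
# Made-up sample: two preamble lines, a tab-separated header and eight rows.
# Row 6 has a stray double quote inside its Title.
path <- tempfile(fileext = ".txt")
rows <- sprintf("Artist %d\tTitle %d\t%d", 1:8, 1:8, 1:8)
rows[6] <- "Artist 6\t12\" Mix\t6"
writeLines(c("preamble", "preamble", "Artist\tTitle\tUnits.Sold", rows), path)

# Default quoting: the stray quote opens a quoted field that runs to the end
# of the file, so the following rows are swallowed (R normally warns with
# "EOF within quoted string"; the warning is suppressed here)
bad <- suppressWarnings(
  read.csv(path, header = TRUE, sep = "\t", skip = 2)
)

# quote = "" treats quote characters as ordinary text
good <- read.csv(path, header = TRUE, sep = "\t", skip = 2, quote = "")

print(c(bad = sum(bad$Units.Sold, na.rm = TRUE), good = sum(good$Units.Sold)))
```

The trade-off is that with `quote = ""` genuinely quoted fields are no longer unwrapped, so any field that relies on quotes to protect an embedded tab would be split; for a plain tab-separated export that is usually acceptable.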

0 answers