I have read a fairly large file into RStudio and I am trying to sum one specific column in it.
Reading the file:
df3 <- read.csv('Musically-ROW-ADS_M_20180801_20180831_merge (2).csv', header = 3, sep = "\t", skip = 2)
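I already know what the total for that column should be.
However, when I sum the column I seem to get the sum of only part of it, not the whole column.
My result: 24402801. Expected value: 41412689.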
> str(df)
'data.frame': 551263 obs. of 22 variables:
$ DSP.Code : chr "9998703" "9998703" "9998703" "9998703" ...
$ Report.Date : int 9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 9212018 ...
$ Initial.Date : int 8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 8012018 ...
$ End.Date : int 8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 8312018 ...
$ Transaction.Type : chr "STREAM" "STREAM" "STREAM" "STREAM" ...
$ Sale.Type : chr "OTHER" "OTHER" "OTHER" "OTHER" ...
$ Distribution.Type : chr "WIRELESS" "WIRELESS" "WIRELESS" "WIRELESS" ...
$ Product.s.Origin.ID : logi NA NA NA NA NA NA ...
$ Product.ID : chr "634041651299" "893583003434" "ABCD13027823" "ABCD13027825" ...
$ Artist : chr "Icekid" "Anna of the North" "Silk Rabbits" "Silk Rabbits" ...
$ Title : chr "Roll it Ft Jfly" "Lovers" "Hurt" "Careless Whisper" ...
$ Units.Sold : num 1 1 1 1 2 1 1 1 1 1 ...
$ Retailer.Price : int 0 0 0 0 0 0 0 0 0 0 ...
$ Dealer.Price : int 0 0 0 0 0 0 0 0 0 0 ...
$ Additional.Revenue : int 0 0 0 0 0 0 0 0 0 0 ...
$ Warner.Share : int 0 0 0 0 0 0 0 0 0 0 ...
$ Entity.to.be.Billed : chr "9998703" "9998703" "9998703" "9998703" ...
$ E.retailer.name : chr "MUSICAL.LY" "MUSICAL.LY" "MUSICAL.LY" "MUSICAL.LY" ...
$ E.retailer.country : chr "US" "US" "US" "US" ...
$ End.Consumer.Country: chr "DK" "CA" "ID" "MY" ...
$ Price.Code : chr "STD" "STD" "STD" "STD" ...
$ Currency : chr "USD" "USD" "USD" "USD" ...
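Is the file too large for the whole column to be summed? Do I need to increase the amount of memory R is allowed to use so that it reads the entire file?
In case file size was the problem, I checked the following: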
memory.limit()
memory.size()
This is the code I used to sum the column:
sum(df$Units.Sold, na.rm = T)
[1] 24402801
Solved. Adding quote = "" to the read.csv call fixed it:
> df <- read.csv('Musically-ROW-ADS_M_20180801_20180831_merge (2).csv', header = TRUE, sep = "\t", skip = 2, quote = "")
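
A likely reason the fix works, offered as a sketch rather than a diagnosis of the actual file: read.csv treats " as a quoting character by default, so a stray quote inside a text field (for example in a title) can make R read everything up to the next quote, line breaks included, as a single field. The rows in between then disappear without any error, and the column sum comes out too small. The sample data below is made up for illustration and is not taken from the original file.

```r
# Hypothetical sample data: two titles contain a stray double quote.
lines <- c(
  "Title\tUnits.Sold",
  "Song A\t1",
  "7\" Single\t2",
  "Song C\t3",
  "Another 7\" Mix\t4",
  "Song E\t5"
)
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

# Default quoting: text between the two quote characters, line breaks
# included, is read as part of one field, so rows can be swallowed.
bad <- read.csv(path, header = TRUE, sep = "\t")

# quote = "" turns quote handling off, so every line becomes its own row.
good <- read.csv(path, header = TRUE, sep = "\t", quote = "")

# Number of data lines in the file (all lines minus the header line).
expected_rows <- length(readLines(path)) - 1

cat("data lines in file: ", expected_rows, "\n")
cat("rows, default quote:", nrow(bad),
    " sum:", sum(bad$Units.Sold, na.rm = TRUE), "\n")
cat("rows, quote = \"\":   ", nrow(good),
    " sum:", sum(good$Units.Sold, na.rm = TRUE), "\n")
```

Comparing nrow(df) with the number of data lines in the file, as done here with readLines, is a quick way to tell whether rows were lost during import before suspecting memory limits.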