How to check a file's size before opening it in R

Time: 2015-06-01 18:38:44

Tags: r download unzip filesize

How can I check the size of a file before loading it into R?

For example:

http://math.ucdenver.edu/RTutorial/titanic.txt

I want to pick the most suitable command for opening the file based on its size.

4 answers:

Answer 0 (score: 21)

Use file.info():

file.info("data/ullyses.txt")

                    size isdir mode               mtime               ctime               atime  uid  gid
data/ullyses.txt 1573151 FALSE  664 2015-06-01 15:25:55 2015-06-01 15:25:55 2015-06-01 15:25:55 1008 1008

Then extract the column named size, which gives the size in bytes:

file.info("data/ullyses.txt")$size
[1] 1573151
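
Once the size is known, it can drive the choice of reader, which is what the question asks for. The following is only a minimal sketch: the helper name read_by_size, the 100 MB threshold, and the sample_rows parameter are illustrative assumptions rather than part of the answer, and a temporary file stands in for real data.

```r
# Hypothetical helper: choose a reading strategy from the file size.
# The 100 MB default threshold is an arbitrary assumption.
read_by_size <- function(path, threshold_bytes = 100 * 1024^2,
                         sample_rows = 1000) {
  size <- file.info(path)$size   # size in bytes; NA if the file is missing
  if (is.na(size)) stop("File not found: ", path)
  if (size <= threshold_bytes) {
    # Small file: read everything.
    read.table(path, header = TRUE)
  } else {
    # Large file: read only the first sample_rows data rows.
    read.table(path, header = TRUE, nrows = sample_rows)
  }
}

# Stand-in data so the sketch runs without the original file.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:5, y = letters[1:5]), tmp, row.names = FALSE)

small <- read_by_size(tmp)   # below the threshold: whole file
sample_only <- read_by_size(tmp, threshold_bytes = 1, sample_rows = 2)  # forces the "large" branch
```

For genuinely large files, the second branch could equally call a faster reader; the point is only that the decision is made from file.info() before any data is loaded.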

Answer 1 (score: 8)

--Create temp tables for data sample 
--table A - unique ID
DECLARE @tableA AS TABLE
    (
      id INT ,
      [DESC] VARCHAR(20) ,
      code BIGINT
    )
INSERT  INTO @tableA
        ( id, [DESC], code )
VALUES  ( 1, 'ballpen', 1010 ),
        ( 2, 'pencil', 1010 ),
        ( 3, 'stabilo', 1010 ),
        ( 4, 'pins', 1011 ),
        ( 5, 'clips', 1011 )
--table B not unique code
DECLARE @tableB AS TABLE ( id INT, code VARCHAR(10) )
INSERT  INTO @tableB
        ( id, code )
VALUES  ( 1010, 'AAA' ),
        ( 1011, 'BBB' ),
        ( 1013, 'CCC' ),
        ( 1010, 'AAA' ),
        ( 1011, 'BBB' ),
        ( 1013, 'CCC' ),
        ( 1010, 'AAA' ),
        ( 1011, 'BBB' ),
        ( 1013, 'CCC' )
------------------------------------------------------------------------------------------
--1 variant of the final query
SELECT  DISTINCT 
        a.id ,
        a.[DESC] ,
        a.code ,
        b.code AS [code 2]
FROM    @tableA AS a
        LEFT JOIN @tableB AS b ON a.code = b.id
--2 variant of the final query
SELECT  *
FROM    ( SELECT  DISTINCT
                    a.id ,
                    a.[DESC] ,
                    a.code ,
                    b.code AS [code 2] ,
                    ROW_NUMBER() OVER ( PARTITION BY a.id ORDER BY a.id ) AS RN
          FROM      @tableA AS a
                    LEFT JOIN @tableB AS b ON a.code = b.id
        ) AS t
WHERE   rn = 1
--3 variant of the final query
;WITH cte AS
   ( SELECT  DISTINCT
                    a.id ,
                    a.[DESC] ,
                    a.code ,
                    b.code AS [code 2] ,
                    ROW_NUMBER() OVER ( PARTITION BY a.id ORDER BY a.id ) AS RN
          FROM      @tableA AS a
                    LEFT JOIN @tableB AS b ON a.code = b.id
        )
SELECT *
FROM cte
WHERE   rn = 1
--4 variant of the final query
SELECT TOP 1 WITH TIES
        a.id ,
        a.[DESC] ,
        a.code ,
        b.code AS [code 2]
FROM    @tableA AS a
        LEFT JOIN @tableB AS b ON a.code = b.id
ORDER BY ROW_NUMBER() OVER ( PARTITION BY a.id ORDER BY a.id )

Answer 2 (score: 1)

If you do not want to download the file before knowing its size, you can try something like this:

Note: this only works on Mac or Linux.

file_url = 'http://math.ucdenver.edu/RTutorial/titanic.txt'
curl_cmd = paste('curl -X HEAD -i', file_url)
system_cmd = paste(curl_cmd, '|grep Content-Length |cut -d : -f 2')

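The lines above assemble the command string that will be passed to system(). The curl_cmd string tells curl to fetch only the file's headers.

The system_cmd string adds a few extra commands that parse the headers and extract only the file size.

Now call system() with the intern = TRUE argument, which tells R to capture the command's output.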

b <- system(system_cmd, intern = TRUE)
##  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current 
##                              Dload  Upload   Total   Spent    Left  Speed
##   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:-- 0   
## curl: (18) transfer closed

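This downloads only the file's headers and parses them to get the file size. b now holds the file size in bytes, as a character string.

You can then decide how to open the file, or print something friendly such as: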

print(paste("There are", as.numeric(b)/1e6, "mb in the file:", file_url))
## [1] "There are 0.055692 mb in the file: http://math.ucdenver.edu/RTutorial/titanic.txt"
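
The grep and cut steps can also be done in R itself, which avoids depending on those shell tools and handles header names in any letter case. This is a sketch under assumptions: parse_content_length is a hypothetical helper, and the sample header lines stand in for what a call such as system("curl -sI <url>", intern = TRUE) might return (the sketch itself does not touch the network).

```r
# Hypothetical helper: extract Content-Length (in bytes) from HTTP header lines.
parse_content_length <- function(header_lines) {
  # Header names are case-insensitive, so match regardless of case.
  hits <- grep("^content-length:", header_lines,
               ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) return(NA_real_)   # server did not report a size
  # Use the last match (redirects can produce several header blocks),
  # drop the header name, and trim spaces and the trailing carriage return.
  value <- sub("^[^:]*:", "", hits[length(hits)])
  as.numeric(trimws(value))
}

# Sample lines standing in for captured curl output (assumed values).
headers <- c("HTTP/1.1 200 OK",
             "Content-Type: text/plain",
             "Content-Length: 55692\r")

size_bytes <- parse_content_length(headers)
size_mb <- size_bytes / 1e6
```

Returning NA when the header is absent matters in practice, because servers are not required to send Content-Length (for example with chunked responses), so the caller should be ready to fall back to downloading the file.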

Answer 3 (score: 1)

It may have been added since this discussion, but at least for R 3.4+, the answer is file.size().
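
A short sketch of how file.size() behaves, using a temporary file and a deliberately non-existent path (both are stand-ins chosen for illustration). It accepts a vector of paths and reports sizes in bytes, with NA for files that cannot be found.

```r
# Create a small stand-in file.
tmp <- tempfile()
writeLines("hello", tmp)

# A path that is assumed not to exist in the session's temp directory.
missing_path <- file.path(tempdir(), "no-such-file")

# Vectorised: one size (in bytes) per path, NA for the missing file.
sizes <- file.size(c(tmp, missing_path))

# Same value as the file.info() approach from the first answer.
same <- identical(file.size(tmp), file.info(tmp)$size)
```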