Julia: read from a stream rather than a file

Date: 2017-05-19 07:37:27

Tags: dataframe stream julia

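Is there a way to read a table from a web URL, or from the output of an external command (run/pipe)? It seems that DataFrame.readtable only supports reading from a file.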

For example, in R we can do this:

df = read.table(url("http://example.com/data.txt"))

x = read.table(pipe("zcat data.txt | sed /^#/d  | cut  -f '11-13'"), colClasses=c("integer","integer","integer"), fill=TRUE, row.names=NULL)
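For the pipe(...) case, Julia's `open` can start an external command and hand its standard output to a reader as an `IO` stream, so no intermediate file is needed. A minimal sketch using the `DelimitedFiles` standard library; it assumes a Unix-like system with `printf` and `sed`, and the `printf` command is a hypothetical stand-in for `zcat data.txt` so the example runs without a data file:

```julia
using DelimitedFiles  # standard library; readdlm reads from any IO

# Hypothetical stand-in for `zcat data.txt`: print a comment line and two
# tab-separated rows of integers.
source = `printf '# a comment\n1\t2\t3\n4\t5\t6\n'`

# Chain commands like a shell pipeline: sed drops lines starting with '#'.
cmd = pipeline(source, `sed '/^#/d'`)

# open(f, cmd) starts the pipeline, passes its stdout to f as a stream,
# and waits for it to finish; readdlm parses the stream as Int columns.
data = open(cmd) do io
    readdlm(io, '\t', Int)
end
```

Further stages (for example a `cut` step) can be added as extra arguments to `pipeline`.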

2 answers:

Answer 0 (score: 7)

using DataFrames, Requests

julia> resp = get("https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD")
Response(200 OK, 17 headers, 27350 bytes in body)

julia> tbl = readtable(IOBuffer(resp.data));

julia> names(tbl)
46-element Array{Symbol,1}:
 :JURISDICTION_NAME                  
 :COUNT_PARTICIPANTS                 
 :COUNT_FEMALE                       
 :PERCENT_FEMALE                     
 :COUNT_MALE                         
 :PERCENT_MALE                       
 :COUNT_GENDER_UNKNOWN               
 :PERCENT_GENDER_UNKNOWN             
 :COUNT_GENDER_TOTAL                 
 :PERCENT_GENDER_TOTAL               
 :COUNT_PACIFIC_ISLANDER             
 :PERCENT_PACIFIC_ISLANDER           
 :COUNT_HISPANIC_LATINO              
 :PERCENT_HISPANIC_LATINO            
 :COUNT_AMERICAN_INDIAN              
 :PERCENT_AMERICAN_INDIAN            
 :COUNT_ASIAN_NON_HISPANIC           
 ⋮                                   
 :PERCENT_PERMANENT_RESIDENT_ALIEN   
 :COUNT_US_CITIZEN                   
 :PERCENT_US_CITIZEN                 
 :COUNT_OTHER_CITIZEN_STATUS         
 :PERCENT_OTHER_CITIZEN_STATUS       
 :COUNT_CITIZEN_STATUS_UNKNOWN       
 :PERCENT_CITIZEN_STATUS_UNKNOWN     
 :COUNT_CITIZEN_STATUS_TOTAL         
 :PERCENT_CITIZEN_STATUS_TOTAL       
 :COUNT_RECEIVES_PUBLIC_ASSISTANCE   
 :PERCENT_RECEIVES_PUBLIC_ASSISTANCE 
 :COUNT_NRECEIVES_PUBLIC_ASSISTANCE  
 :PERCENT_NRECEIVES_PUBLIC_ASSISTANCE
 :COUNT_PUBLIC_ASSISTANCE_UNKNOWN    
 :PERCENT_PUBLIC_ASSISTANCE_UNKNOWN  
 :COUNT_PUBLIC_ASSISTANCE_TOTAL      
 :PERCENT_PUBLIC_ASSISTANCE_TOTAL

julia> eltypes(tbl)
46-element Array{Type,1}:
 Int64  
 Int64  
 Int64  
 Float64
 Int64  
 Float64
 Int64  
 Int64  
 Int64  
 Int64  
 Int64  
 Float64
 Int64  
 Float64
 Int64  
 Float64
 Int64  
 ⋮      
 Float64
 Int64  
 Float64
 Int64  
 Float64
 Int64  
 Int64  
 Int64  
 Int64  
 Int64  
 Float64
 Int64  
 Float64
 Int64  
 Int64  
 Int64  
 Int64 
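The key step in this answer is wrapping the downloaded bytes in an `IOBuffer`, which any reader that accepts an `IO` can consume. A minimal sketch of the same idea with the `DelimitedFiles` standard library, where a hard-coded byte vector with made-up columns `id` and `count` stands in for `resp.data` so it runs offline:

```julia
using DelimitedFiles  # standard library; readdlm reads from any IO

# Hypothetical stand-in for resp.data: the raw bytes of an HTTP response body.
body = Vector{UInt8}("id,count\n1,44\n2,35\n")

# Wrap the bytes in an in-memory stream, so no temporary file is needed.
io = IOBuffer(body)

# With header=true, readdlm returns (data, header); data is parsed as Int.
data, header = readdlm(io, ',', Int; header=true)
```

The same wrapping works for the readtable call in the answer, or for any other reader that takes an `IO`.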

Answer 1 (score: 0)

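Since Requests has been deprecated in favor of HTTP, here is an example that makes the request with HTTP.request and reads the body field of the response res.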

julia> using CSV, HTTP

julia> res = HTTP.request("GET", "http://users.csc.calpoly.edu/~dekhtyar/365-Winter2015/data/CARS/cars-data.csv")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Wed, 16 May 2018 12:46:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 05 Jan 2015 23:29:09 GMT
ETag: "330f-50bf00ea05b40"
Accept-Ranges: bytes
Content-Length: 13071
Content-Type: text/csv

Id,MPG,Cylinders,Edispl,Horsepower,Weight,Accelerate,Year
1,18,8,307,130,3504,12,1970
2,15,8,350,165,3693,11.5,1970
3,18,8,318,150,3436,11,1970    
⋮
13071-byte body
"""

julia> res_buffer = IOBuffer(res.body)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=13071, maxsize=Inf, ptr=1, mark=-1)

julia> using DataFrames, DataStreams

julia> df = CSV.read(res_buffer)
406×8 DataFrames.DataFrame
│ Row │ Id  │ MPG │ Cylinders │ Edispl │ Horsepower │ Weight │ Accelerate │ Year │
├─────┼─────┼─────┼───────────┼────────┼────────────┼────────┼────────────┼──────┤
│ 1   │ 1   │ 18  │ 8         │ 307.0  │ 130        │ 3504   │ 12.0       │ 1970 │
│ 2   │ 2   │ 15  │ 8         │ 350.0  │ 165        │ 3693   │ 11.5       │ 1970 │
│ 3   │ 3   │ 18  │ 8         │ 318.0  │ 150        │ 3436   │ 11.0       │ 1970 │
⋮
│ 405 │ 405 │ 28  │ 4         │ 120.0  │ 79         │ 2625   │ 18.6       │ 1982 │
│ 406 │ 406 │ 31  │ 4         │ 119.0  │ 82         │ 2720   │ 19.4       │ 1982 │
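In more recent CSV.jl releases, the one-argument `CSV.read(source)` form used above is no longer accepted and a sink such as `DataFrame` must be passed explicitly. A minimal sketch, assuming current CSV.jl and DataFrames.jl are installed; to keep it runnable offline, a byte vector holding the first rows and columns of the cars data shown above stands in for `res.body`:

```julia
using CSV, DataFrames

# Hypothetical stand-in for res.body (the raw bytes of the HTTP response);
# only the first three columns and two rows of the cars data are reproduced.
body = Vector{UInt8}("Id,MPG,Cylinders\n1,18,8\n2,15,8\n")

# CSV.read accepts an IO source; the second argument is the sink type.
df = CSV.read(IOBuffer(body), DataFrame)
```

With network access, `body` would instead come from `HTTP.get(url).body`.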