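How to extract data from a file on an FTP server in R without downloading it in its entirety? - Encoding error?

Time: 2016-01-21 19:36:04

Tags: r ftp rcurl

I am trying to get a large dataset (3+ GB) from the following server: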

ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/L4/GLOB/JPL/MUR

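I know that RCurl is a good way to get data from FTP. The file is a netCDF file compressed as bz2, so I need to decompress it before I can read it into R with ncdf4.

Importantly, the file is larger than the space I have on my hard drive, so saving a copy locally is not a good option. How can I access the data in the file without first saving a copy to disk?

Here is my attempt so far: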

library(RCurl); library(ncdf4)
d = getURL('ftp://podaac-ftp.jpl.nasa.gov/allData/ghrsst/data/L4/GLOB/JPL/MUR/2015/144/20150524-JPL-L4UHfnd-GLOB-v01-fv04-MUR.nc.bz2')
d = bzfile(d, open = 'r')
d = nc_open(d)

But I get stuck right after running the first line, with this cryptic error:

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
  embedded nul in string: 'BZh91AY&SY¦ÁÀÉ\0033[ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿáåÏ\035\017)³îÎ\u009dÍØcn]sw7½ÎkÜÞõï=uÎׯv]ìçn\u009dÎn½îê·±Þìê÷wS­M\u008có·+ÎçW¹Ý=Ù×¹\u009cγ­ÜëÞs½ÛN¹²w;\u009buÍÝ]{·k^çuªnìº-³6«[+Üå;\033m»Û½ow:w¹ïo{uyîî\u00937¬\\Ƶl¶½\u009dÖVìç¯{ÎõïoSm]Ý×\u009eî\u008dæî®î®î\vÛÕïgW\036î®wqîÝ\\ïw«6½Þï\036Ýrë§=¬Fg·\\íåÔÙº÷gu·3\u009bKmÛ\027­Þ»\u0092îî\016îêwwm»\u009b­·s;MÞÁ½½­ÎóÍso^»q¯o;k\033iµ\u009bÛuyÝÞní5w:ï]ÓuÎo[«\033:åÞvEÜíÎç½ÝË­\u009eìQNöÔ\u008e\u0094vmÝȯg»e lÍ^\u008a©'

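Based on other similar questions, this looks like an encoding problem, but I tried .encoding = 'UTF-8' and .encoding = 'ISO-8859-1', as shown in the getURL() documentation, and neither worked. I have seen other answers to questions like this, but they all seem to involve editing the source file, and I do not have write access to this file. Any help?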

2 Answers:

Answer 0: (score: 1)

I used Callable for this.

The only step I have not sorted out programmatically is decompressing the bz2 file; for that I just used the default OS X tool.
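
That remaining manual step could probably be scripted as well. Here is a minimal sketch, assuming the `bzip2` command-line tool is installed; the `sample.nc` file and its contents are hypothetical stand-ins for the real download:

```shell
#!/bin/sh
# Hypothetical stand-in for the downloaded archive; the real file would be
# the .nc.bz2 fetched from the FTP server.
workdir=$(mktemp -d)
printf 'sample netcdf bytes\n' > "$workdir/sample.nc"
bzip2 "$workdir/sample.nc"        # produces sample.nc.bz2 and removes sample.nc

# Decompress from the command line instead of a GUI tool.
# -d decompresses; the .bz2 archive is removed once sample.nc is written
# (adding -k would keep the archive alongside the output).
bzip2 -d "$workdir/sample.nc.bz2"

ls "$workdir"
```

This still writes the decompressed file to disk, so it only helps when there is room for it. From R, the same command could presumably be launched through system().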

Answer 1: (score: 0)

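I know nothing about R, but you should be able to get curl to fetch the file you want by sending its output to stdout rather than to a local filename, and then decompress it from stdin with bz2.

So, for example, I can do this:

curl --output - --user user:password 'ftp://127.0.0.1/somefile.bz2' | bz2 ...

Maybe you can start that from within R? Or make a fifo with:

mkfifo fifo
curl ....

and then read from the fifo file in R.

Or maybe R has a system() command, and you can do this:

system('mkfifo fifo; curl ..... | bz2 .... > fifo &')

and then read from the fifo file in R.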
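
To make the two ideas above concrete, here is a minimal sketch of both the plain pipe and the fifo variant. It rests on several assumptions: `bzip2 -dc` stands in for the `bz2 ...` shorthand used above, a small local archive served through a `file://` URL replaces the FTP server so the example runs without network access, and the `workdir`, `sample.txt` and `fifo` names are hypothetical.

```shell
#!/bin/sh
# Stand-in data: a small local bz2 file replaces the remote archive.
# In the real case, $url would be the ftp:// URL from the question.
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/sample.txt"
bzip2 -k "$workdir/sample.txt"    # writes sample.txt.bz2, keeps the original
url="file://$workdir/sample.txt.bz2"

# 1) Pipe: curl writes the compressed bytes to stdout (--output -) and
#    bzip2 decompresses stdin to stdout (-d decompress, -c write to stdout).
#    Nothing is saved to disk along the way.
piped=$(curl --silent --output - "$url" | bzip2 -dc)

# 2) FIFO: the same pipeline feeds a named pipe in the background, and a
#    separate reader (here cat) consumes the decompressed stream from it.
mkfifo "$workdir/fifo"
curl --silent --output - "$url" | bzip2 -dc > "$workdir/fifo" &
from_fifo=$(cat "$workdir/fifo")
wait                              # let the background pipeline finish

echo "$piped"
echo "$from_fifo"
```

A fifo only supports sequential reads, so this suits readers that consume a stream from start to end. Whether nc_open() from ncdf4 can read a netCDF file that way is not something this sketch settles.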