How to create key/value pairs from textFile

Time: 2016-01-24 10:22:20

Tags: scala apache-spark rdd

I want to use a member of each RDD element as the key. How can I do that? Here is my data:

2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7

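I want to create key/value pairs in which the key is the first element of a line and the value is the element that follows it.

I wrote this code: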

def main(args: Array[String])
 {
   System.setProperty("hadoop.home.dir","C:\\spark-1.5.1-bin-hadoop2.6\\winutil")
   val conf = new SparkConf().setAppName("test").setMaster("local[4]")
   val sc = new SparkContext(conf)

   val lines = sc.textFile("followers.txt")
    .flatMap{x => (x.indexOfSlice(x1),x.indexOfSlice(x2))} 

}

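But this is not correct: it does not determine the index of the elements. Each line of the file holds two numbers.

1 answer:

Answer 0: (score: 2)

Maybe I misunderstood your question, but if you just want to split your data into key/value pairs, you only need to do this: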

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;

import org.apache.commons.net.DatagramSocketClient;
import org.apache.commons.net.ntp.NtpV3Impl;
import org.apache.commons.net.ntp.NtpV3Packet;
import org.apache.commons.net.ntp.TimeInfo;
import org.apache.commons.net.ntp.TimeStamp;

public final class MyNTPUDPClient extends DatagramSocketClient {
    public static final int DEFAULT_PORT = 123;

    private int _version = NtpV3Packet.VERSION_3;

    public TimeInfo getTime(InetAddress host, int port) throws IOException {
        // if not connected then open to next available UDP port
        if (!isOpen()) {
            open();
        }

        // build the outgoing client-mode request
        NtpV3Packet message = new NtpV3Impl();
        message.setMode(NtpV3Packet.MODE_CLIENT);
        message.setVersion(_version);
        DatagramPacket sendPacket = message.getDatagramPacket();
        sendPacket.setAddress(host);
        sendPacket.setPort(port);

        // packet that will hold the server's reply
        NtpV3Packet recMessage = new NtpV3Impl();
        DatagramPacket receivePacket = recMessage.getDatagramPacket();

        // stamp the transmit time immediately before sending
        TimeStamp now = TimeStamp.getCurrentTime();
        message.setTransmitTime(now);

        _socket_.send(sendPacket);
        _socket_.receive(receivePacket);

        // record the local arrival time of the reply
        long returnTime = System.currentTimeMillis();
        TimeInfo info = new TimeInfo(recMessage, returnTime, true);

        return info;
    }

    public TimeInfo getTime(InetAddress host) throws IOException {
        return getTime(host, NtpV3Packet.NTP_PORT);
    }

    public int getVersion() {
        return _version;
    }

    public void setVersion(int version) {
        _version = version;
    }
}

Does this solve your problem?
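
For the Spark data in the question, the idea of splitting each line into a key/value pair could be sketched roughly as follows. This is only an illustration under stated assumptions: the object name `FollowerPairs` and the helper `parsePair` are hypothetical, the two numbers on each line are assumed to be separated by whitespace, and the `SparkConf` settings and the `followers.txt` path are copied from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FollowerPairs {
  // Hypothetical helper: split one line such as "2 1" on whitespace and
  // return the first field as the key and the second field as the value.
  def parsePair(line: String): (String, String) = {
    val fields = line.trim.split("\\s+")
    (fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // map (rather than flatMap) produces exactly one pair per input line
    val pairs = sc.textFile("followers.txt")
      .filter(_.trim.nonEmpty) // skip blank lines, if the file has any
      .map(parsePair)

    pairs.collect().foreach(println)
    sc.stop()
  }
}
```

The result is an RDD of `(String, String)` tuples, which Spark treats as a key/value RDD. If numeric keys are needed, the fields can be converted with `.toInt` inside `parsePair`.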