How to process multi-line text in Spark

Time: 2016-06-04 17:25:29

Tags: python scala

Suppose I have a dictionary of words and their definitions, in the following format:

Firefox

    <web> A complete {free}, {open-source} {web
    browser} from the {Mozilla Foundation} and therefore a true
    code descendent of {Netscape Navigator}.  The first non-{beta
    release} was in late 2004.

    {Firefox Home (http://mozilla.org/products/firefox)}.

    (2005-01-26)

firehose syndrome

    <networking, jargon> An absence, failure or inadequacy of flow
    control mechanisms causing the sender to overwhelm the
    receiver.  The implication is that, like trying to drink from
    a firehose, the consequenses are worse than just loss of data,
    e.g. the receiver may {crash}.

    See {ping-flood}.

    [{Jargon File}]

    (2007-03-12)

firewall

    1. {firewall code}.

    2. {firewall machine}.

firewall code

    1. The code you put in a system (say, a telephone switch) to
    make sure that the users can't do any damage. Since users
    always want to be able to do everything but never want to
    suffer for any mistakes, the construction of a firewall is a
    question not only of defensive coding but also of interface
    presentation, so that users don't even get curious about those
    corners of a system where they can burn themselves.

    2. Any sanity check inserted to catch a {can't happen} error.
    Wise programmers often change code to fix a bug twice: once to
    fix the bug, and once to insert a firewall which would have
    arrested the bug before it did quite as much damage.

    [{Jargon File}]

How can I map this kind of data into records, so that I can work with it in Spark?
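
For illustration, here is a minimal sketch of one possible approach, not a definitive solution. It assumes that a headword is any non-blank line starting at column 0, and that every indented line below it belongs to that headword's definition. The names `parse_entries` and `load_entries` are hypothetical helpers introduced here. The parsing is plain Python; the Spark part relies on `SparkContext.wholeTextFiles`, which reads each file as a single (filename, content) pair, so this only suits files small enough to fit in one executor's memory.

```python
def parse_entries(text):
    """Split dictionary text into (headword, definition) pairs.

    Assumption: a headword is a non-blank line with no leading whitespace;
    indented lines that follow it form its definition.
    """
    entries = []
    headword, body = None, []
    for line in text.splitlines():
        if line.strip() and not line[0].isspace():
            # A new headword starts: flush the previous entry first.
            if headword is not None:
                entries.append((headword, " ".join(body)))
            headword, body = line.strip(), []
        elif line.strip() and headword is not None:
            # Indented, non-blank line: part of the current definition.
            body.append(line.strip())
    # Flush the last entry, if any.
    if headword is not None:
        entries.append((headword, " ".join(body)))
    return entries


def load_entries(sc, path):
    """Hypothetical helper: build an RDD of (headword, definition) pairs.

    `sc` is an existing SparkContext. wholeTextFiles yields
    (filename, content) pairs, so each file is parsed as a whole.
    """
    return sc.wholeTextFiles(path).flatMap(lambda kv: parse_entries(kv[1]))
```

Given a live `SparkContext` named `sc` and a file path of your own, `load_entries(sc, path)` would return a pair RDD keyed by headword, which can then be filtered, joined, or converted to a DataFrame. Note that this sketch joins all definition lines with single spaces, so paragraph breaks inside a definition are not preserved.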

0 answers:

No answers yet