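How do I manage optional owned pointers in Rust?

Time: 2013-07-14 04:58:28

Tags: rust

I am converting one of my C++ projects (a simple DSL) to Rust as a way of learning Rust, and I am running into problems with nested structures and ownership. I am having a hard time converting something like: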

struct FileData {
    bool is_utf8;
    std::string file_name;
};

class Token {
public:
    enum TokenType {
        REGULAR,
        INCLUDE_FILE
    };

    Token() {
        _type = REGULAR;
    }

    TokenType get_type() const { return _type; }

    void beginIncludeFile() {
        _type = INCLUDE_FILE;
        _include_data = std::unique_ptr<FileData>(new FileData);
    }

    bool is_utf8() const {
        assert(get_type() == INCLUDE_FILE);
        return _include_data->is_utf8; 
    }

    void set_utf8(bool value) { 
        assert(get_type() == INCLUDE_FILE);
        _include_data->is_utf8 = value; 
    }

    const std::string& get_file_name() const { 
        assert(get_type() == INCLUDE_FILE);
        return _include_data->file_name; 
    }

    void setFileNameToEmpty() {
        assert(get_type() == INCLUDE_FILE);
        _include_data->file_name = "";
    }

    void appendToFileName(char c) { 
        assert(get_type() == INCLUDE_FILE);
        _include_data->file_name += c;
    }

    FileData* releaseFileData() { return _include_data.release(); }
private:
    std::unique_ptr<FileData> _include_data;
    TokenType _type;
};

The Rust I wrote for this is:

use std::str;

pub struct FileData {
    is_utf8 : bool,
    file_name : ~str
}

pub fn FileData() -> FileData {
    FileData { is_utf8 : true, file_name : ~"" }
}

enum TokenType {
    REGULAR,
    INCLUDE_FILE
}

pub struct Token {
    priv _include_data : Option<~FileData>,
    priv _type : TokenType
}

pub fn Token() -> Token {
    Token {
        _include_data: None,
        _type : REGULAR
    }
}

impl Token {
    pub fn get_type(&self) -> TokenType {
        self._type
    } 

    pub fn beginIncludeFile(&mut self) {
        self._type = INCLUDE_FILE;
        self._include_data = Some(~FileData());
    }

    pub fn is_utf8(&self) -> bool {
        match self._include_data {
            Some(ref data) => data.is_utf8,
            _ => fail!("No FileData")
        }
    }

    pub fn set_utf8(&mut self, value : bool) {
        self._include_data.mutate(|mut data| {
            data.is_utf8 = value;
            data
        });
    }

    // Return immutable/read-only copy
    pub fn get_file_name(&self) -> &~str {
        match self._include_data {
            Some(ref data) => &data.file_name,
            _ => fail!("No FileData")
        }
    }

    pub fn setFileNameToEmpty(&mut self) {
        match self._include_data {
            Some(ref data) => data.file_name = ~"",
            _ => fail!("No FileData")
        }
        return;
    }

    pub fn appendToFileName(&mut self, c : char) {
        match self._include_data {
            Some(ref data) => data.file_name.push_char(c),
            _ => fail!("No FileData")
        }
        return;
    }

    pub fn getIncludeData(&mut self) -> ~FileData {
        match self._include_data {
            Some(ref data) => *data,
            _ => fail!("No FileData")
        }
    }
}

enum LexState {
    INITIAL,
    EXPECT_COLON,
    EXPECT_ENCODING,
    EXPECT_QUOTE,
    IN_FILENAME_STRING,
    EXPECT_SEMI
}

impl Eq for LexState {
    fn eq(&self, other: &LexState) -> bool {
        return (*self as int) == (*other as int);
    }
    fn ne(&self, other: &LexState) -> bool {
        !self.eq(other)
    }
}

fn main() {
    let mut t = ~Token();
    let input = ~"include:utf8 \"file_path/file.foo\";";
    let iter = input.iter();
    let mut buf : ~str = ~"";

    let mut state : LexState = INITIAL;

    let buf_action = |action : &fn()| {
        buf = ~"";
        action();
    };

    while true {
        let c = iter.next();
        match c {
            None => break,
            Some(_c) => buf.push_char(_c)
        }

        match buf {
            // Initial state
            ~"include" if state == INITIAL => buf_action(|| { 
                t.beginIncludeFile();
                state = EXPECT_COLON;
            }),

            // Expecting either an encoding, or the start of the file name
            ~":" if state == EXPECT_COLON => buf_action(|| { state = EXPECT_ENCODING; }),
            _   if state == EXPECT_COLON => state = EXPECT_QUOTE, // match WS

            // utf8 is the only encoding accepted at the moment
            ~"utf8" if state == EXPECT_ENCODING => buf_action(|| {
                t.set_utf8(true);
                state = EXPECT_QUOTE;
            }),
            _ if state == EXPECT_ENCODING => t.set_utf8(false),

            // Looking for string start
            ~"\"" if state == EXPECT_QUOTE => buf_action(||{ state = IN_FILENAME_STRING; }),
            _ if state == EXPECT_QUOTE => (), // ignore other chars

            // Reading filename
            ~"\"" if state == IN_FILENAME_STRING => buf_action(|| {
                state = EXPECT_SEMI;
            }),
            _ if state == IN_FILENAME_STRING => t.appendToFileName(c.unwrap()),

            // End of lex
            ~":" if state == EXPECT_SEMI => break,
            _   if state == EXPECT_SEMI => fail!("Expected semi"),

            _ => fail!("Unexpected character: " + str::from_char(c.unwrap()))

        }
    }
    return;
}

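What is the idiomatic Rust way to write this kind of code?

1 answer:

Answer 0 (score: 5)

Rust is quite different from C++, so a straight line-by-line translation will give non-idiomatic code. This is not a complete answer, just a collection of bits and pieces:


When returning information from inside a struct, the normal way is to write the function as fn foo<'a>(&'a self) -> &'a SomeInformation (str and [] are treated specially), so: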

pub fn get_file_name<'a>(&'a self) -> &'a str {
    match self._include_data {
        Some(ref data) => &data.file_name,
        _ => fail!("No FileData")
    }
}

pub fn getIncludeData<'a>(&'a self) -> &'a FileData {
    match self._include_data {
        Some(ref data) => &*data,
        _ => fail!("No FileData")
    }
}

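The 'a marker is a named lifetime. It ties the period for which the return value is valid to the period for which the self object is valid, which means dangling pointers are impossible (ignoring compiler bugs).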
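As a rough illustration of the borrowing pattern described above, here is a minimal sketch written in current Rust syntax rather than the 2013 syntax used in this thread (so String, panic! and Option::take stand in for ~str, fail! and manual moves; that substitution is an assumption of this sketch). The method names file_name, append_to_file_name and take_include_data are hypothetical. The elided lifetime on file_name means the same as the explicit 'a form.

```rust
#[derive(Debug, Default)]
pub struct FileData {
    pub is_utf8: bool,
    pub file_name: String,
}

#[derive(Debug, Default)]
pub struct Token {
    // Stored inline: no heap box is needed just to make the data optional.
    include_data: Option<FileData>,
}

impl Token {
    pub fn begin_include_file(&mut self) {
        self.include_data = Some(FileData { is_utf8: true, file_name: String::new() });
    }

    // Equivalent to: fn file_name<'a>(&'a self) -> &'a str
    // The returned slice cannot outlive the Token it borrows from.
    pub fn file_name(&self) -> &str {
        match self.include_data {
            Some(ref data) => data.file_name.as_str(),
            None => panic!("No FileData"),
        }
    }

    // `ref mut` is what allows mutation through the Option.
    pub fn append_to_file_name(&mut self, c: char) {
        match self.include_data {
            Some(ref mut data) => data.file_name.push(c),
            None => panic!("No FileData"),
        }
    }

    // Moves the data out and leaves None behind,
    // roughly what unique_ptr::release does in the C++ version.
    pub fn take_include_data(&mut self) -> Option<FileData> {
        self.include_data.take()
    }
}
```

Borrowing with ref or ref mut inside the match is what lets the methods read or modify the data without moving it out of self; only take_include_data transfers ownership to the caller.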

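A few things about match:

  • match is checked for exhaustiveness, so flipping it around (matching on state rather than buf) is more type-safe.

  • match has a return value, so you can "magically" set the state.

  • The buf_action function is peculiar (I assume it normally does more?). It could be changed so that buf_action(foo) is written as clear_buf(); foo, or, at the very least, it should return the value of the inner closure, so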

    let buf_action = |f| { buf = ~""; f() } // note the lack of semicolon after f
    
  • There is special sugar for calling functions whose last argument is a function: do buf_action { some; actions(); here; }. (When the closure takes arguments, do f |a,b,c| { x; y; z }.)

    state = match state {
        // Initial state
        INITIAL if "include" == buf => do buf_action { 
            t.beginIncludeFile();
            EXPECT_COLON
        },

        // Expecting either an encoding, or the start of the file name
        EXPECT_COLON => if ":" == buf {
            buf_action(|| EXPECT_ENCODING)
        } else { 
            EXPECT_QUOTE
        },

        // utf8 is the only encoding accepted at the moment
        EXPECT_ENCODING => match buf {
            ~"utf8" => do buf_action { t.set_utf8(true); EXPECT_QUOTE },
            _ => { t.set_utf8(false); EXPECT_ENCODING } // this is probably incorrect?
        },

        // Looking for string start
        EXPECT_QUOTE => if "\"" == buf {
            buf_action(|| IN_FILENAME_STRING)
        } else {
            EXPECT_QUOTE // ignore other chars
        },

        IN_FILENAME_STRING => if "\"" == buf {
            buf_action(|| EXPECT_SEMI)
        } else {
            t.appendToFileName(c.unwrap());
            IN_FILENAME_STRING
        },

        // End of lex
        EXPECT_SEMI => if ";" == buf {break} else {fail!("Expected semi")},

        _ => fail!("Unexpected character: %c", c)
    };

Also, while true should be loop; but really, the loop should be written as:

for input.iter().advance |c| {
    buf.push_char(c);
    state = match state { ... }
}
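For comparison, the state-driven loop might look roughly like the following in current Rust. This is only a sketch under stated assumptions: it uses modern syntax (for c in input.chars(), derive, Result) rather than the 2013 syntax above, the function name lex_include is hypothetical, and the grammar is simplified (an encoding word followed by whitespace is always expected after the colon). The points it tries to show are that the match is on state, that every state must be handled for the code to compile, and that each arm evaluates to the next state.

```rust
// The comparison and debug impls are derived instead of hand-written.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LexState {
    Initial,
    ExpectColon,
    ExpectEncoding,
    ExpectQuote,
    InFilenameString,
    ExpectSemi,
}

#[derive(Debug, Default, PartialEq)]
struct FileData {
    is_utf8: bool,
    file_name: String,
}

fn lex_include(input: &str) -> Result<FileData, String> {
    let mut data = FileData::default();
    let mut buf = String::new();
    let mut state = LexState::Initial;

    for c in input.chars() {
        // Each arm yields the next state; leaving out a state is a compile error.
        state = match state {
            LexState::Initial => {
                buf.push(c);
                if buf == "include" {
                    buf.clear();
                    LexState::ExpectColon
                } else if "include".starts_with(buf.as_str()) {
                    LexState::Initial
                } else {
                    return Err(format!("unexpected input {:?}", buf));
                }
            }
            LexState::ExpectColon => match c {
                ':' => LexState::ExpectEncoding,
                _ => return Err(format!("expected ':' but found {:?}", c)),
            },
            LexState::ExpectEncoding => {
                if c.is_whitespace() {
                    // utf8 is the only encoding recognised here.
                    data.is_utf8 = buf == "utf8";
                    buf.clear();
                    LexState::ExpectQuote
                } else {
                    buf.push(c);
                    LexState::ExpectEncoding
                }
            }
            LexState::ExpectQuote => {
                if c == '"' { LexState::InFilenameString } else { LexState::ExpectQuote }
            }
            LexState::InFilenameString => {
                if c == '"' {
                    LexState::ExpectSemi
                } else {
                    data.file_name.push(c);
                    LexState::InFilenameString
                }
            }
            LexState::ExpectSemi => {
                return if c == ';' {
                    Ok(data)
                } else {
                    Err(format!("expected ';' but found {:?}", c))
                };
            }
        };
    }
    Err(format!("unexpected end of input in state {:?}", state))
}
```

Calling lex_include on the question's input, include:utf8 "file_path/file.foo";, walks through all six states in order. Returning a Result instead of failing is a design choice of this sketch, not something the original code did.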

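Minor points:

  • Option<~FileData> and let mut t = ~Token(); can be Option<FileData> and let mut t = Token();. These allocations are unnecessary.

  • lowercase_with_underscores appears to be the Rust naming convention.

  • The compiler can create the Eq impl automatically via #[deriving(Eq)] enum LexState { ... }. (This is covered in detail in the tutorial and the manual.)

  • Avoiding allocations where possible is idiomatic. This includes using slices of input (s.slice(byte_start, byte_end)) rather than pushing characters onto buf; that is, record the start index of the current token, and "clear" the buffer by setting this index to the current index. This can be a little tricky to implement, though.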
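The slice-based idea in the last point could be sketched as follows, again in current Rust syntax (where &input[start..i] plays the role of s.slice(byte_start, byte_end)). The function name tokens and the choice of delimiter characters are assumptions made for illustration. No characters are copied into a buffer: each token is a borrowed slice of input, and the "buffer" is cleared by moving start past the delimiter.

```rust
// Splits `input` on a fixed, hypothetical set of delimiters and returns
// borrowed slices of it. The Vec itself allocates, but the token text does not.
fn tokens(input: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0; // byte index where the current token begins

    for (i, c) in input.char_indices() {
        if matches!(c, ':' | ' ' | '"' | ';') {
            if start < i {
                out.push(&input[start..i]);
            }
            // "Clear the buffer": the next token starts after this delimiter.
            start = i + c.len_utf8();
        }
    }
    // Emit a trailing token that is not followed by a delimiter.
    if start < input.len() {
        out.push(&input[start..]);
    }
    out
}
```

Using char_indices together with len_utf8 keeps the indices on character boundaries, which is part of what makes the byte-index approach a little tricky by hand.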