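I'm converting one of my C++ projects (a simple DSL) to Rust as a way of learning Rust, and I'm running into problems with nested structures and ownership. I'm having a hard time converting something like: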
#include <cassert>
#include <memory>
#include <string>

struct FileData {
    bool is_utf8;
    std::string file_name;
};
class Token {
public:
    enum TokenType {
        REGULAR,
        INCLUDE_FILE,
    };
    Token() {
        _type = REGULAR;
    }
    TokenType get_type() const { return _type; }
    void beginIncludeFile() {
        _type = INCLUDE_FILE;
        _include_data = std::unique_ptr<FileData>(new FileData);
    }
    bool is_utf8() const {
        assert(get_type() == INCLUDE_FILE);
        return _include_data->is_utf8;
    }
    void set_utf8(bool value) {
        assert(get_type() == INCLUDE_FILE);
        _include_data->is_utf8 = value;
    }
    const std::string& get_file_name() const {
        assert(get_type() == INCLUDE_FILE);
        return _include_data->file_name;
    }
    void setFileNameToEmpty() {
        assert(get_type() == INCLUDE_FILE);
        _include_data->file_name = "";
    }
    void appendToFileName(char c) {
        assert(get_type() == INCLUDE_FILE);
        _include_data->file_name += c;
    }
    // Hands ownership of the FileData to the caller.
    FileData* releaseFileData() { return _include_data.release(); }
private:
    std::unique_ptr<FileData> _include_data;
    TokenType _type;
};
The Rust I've written for this is:
use std::str;
pub struct FileData {
    is_utf8 : bool,
    file_name : ~str
}
pub fn FileData() -> FileData {
    FileData { is_utf8 : true, file_name : ~"" }
}
enum TokenType {
    REGULAR,
    INCLUDE_FILE
}
pub struct Token {
    priv _include_data : Option<~FileData>,
    priv _type : TokenType
}
pub fn Token() -> Token {
    Token {
        _include_data: None,
        _type : REGULAR
    }
}
impl Token {
    pub fn get_type(&self) -> TokenType {
        self._type
    }
    pub fn beginIncludeFile(&mut self) {
        self._type = INCLUDE_FILE;
        self._include_data = Some(~FileData());
    }
    pub fn is_utf8(&self) -> bool {
        match self._include_data {
            Some(ref data) => data.is_utf8,
            _ => fail!("No FileData")
        }
    }
    pub fn set_utf8(&mut self, value : bool) {
        self._include_data.mutate(|mut data| {
            data.is_utf8 = value;
            data
        });
    }
    // Return immutable/read-only copy
    pub fn get_file_name(&self) -> &~str {
        match self._include_data {
            Some(ref data) => &data.file_name,
            _ => fail!("No FileData")
        }
    }
    pub fn setFileNameToEmpty(&mut self) {
        match self._include_data {
            Some(ref data) => data.file_name = ~"",
            _ => fail!("No FileData")
        }
        return;
    }
    pub fn appendToFileName(&mut self, c : char) {
        match self._include_data {
            Some(ref data) => data.file_name.push_char(c),
            _ => fail!("No FileData")
        }
        return;
    }
    pub fn getIncludeData(&mut self) -> ~FileData {
        match self._include_data {
            Some(ref data) => *data,
            _ => fail!("No FileData")
        }
    }
}
enum LexState {
    INITIAL,
    EXPECT_COLON,
    EXPECT_ENCODING,
    EXPECT_QUOTE,
    IN_FILENAME_STRING,
    EXPECT_SEMI
}
impl Eq for LexState {
    fn eq(&self, other: &LexState) -> bool {
        return (*self as int) == (*other as int);
    }
    fn ne(&self, other: &LexState) -> bool {
        !self.eq(other)
    }
}
fn main() {
    let mut t = ~Token();
    let input = ~"include:utf8 \"file_path/file.foo\";";
    let iter = input.iter();
    let mut buf : ~str = ~"";
    let mut state : LexState = INITIAL;
    let buf_action = |action : &fn()| {
        buf = ~"";
        action();
    };
    while true {
        let c = iter.next();
        match c {
            None => break,
            Some(_c) => buf.push_char(_c)
        }
        match buf {
            // Initial state
            ~"include" if state == INITIAL => buf_action(|| {
                t.beginIncludeFile();
                state = EXPECT_COLON;
            }),
            // Expecting either an encoding, or the start of the file name
            ~":" if state == EXPECT_COLON => buf_action(|| { state = EXPECT_ENCODING; }),
            _ if state == EXPECT_COLON => state = EXPECT_QUOTE, // match WS
            // utf8 is the only encoding accepted at the moment
            ~"utf8" if state == EXPECT_ENCODING => buf_action(|| {
                t.set_utf8(true);
                state = EXPECT_QUOTE;
            }),
            _ if state == EXPECT_ENCODING => t.set_utf8(false),
            // Looking for string start
            ~"\"" if state == EXPECT_QUOTE => buf_action(||{ state = IN_FILENAME_STRING; }),
            _ if state == EXPECT_QUOTE => (), // ignore other chars
            // Reading filename
            ~"\"" if state == IN_FILENAME_STRING => buf_action(|| {
                state = EXPECT_SEMI;
            }),
            _ if state == IN_FILENAME_STRING => t.appendToFileName(c.unwrap()),
            // End of lex
            ~":" if state == EXPECT_SEMI => break,
            _ if state == EXPECT_SEMI => fail!("Expected semi"),
            _ => fail!("Unexpected character: " + str::from_char(c.unwrap()))
        }
    }
    return;
}
What is the idiomatic Rust way to write this kind of code?
Answer 0 (score: 5)
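Rust is quite different from C++, and a straight line-by-line translation will give non-idiomatic code. This isn't a complete answer, just a collection of bits and pieces:
When returning information from inside a struct, writing the function as fn foo<'a>(&'a self) -> &'a SomeInformation is the normal way (str and [] are treated specially), so: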
pub fn get_file_name<'a>(&'a self) -> &'a str {
    match self._include_data {
        Some(ref data) => &data.file_name,
        _ => fail!("No FileData")
    }
}
pub fn getIncludeData<'a>(&'a self) -> &'a FileData {
    match self._include_data {
        Some(ref data) => &*data,
        _ => fail!("No FileData")
    }
}
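The 'a marker is a named lifetime. It ties the period for which the return value is valid to the period for which the self object is valid, which means dangling pointers are impossible (ignoring compiler bugs).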
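To make the lifetime point concrete, here is a minimal sketch written in current Rust syntax rather than the older dialect used in this thread. The snake_case method names, the use of String in place of ~str, and panic! in place of fail! are assumptions made for the sake of a compilable example; the shape of get_file_name is the part that matters.

```rust
// Sketch in current Rust syntax; names are adapted from the question's code.
pub struct FileData {
    pub is_utf8: bool,
    pub file_name: String,
}

pub struct Token {
    include_data: Option<FileData>,
}

impl Token {
    pub fn new() -> Token {
        Token { include_data: None }
    }

    pub fn begin_include_file(&mut self) {
        self.include_data = Some(FileData {
            is_utf8: true,
            file_name: String::new(),
        });
    }

    pub fn append_to_file_name(&mut self, c: char) {
        match self.include_data {
            // `ref mut` borrows the payload mutably instead of moving it out.
            Some(ref mut data) => data.file_name.push(c),
            None => panic!("No FileData"),
        }
    }

    // The named lifetime 'a states that the returned slice borrows from
    // `self`, so it cannot be used after the Token is gone.
    pub fn get_file_name<'a>(&'a self) -> &'a str {
        match self.include_data {
            Some(ref data) => data.file_name.as_str(),
            None => panic!("No FileData"),
        }
    }
}
```

While the returned &str is alive, the borrow checker rejects any call that needs &mut self on the same Token, which is how the "no dangling pointers" guarantee is enforced.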
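A few things about match:
match is checked for exhaustiveness, so flipping it around (matching on state rather than on buf) is more type-safe.
match has a return value, so you can set the state "magically".
The buf_action function is peculiar (I assume it normally does more?). It could be changed so that buf_action(foo) is written as clear_buf(); foo or, at the very least, it should return the value of the inner closure, so: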
let buf_action = |f| { buf = ~""; f() } // note the lack of semicolon after f
There is special sugar for calling a function whose last argument is a function: do buf_action { some; actions(); here; }. (When the closure takes arguments, it is do f |a,b,c| { x; y; z }.)
state = match state {
    // Initial state
    INITIAL if "include" == buf => do buf_action {
        t.beginIncludeFile();
        EXPECT_COLON
    },
    // Expecting either an encoding, or the start of the file name
    EXPECT_COLON => if ":" == buf {
        buf_action(|| EXPECT_ENCODING )
    } else {
        EXPECT_QUOTE
    },
    // utf8 is the only encoding accepted at the moment
    EXPECT_ENCODING => match buf {
        ~"utf8" => do buf_action { t.set_utf8(true); EXPECT_QUOTE },
        _ => { t.set_utf8(false); EXPECT_ENCODING } // this is probably incorrect?
    },
    // Looking for string start
    EXPECT_QUOTE => if "\"" == buf {
        buf_action(|| IN_FILENAME_STRING)
    } else {
        EXPECT_QUOTE // ignore other chars
    },
    // Reading filename
    IN_FILENAME_STRING => if "\"" == buf {
        buf_action(|| EXPECT_SEMI)
    } else {
        t.appendToFileName(c.unwrap());
        IN_FILENAME_STRING
    },
    // End of lex
    EXPECT_SEMI => if ";" == buf {break} else {fail!("Expected semi")},
    _ => fail!("Unexpected character: %c", c)
};
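For readers on a current toolchain, the "return the value of the inner closure" idea can be sketched as an ordinary generic function. This is an illustration under assumptions, not the code from the thread: buf_action here takes the buffer as an explicit parameter, and step and the two-variant LexState are hypothetical, cut down to show a single transition.

```rust
// Hypothetical, reduced state set used only for this illustration.
#[derive(Debug, PartialEq)]
enum LexState {
    Initial,
    ExpectColon,
}

// Clears the buffer, then returns whatever the closure returns.
fn buf_action<T>(buf: &mut String, f: impl FnOnce() -> T) -> T {
    buf.clear();
    f() // no semicolon: the closure's value is the function's value
}

// One transition: the match is an expression, so the chosen arm's value
// is the next state.
fn step(state: LexState, buf: &mut String) -> LexState {
    match state {
        LexState::Initial if buf.as_str() == "include" => {
            buf_action(buf, || LexState::ExpectColon)
        }
        other => other, // no transition: keep the current state
    }
}
```

Passing the buffer explicitly, instead of capturing it in a closure stored in a local variable, avoids holding a long-lived mutable borrow of buf while the match arms still need to read it.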
Also, while true should be loop; but really, the loop should be written as:
for input.iter().advance |c| {
    buf.push_char(c);
    state = match state { ... }
}
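Putting the pieces together, here is a rough sketch of the whole loop in current Rust syntax: a for loop over the characters, with state = match state { ... } as the body. It is an adaptation rather than a translation of the code in this thread. The Include struct, the Done state, the Result return type, and the rule that the encoding name ends at a space are all assumptions made to keep the example small and runnable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LexState {
    Initial,
    ExpectColon,
    ExpectEncoding,
    ExpectQuote,
    InFileName,
    ExpectSemi,
    Done,
}

// Hypothetical result type standing in for the question's Token/FileData.
#[derive(Debug, Default, PartialEq)]
struct Include {
    is_utf8: bool,
    file_name: String,
}

fn lex_include(input: &str) -> Result<Include, String> {
    let mut out = Include::default();
    let mut buf = String::new();
    let mut state = LexState::Initial;

    for c in input.chars() {
        // Every state must be handled, or the match does not compile.
        state = match state {
            LexState::Initial => {
                buf.push(c);
                if buf == "include" {
                    buf.clear();
                    LexState::ExpectColon
                } else if "include".starts_with(buf.as_str()) {
                    LexState::Initial // still a prefix of the keyword
                } else {
                    return Err(format!("unexpected character: {}", c));
                }
            }
            LexState::ExpectColon => match c {
                ':' => LexState::ExpectEncoding,
                _ => return Err(format!("expected ':', found {}", c)),
            },
            // Assumption: the encoding name is terminated by a single space.
            LexState::ExpectEncoding => match c {
                ' ' => {
                    out.is_utf8 = buf == "utf8";
                    buf.clear();
                    LexState::ExpectQuote
                }
                _ => {
                    buf.push(c);
                    LexState::ExpectEncoding
                }
            },
            LexState::ExpectQuote => match c {
                '"' => LexState::InFileName,
                _ => LexState::ExpectQuote, // ignore other chars
            },
            LexState::InFileName => match c {
                '"' => LexState::ExpectSemi,
                _ => {
                    out.file_name.push(c);
                    LexState::InFileName
                }
            },
            LexState::ExpectSemi => match c {
                ';' => LexState::Done,
                _ => return Err("expected ';'".to_string()),
            },
            LexState::Done => return Err(format!("trailing input: {}", c)),
        };
    }

    if state == LexState::Done {
        Ok(out)
    } else {
        Err("unexpected end of input".to_string())
    }
}
```

Calling lex_include on the question's input string yields an Include with is_utf8 set and the file name filled in; malformed input comes back as an Err instead of aborting the task.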
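Minor points:
Option<~FileData>, let mut t = ~Token(); ⇒ Option<FileData>, let mut t = Token();. These allocations are unnecessary.
lowercase_with_underscores seems to be the Rust naming convention.
The compiler can create the Eq impl automatically via #[deriving(Eq)] enum LexState { ... }. (This is covered in more detail in the tutorial and the manual.)
It is idiomatic to avoid allocations where possible. This includes using slices of input (s.slice(byte_start, byte_end)) rather than pushing characters onto buf; that is, record the start index of the current token, and "clear" the buffer by setting this index to the current index. This could be a little tricky to implement, though.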
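The minor points above can be sketched together in current Rust syntax. This is an illustration under assumptions: #[derive(PartialEq, Eq)] is the present-day spelling of the deriving attribute, release_file_data is a hypothetical counterpart of the C++ releaseFileData built on Option::take, and quoted_slice is a hypothetical helper that shows the start-index idea for one token only.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct FileData {
    pub is_utf8: bool,
    pub file_name: String,
}

// Derived equality replaces the hand-written eq/ne impl.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    Regular,
    IncludeFile,
}

pub struct Token {
    include_data: Option<FileData>, // stored inline, no extra heap box
    token_type: TokenType,
}

impl Token {
    pub fn new() -> Token {
        Token { include_data: None, token_type: TokenType::Regular }
    }

    pub fn token_type(&self) -> TokenType {
        self.token_type
    }

    pub fn begin_include_file(&mut self) {
        self.token_type = TokenType::IncludeFile;
        self.include_data = Some(FileData { is_utf8: true, file_name: String::new() });
    }

    // Moves the FileData out to the caller and leaves None behind.
    pub fn release_file_data(&mut self) -> Option<FileData> {
        self.include_data.take()
    }
}

// Returns the text between the first pair of double quotes as a slice of
// `input`, so no characters are copied into a separate buffer.
pub fn quoted_slice(input: &str) -> Option<&str> {
    let mut start = None; // byte index just past the opening quote
    for (i, c) in input.char_indices() {
        if c == '"' {
            match start {
                // '"' is one byte long, so i + 1 is a valid char boundary.
                None => start = Some(i + 1),
                Some(s) => return Some(&input[s..i]),
            }
        }
    }
    None
}
```

The slice returned by quoted_slice borrows from input, so it stays valid exactly as long as the input string does; if the token has to outlive the input, it can be copied with to_string at that point.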