我正在尝试将双引号之间的内容计算为作业的一个标记。
例如: “hello world”= 1个令牌 “hello”“world”= 3个令牌(因为空格计为1个令牌)
我创建了main.cpp,并将“scanQuotesAsString”代码添加到3个模块:
现在,“你好世界”扫描一个2个令牌,而不是跳过这个空间。如果我添加(或skipspace,那么常规输入,如| hello world |不带引号也会跳过空格。
我认为我的问题出在scanner.ck.c,其中最后几个函数是:
/*
* Private method: scanToEndOfIdentifier
* Usage: finish = scanToEndOfIdentifier();
* ----------------------------------------
* This function advances the position of the scanner until it
* reaches the end of a sequence of letters or digits that make
* up an identifier. The return value is the index of the last
* character in the identifier; the value of the stored index
* cp is the first character after that.
*/
int Scanner::scanToEndOfIdentifier() {
while (cp < len && isalnum(buffer[cp])) {
if ((stringOption == ScanQuotesAsStrings) && (buffer[cp] == '"'))
break;
cp++;
}
return cp - 1;
}
/* Private functions */
/*
* Private method: scanQuotedString
* Usage: scanQuotedString();
* -------------------
* This function advances the position of the scanner until the
* current character is a double quotation mark
*/
void Scanner::scanQuotedString() {
while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))){
cp++;
}
这是main.cc
#include "genlib.h"
#include "simpio.h"
#include "scanner.h"
#include <iostream>
/* Private function prototypes */
int CountTokens(string str);
int main() {
cout << "Please enter a sentence: ";
string str = GetLine();
int num = CountTokens(str);
cout << "You entered " << num << " tokens." << endl;
return 0;
}
int CountTokens(string str) {
int count = 0;
Scanner scanner; // create new scanner object
scanner.setInput(str); // initialize the input to be scanned
//scanner.setSpaceOption(Scanner::PreserveSpaces);
scanner.setStringOption(Scanner::ScanQuotesAsStrings);
while (scanner.hasMoreTokens()) { // read tokens from the scanner
scanner.nextToken();
count++;
}
return count;
}
这是scanner.cc
/*
* File: scanner.cpp
* -----------------
* Implementation for the simplified Scanner class.
*/
#include "genlib.h"
#include "scanner.h"
#include <cctype>
#include <iostream>
/*
* The details of the representation are inaccessible to the client,
* but consist of the following fields:
*
* buffer -- String passed to setInput
* len -- Length of buffer, saved for efficiency
* cp -- Current character position in the buffer
* spaceOption -- Setting of the space option extension
*/
Scanner::Scanner() {
buffer = "";
spaceOption = PreserveSpaces;
}
Scanner::~Scanner() {
/* Empty */
}
void Scanner::setInput(string str) {
buffer = str;
len = buffer.length();
cp = 0;
}
/*
* Implementation notes: nextToken
* -------------------------------
* The code for nextToken follows from the definition of a token.
*/
string Scanner::nextToken() {
if (cp == -1) {
Error("setInput has not been called");
}
if (stringOption == ScanQuotesAsStrings) scanQuotedString();
if (spaceOption == IgnoreSpaces) skipSpaces();
int start = cp;
if (start >= len) return "";
if (isalnum(buffer[cp])) {
int finish = scanToEndOfIdentifier();
return buffer.substr(start, finish - start + 1);
}
cp++;
return buffer.substr(start, 1);
}
bool Scanner::hasMoreTokens() {
if (cp == -1) {
Error("setInput has not been called");
}
if (stringOption == ScanQuotesAsStrings) scanQuotedString();
if (spaceOption == IgnoreSpaces) skipSpaces();
return (cp < len);
}
void Scanner::setSpaceOption(spaceOptionT option) {
spaceOption = option;
}
Scanner::spaceOptionT Scanner::getSpaceOption() {
return spaceOption;
}
void Scanner::setStringOption(stringOptionT option) {
stringOption = option;
}
Scanner::stringOptionT Scanner::getStringOption() {
return stringOption;
}
/* Private functions */
/*
* Private method: skipSpaces
* Usage: skipSpaces();
* -------------------
* This function advances the position of the scanner until the
* current character is not a whitespace character.
*/
void Scanner::skipSpaces() {
while (cp < len && isspace(buffer[cp])) {
cp++;
}
}
/*
* Private method: scanToEndOfIdentifier
* Usage: finish = scanToEndOfIdentifier();
* ----------------------------------------
* This function advances the position of the scanner until it
* reaches the end of a sequence of letters or digits that make
* up an identifier. The return value is the index of the last
* character in the identifier; the value of the stored index
* cp is the first character after that.
*/
int Scanner::scanToEndOfIdentifier() {
while (cp < len && isalnum(buffer[cp])) {
if ((stringOption == ScanQuotesAsStrings) && (buffer[cp] == '"'))
break;
cp++;
}
return cp - 1;
}
/* Private functions */
/*
* Private method: scanQuotedString
* Usage: scanQuotedString();
* -------------------
* This function advances the position of the scanner until the
* current character is a double quotation mark
*/
void Scanner::scanQuotedString() {
while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))){
cp++;
}
scanner.h
/*
* File: scanner.h
* ---------------
* This file is the interface for a class that facilitates dividing
* a string into logical units called "tokens", which are either
*
* 1. Strings of consecutive letters and digits representing words
* 2. One-character strings representing punctuation or separators
*
* To use this class, you must first create an instance of a
* Scanner object by declaring
*
* Scanner scanner;
*
* You initialize the scanner's input stream by calling
*
* scanner.setInput(str);
*
* where str is the string from which tokens should be read.
* Once you have done so, you can then retrieve the next token
* by making the following call:
*
* token = scanner.nextToken();
*
* To determine whether any tokens remain to be read, you can call
* the predicate method scanner.hasMoreTokens(). The nextToken
* method returns the empty string after the last token is read.
*
* The following code fragment serves as an idiom for processing
* each token in the string inputString:
*
* Scanner scanner;
* scanner.setInput(inputString);
* while (scanner.hasMoreTokens()) {
* string token = scanner.nextToken();
* . . . process the token . . .
* }
*
* This version of the Scanner class includes an option for skipping
* whitespace characters, which is described in the comments for the
* setSpaceOption method.
*/
#ifndef _scanner_h
#define _scanner_h
#include "genlib.h"
/*
* Class: Scanner
* --------------
* This class is used to represent a single instance of a scanner.
*/
class Scanner {
public:
/*
* Constructor: Scanner
* Usage: Scanner scanner;
* -----------------------
* The constructor initializes a new scanner object. The scanner
* starts empty, with no input to scan.
*/
Scanner();
/*
* Destructor: ~Scanner
* Usage: usually implicit
* -----------------------
* The destructor deallocates any memory associated with this scanner.
*/
~Scanner();
/*
* Method: setInput
* Usage: scanner.setInput(str);
* -----------------------------
* This method configures this scanner to start extracting
* tokens from the input string str. Any previous input string is
* discarded.
*/
void setInput(string str);
/*
* Method: nextToken
* Usage: token = scanner.nextToken();
* -----------------------------------
* This method returns the next token from this scanner. If
* nextToken is called when no tokens are available, it returns the
* empty string.
*/
string nextToken();
/*
* Method: hasMoreTokens
* Usage: if (scanner.hasMoreTokens()) . . .
* ------------------------------------------
* This method returns true as long as there are additional
* tokens for this scanner to read.
*/
bool hasMoreTokens();
/*
* Methods: setSpaceOption, getSpaceOption
* Usage: scanner.setSpaceOption(option);
* option = scanner.getSpaceOption();
* ------------------------------------------
* This method controls whether this scanner
* ignores whitespace characters or treats them as valid tokens.
* By default, the nextToken function treats whitespace characters,
* such as spaces and tabs, just like any other punctuation mark.
* If, however, you call
*
* scanner.setSpaceOption(Scanner::IgnoreSpaces);
*
* the scanner will skip over any white space before reading a
* token. You can restore the original behavior by calling
*
* scanner.setSpaceOption(Scanner::PreserveSpaces);
*
* The getSpaceOption function returns the current setting
* of this option.
*/
enum spaceOptionT { PreserveSpaces, IgnoreSpaces };
void setSpaceOption(spaceOptionT option);
spaceOptionT getSpaceOption();
/*
* Methods: setStringOption, getStringOption
* Usage: scanner.setStringOption(option);
* option = scanner.getStringOption();
* --------------------------------------------------
* This method controls how the scanner reads double quotation marks
* as input. The default is set to treat quotes just like any other
* punctuation character:
* scanner.setStringOption(Scanner::ScanQuotesAsPunctuation);
*
* Otherwise, the option:
* scanner.setStringOption(Scanner::ScanQuotesAsStrings);
*
* the token starting with a quotation mark will be scanned until
* another quotation mark is found (closing quotation). Therefore
* the entire string within the quotation, including both quotation
* marks counts as 1 token.
*/
enum stringOptionT { ScanQuotesAsPunctuation, ScanQuotesAsStrings };
void setStringOption(stringOptionT option);
stringOptionT getStringOption();
private:
#include "scanpriv.h"
};
#endif
**,最后是scanpriv.h **
/*
* File: scanpriv.h
* ----------------
* This file contains the private data for the simplified version
* of the Scanner class.
*/
/* Instance variables */
string buffer; /* The string containing the tokens */
int len; /* The buffer length, for efficiency */
int cp; /* The current index in the buffer */
spaceOptionT spaceOption; /* Setting of the space option */
stringOptionT stringOption;
/* Private method prototypes */
void skipSpaces();
int scanToEndOfIdentifier();
void scanQuotedString();
答案 0 :(得分:3)
渴望阅读。
解析引用文本的两种方法:
一个简单的开关,告诉您当前是否在引号中,并激活一些特殊的报价处理。这基本上等同于#1),只是内联。
放开状态并编写单独的规则来扫描引用的文本。代码实际上非常简单(C ++启发的p代码):
// assume we are one behind the opening quotation mark
for (c : text) {
if (is_escape (*c)) { // to support stuff like "foo's name is \"bar\""
p = peek(c);
if (!is_valid_escape_character (peek (c))) error;
else {
make the peeked character (*p) part of the result;
++c;
}
}
else if (is_quotation_mark (*c))
{
return the result; // we approached the end of the string
}
else if (!is_valid_character (*c))
{
error; // maybe you want to forbid literal control characters
}
else
{
make *c part of the result
}
}
error; // reached end of input before closing quotation mark
如果您不希望支持转义字符,则代码变得更简单:
// assume we are one behind the opening quotation mark
for (c : text) {
if (is_quotation_mark (*c))
return the result;
else if (!is_valid_character (*c))
error;
else
make *c part of the result
}
error; // reached end of input before closing quotation mark
您不应该忽略检查它是否是无效字符,因为这会邀请用户利用您的代码并可能使用您程序的未定义行为。
答案 1 :(得分:0)
快速浏览一下代码:如果您处于ScanQuotesAsStrings
模式,则除了引用的字符串之外,您不需要其他令牌;相反,差异应该是当您看到以'"'
开头的令牌时,您会转到单独的子扫描仪。
在伪代码中(使用C ++“end iterator是一个过去的结尾”成语):
current_token.begin = cursor;
current_token.end = current_token.begin + 1;
if(scan_quotes_as_strings && *current_token.begin == '"') {
while(*current_token.end && *current_token.end != '"')
++current_token.end;
return;
}
while(*current_token.end && *current_token.end != ' ')
++current_token.end;
您可以通过引入状态变量将这两个循环合并为一个循环,而不是用不同的代码路径表示扫描程序状态。
此外,
while ((cp < len && (buffer[cp] == '"')) || (cp < len && (buffer[cp] == '"'))) ...
看起来很腥。