++all:

* DSNLEXER::NextTok() now uses two separate modes to parse quoted strings. This gives us the freedom to control our own destiny separate from the constraints put on us by the Specctra DSN spec.
* Added Documentation/s-expressions.txt to explain all this.
* Enhanced our quoting protocol by moving away from doubling up double quotes to a C line escape mechanism.
* Now support multi-line strings, which when properly escaped, can still be read in as a token originating on a single line.

Commit 604edcac authored Jan 30, 2011 by Dick Hollenbeck
8 changed files
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -4,6 +4,18 @@ KiCad ChangeLog 2010
 Please add newer entries at the top, list the date and your name with
 email address.

+2011-Jan-30 UPDATE Dick Hollenbeck <dick@softplc.com>
+================================================================================
++all:
+  * DSNLEXER::NextTok() now uses two separate modes to parse quoted strings.
+    This gives us the freedom to control our own destiny separate from the
+    constraints put on us by the Specctra DSN spec.
+  * Added Documentation/s-expressions.txt to explain all this.
+  * Enhanced our quoting protocol by moving away from doubling up double quotes
+    to a C line escape mechanism.
+  * Now support multi-line strings, which when properly escaped, can still be
+    read in as a token originating on a single line.
+
 2011-Jan-21 UPDATE Wayne Stambaugh <stambaughw@verizon.net>
 ================================================================================
  EESchema code refactoring and coding policy naming fixes.

--- a/Documentation/s-expressions.txt
+++ b/Documentation/s-expressions.txt
+
+S-Expression Support in Kicad
+============================================================================
+Author:  Dick Hollenbeck
+Date:    Jan 2011
+
+
+An s-expression is a text stream or string, in the same vein as XML, consisting
+of a sequence of elements. Each element is either an atom or list. An atom
+corresponds to a string, while a list corresponds to an s-expression. The
+following grammar represents our definition of an s-expression:
+
+sexpr   ::= ( sx )
+sx      ::= atom sxtail | sexpr sxtail | NULL
+sxtail  ::= sx | NULL
+atom    ::= quoted | value
+quoted  ::= "ws_string"
+value   ::= nws_string
+
+An atom can either be a quoted string, which is a string containing whitespace
+surrounded by double quotes, or a non-whitespace string that does not require
+surrounding quotes.
+
+The s-expression syntax used in Kicad uses two quoting/syntax strategies, given
+by the needs of the Specctra DSN specification and of our own non-specctra
+needs. The Specctra DSN specification is not very clear with regard to quoting
+and on top of that there is Freerouter's interpretation, which would actually
+supersede anything in the Specctra DSN spec anyway, due to a desire to be
+compatible with Freerouter.
+
+We have our own needs, which go beyond those of the Specctra DSN spec, so we
+support the two syntaxes or quoting protocols for quoted atoms:
+
+1) Specctra quoting protocol (specctraMode)
+2) Kicad quoting protocol    (non-specctraMode)
+
+We can control our own destiny better by having a separately defined mode for
+non Specctra DSN files.
+
+To summarize, in specctraMode Freerouter dictates to us what we need to do. In
+non-specctraMode, which can be thought of as Kicad mode, we have our own quoting
+protocol and can make changes without breaking the specctraMode.
+
+There needs to be agreement between how a file is saved, and how a file is read
+back in, in either mode, to fulfill the round-tripping requirements. A file
+written using one mode may not necessarily be readable using the other mode,
+although it might be.  Just don't count on it.
+
+
+In Kicad mode:
+
+OUTPUTFORMATTER::Quoted() is the tool to wrap s-expression atoms.
+DSNLEXER::NextTok() is basically the inverse function, and reads tokens back in.
+These two must agree, so that what is written out comes back in un-altered.
+
+The decision to wrap the string or not is left to the Quoted() function. If the
+string is wrapped, it will also escape internal double quotes, \n's and \r's.
+Any null string is wrapped in quotes, and so is any string which starts with
+'#', so that it is not confused with an s-expression comment.
+
+
+Kicad S-expression Syntax and Quoting Protocol (non-specctraMode):
+==================================================================
+
+*) All Kicad s-expression files are saved using UTF-8 encoding and should
+support any international characters in the atoms. Some atoms are considered
+keywords, and constitute a grammar superimposed on the s-expressions.
+
+*) All keywords are ASCII and lowercase. International characters are not to be
+used here.
+
+*) DSNLEXER::NextTok() requires that any token be on a single line of input. If
+you want to save a multi-line string, Quoted() will automatically escape the \n
+or \r for you and put the output on a single line.  It should round-trip fine.
+
+*) There can be escape sequences in a quoted string only.  Escape sequences allow
+foreign tools to generate byte patterns in the input stream.  C style hex escapes
+of 1 or 2 digits are supported, and so are octal escapes of 1 to 3 digits.  See
+DSNLEXER::NextTok() for the full list of escape sequences, by searching file
+dsnlexer.cpp for the string "ESCAPE SEQUENCES".  Any use of the escape mechanism
+must still produce UTF-8 encoded text after the escape handling is applied.
+
+*) Just because an escape sequence is supported on input, does not mean that
+OUTPUTFORMATTER::Quoted() must generate such an escape sequence for output. For
+example, having true tabs in the s-expression file is OK. So that will not be
+escaped on output. Other similar cases exist.
+
+*) Backslash is the escape byte.
+
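Review note on the grammar in s-expressions.txt: the recursive structure (a list contains atoms or further lists) can be illustrated with a minimal recursive-descent reader. This is only a sketch under stated assumptions. `Node` and `parseElement` are hypothetical names, not KiCad code, and escape handling inside quoted atoms is deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tree node: either an atom (text) or a list (children).
struct Node
{
    bool              isList = false;
    std::string       atom;
    std::vector<Node> children;
};

// Skip whitespace between elements.
static void skipSpace( const std::string& in, std::size_t& pos )
{
    while( pos < in.size() && std::isspace( (unsigned char) in[pos] ) )
        ++pos;
}

// Parse one element starting at in[pos] and advance pos past it.
Node parseElement( const std::string& in, std::size_t& pos )
{
    skipSpace( in, pos );

    if( pos >= in.size() )
        throw std::runtime_error( "unexpected end of input" );

    Node n;

    if( in[pos] == '(' )                    // sexpr ::= ( sx )
    {
        n.isList = true;
        ++pos;

        for( ;; )
        {
            skipSpace( in, pos );

            if( pos >= in.size() )
                throw std::runtime_error( "missing ')'" );

            if( in[pos] == ')' )
            {
                ++pos;
                return n;
            }

            n.children.push_back( parseElement( in, pos ) );
        }
    }

    if( in[pos] == ')' )
        throw std::runtime_error( "unexpected ')'" );

    if( in[pos] == '"' )                    // quoted ::= "ws_string" (no escapes here)
    {
        std::size_t end = in.find( '"', pos + 1 );

        if( end == std::string::npos )
            throw std::runtime_error( "Un-terminated delimited string" );

        n.atom = in.substr( pos + 1, end - pos - 1 );
        pos = end + 1;
        return n;
    }

    std::size_t start = pos;                // value ::= nws_string

    while( pos < in.size() && !std::isspace( (unsigned char) in[pos] )
           && in[pos] != '(' && in[pos] != ')' )
        ++pos;

    n.atom = in.substr( start, pos - start );
    return n;
}
```

Fed an input such as `(module "my part" (layer F.Cu))` (the keywords are made up for illustration), the sketch yields a list of three children: a bare atom, a quoted atom containing a space, and a nested list.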
--- a/common/dsnlexer.cpp
+++ b/common/dsnlexer.cpp
@@ -54,7 +54,8 @@ void DSNLEXER::init()
    curTok = DSN_NONE;
    stringDelimiter = '"';

-    space_in_quoted_tokens = true;
+    specctraMode = false;
+    space_in_quoted_tokens = false;

    commentsAreTokens = false;
 }
@@ -107,6 +108,20 @@ DSNLEXER::~DSNLEXER()
    }
 }

+void DSNLEXER::SetSpecctraMode( bool aMode )
+{
+    specctraMode = aMode;
+    if( aMode )
+    {
+        // specctra mode defaults, some of which can still be changed in this mode.
+        space_in_quoted_tokens = true;
+    }
+    else
+    {
+        space_in_quoted_tokens = false;
+        stringDelimiter = '"';
+    }
+}

 void DSNLEXER::PushReader( LINE_READER* aLineReader )
 {
@@ -479,74 +494,119 @@ L_read:
            // else it was something like +5V, fall through below
        }

-        // a quoted string
+        // a quoted string, will return DSN_STRING
        if( *cur == stringDelimiter )
        {
-            // New code, understands nested quotes, and deliberately restricts
-            // strings to a single line. Still strips off leading and trailing
-            // quotes, and now removes internal doubled up quotes
-#if 1
-            head = cur;
+            // Non-specctraMode, understands and deciphers escaped \, \r, \n, and \".
+            // Strips off leading and trailing double quotes
+            if( !specctraMode )
+            {
+                // copy the token, character by character so we can decode escape sequences.
+                curText.clear();

-            // copy the token, character by character so we can remove doubled up quotes.
-            curText.clear();
+                ++cur;  // skip over the leading delimiter, which is always " in non-specctraMode

-            while( head < limit )
-            {
-                if( *head==stringDelimiter )
+                head = cur;
+
+                while( head<limit )
                {
-                    if( head+1<limit && head[1]==stringDelimiter )
+                    // ESCAPE SEQUENCES:
+                    if( *head =='\\' )
                    {
-                        // include only one of the doubled-up stringDelimiters
-                        curText += *head;
-                        head    += 2;
-                        continue;
+                        char    tbuf[8];
+                        char    c;
+                        int     i;
+
+                        if( ++head >= limit )
+                            break;  // throw exception at L_unterminated
+
+                        switch( *head++ )
+                        {
+                        case '"':
+                        case '\\':  c = head[-1];   break;
+                        case 'a':   c = '\x07';     break;
+                        case 'b':   c = '\x08';     break;
+                        case 'f':   c = '\x0c';     break;
+                        case 'n':   c = '\n';       break;
+                        case 'r':   c = '\r';       break;
+                        case 't':   c = '\x09';     break;
+                        case 'v':   c = '\x0b';     break;
+
+                        case 'x':   // hex escape sequence of 1 or 2 digits
+                            for( i=0; i<2; ++i )
+                            {
+                                if( !isxdigit( (unsigned char) head[i] ) )  // char may be signed
+                                    break;
+                                tbuf[i] = head[i];
+                            }
+                            tbuf[i] = '\0';
+                            if( i > 0 )
+                                c = (char) strtoul( tbuf, NULL, 16 );
+                            else
+                                c = 'x';   // a goofed hex escape sequence, interpret as 'x'
+                            head += i;
+                            break;
+
+                        default:    // octal escape sequence of 1 to 3 digits
+                            --head;
+                            for( i=0; i<3; ++i )
+                            {
+                                if( head[i] < '0' || head[i] > '7' )
+                                    break;
+                                tbuf[i] = head[i];
+                            }
+                            tbuf[i] = '\0';
+                            if( i > 0 )
+                                c = (char) strtoul( tbuf, NULL, 8 );
+                            else
+                                c = '\\';   // a goofed octal escape sequence, interpret as '\'
+                            head += i;
+                            break;
+                        }
+
+                        curText += c;
                    }
-                    else if( head == cur )
+
+                    else if( *head == '"' )     // end of the non-specctraMode DSN_STRING
                    {
-                        ++head;     // skip the leading quote
-                        continue;
+                        curTok = DSN_STRING;
+                        ++head;                 // omit this trailing double quote
+                        goto exit;
                    }

-                    // fall thru
-                }
+                    else
+                        curText += *head++;

-                // check for a terminator
-                if( isStringTerminator( *head ) )
-                {
-                    curTok = DSN_STRING;
-                    ++head;
-                    goto exit;
-                }
+                }   // while

-                curText += *head++;
+                // L_unterminated:
+                wxString errtxt(_("Un-terminated delimited string") );
+                THROW_PARSE_ERROR( errtxt, CurSource(), CurLine(), CurLineNumber(), CurOffset() );
            }

-            wxString errtxt(_("Un-terminated delimited string") );
-            THROW_PARSE_ERROR( errtxt, CurSource(), CurLine(), CurLineNumber(), CurOffset() );
-
-#else   // old code, did not understand nested quotes
-            ++cur;  // skip over the leading delimiter: ",', or $
+            else    // specctraMode DSN_STRING
+            {
+                ++cur;  // skip over the leading delimiter: ",', or $

-            head = cur;
+                head = cur;

-            while( head<limit  &&  !isStringTerminator( *head ) )
-                ++head;
+                while( head<limit  &&  !isStringTerminator( *head ) )
+                    ++head;

-            if( head >= limit )
-            {
-                wxString errtxt(_("Un-terminated delimited string") );
-                THROW_PARSE_ERROR( errtxt, CurSource(), CurLine(), CurLineNumber(), CurOffset() );
-            }
+                if( head >= limit )
+                {
+                    wxString errtxt(_("Un-terminated delimited string") );
+                    THROW_PARSE_ERROR( errtxt, CurSource(), CurLine(), CurLineNumber(), CurOffset() );
+                }

-            curText.clear();
-            curText.append( cur, head );
+                curText.clear();
+                curText.append( cur, head );

-            ++head;     // skip over the trailing delimiter
+                ++head;     // skip over the trailing delimiter

-            curTok  = DSN_STRING;
-            goto exit;
-#endif
+                curTok  = DSN_STRING;
+                goto exit;
+            }
        }

        // Maybe it is a token we will find in the token table.
@@ -1413,7 +1473,6 @@ static const KEYWORD keywords[] = {

 class DSNTEST : public wxApp
 {
-
    DSNLEXER*   lexer;
    int         nestLevel;
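
Review note on the non-specctraMode branch of NextTok(): the escape decoding can be read in isolation as a small function. The following is a sketch only, assuming a hypothetical free function `decodeQuoted` that receives the text just after the opening double quote. It mirrors the switch in the hunk above but is not the lexer's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of KiCad.  'in' starts just after the opening
// double quote; decoding stops at the first unescaped closing double quote.
std::string decodeQuoted( const std::string& in )
{
    std::string out;
    std::size_t i = 0;

    while( i < in.size() )
    {
        char ch = in[i++];

        if( ch == '"' )
            return out;                 // end of the quoted atom

        if( ch != '\\' )
        {
            out += ch;                  // ordinary byte, copied through
            continue;
        }

        if( i >= in.size() )
            break;                      // lone backslash at end: unterminated

        char e = in[i++];

        switch( e )
        {
        case '"':
        case '\\':  out += e;       break;
        case 'a':   out += '\x07';  break;
        case 'b':   out += '\x08';  break;
        case 'f':   out += '\x0c';  break;
        case 'n':   out += '\n';    break;
        case 'r':   out += '\r';    break;
        case 't':   out += '\x09';  break;
        case 'v':   out += '\x0b';  break;

        case 'x':                       // 1 or 2 hex digits
        {
            std::string digits;

            while( digits.size() < 2 && i < in.size()
                   && std::isxdigit( (unsigned char) in[i] ) )
                digits += in[i++];

            // with no digits at all, fall back to a literal 'x'
            out += digits.empty() ? 'x'
                                  : (char) std::strtoul( digits.c_str(), nullptr, 16 );
            break;
        }

        default:                        // 1 to 3 octal digits
        {
            --i;                        // re-examine the byte after the backslash
            std::string digits;

            while( digits.size() < 3 && i < in.size() && in[i] >= '0' && in[i] <= '7' )
                digits += in[i++];

            // with no digits at all, emit the backslash itself
            out += digits.empty() ? '\\'
                                  : (char) std::strtoul( digits.c_str(), nullptr, 8 );
            break;
        }
        }
    }

    throw std::runtime_error( "Un-terminated delimited string" );
}
```

As in the hunk, an unrecognized escape keeps the backslash and lets the following byte be copied on the next pass of the loop.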


--- a/common/richio.cpp
+++ b/common/richio.cpp
@@ -282,43 +282,50 @@ int OUTPUTFORMATTER::Print( int nestLevel, const char* fmt, ... ) throw( IO_ERRO

 std::string OUTPUTFORMATTER::Quoted( const std::string& aWrapee ) throw( IO_ERROR )
 {
-    // derived class's notion of what a quote character is
-    char quote          = *GetQuoteChar( "(" );
+    static const char quoteThese[] = "\t ()\n\r";

-    // Will the string be wrapped based on its interior content?
-    const char* squote  = GetQuoteChar( aWrapee.c_str() );
+    if( !aWrapee.size() ||  // quote null string as ""
+        aWrapee[0]=='#' ||  // quote a potential s-expression comment, so it is not a comment
+        aWrapee[0]=='"' ||  // NextTok() will travel through DSN_STRING path anyway, then must apply escapes
+        aWrapee.find_first_of( quoteThese ) != std::string::npos )
+    {
+        std::string ret;

-    std::string wrapee  = aWrapee;  // return this
+        ret.reserve( aWrapee.size()*2 + 2 );

-    // Search the interior of the string for 'quote' chars
-    // and replace them as found with duplicated quotes.
-    // Note that necessarily any string which has internal quotes will
-    // also be wrapped in quotes later in this function.
-    for( unsigned i=0;  i<wrapee.size();  ++i )
-    {
-        if( wrapee[i] == quote )
-        {
-            wrapee.insert( wrapee.begin()+i, quote );
-            ++i;
-        }
-        else if( wrapee[i]=='\r' || wrapee[i]=='\n' )
+        ret += '"';
+
+        for( std::string::const_iterator it = aWrapee.begin(); it!=aWrapee.end(); ++it )
        {
-            // In a desire to maintain accurate line number reporting within DSNLEXER
-            // a decision was made to make all S-expression strings be on a single
-            // line.  You can embed \n (human readable) in the text but not
-            // '\n' which is 0x0a.
-            THROW_IO_ERROR( _( "S-expression string has newline" ) );
+            switch( *it )
+            {
+            case '\n':
+                ret += '\\';
+                ret += 'n';
+                break;
+            case '\r':
+                ret += '\\';
+                ret += 'r';
+                break;
+            case '\\':
+                ret += '\\';
+                ret += '\\';
+                break;
+            case '"':
+                ret += '\\';
+                ret += '"';
+                break;
+            default:
+                ret += *it;
+            }
        }
-    }

-    if( *squote || strchr( wrapee.c_str(), quote ) )
-    {
-        // wrap the beginning and end of the string in a quote.
-        wrapee.insert( wrapee.begin(), quote );
-        wrapee.insert( wrapee.end(), quote );
+        ret += '"';
+
+        return ret;
    }

-    return wrapee;
+    return aWrapee;
 }
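
Review note on Quoted(): the wrapping decision and the escaping can be exercised outside OUTPUTFORMATTER. The sketch below assumes a hypothetical free function `quoteAtom` with the same rules as the new code, except that it emits `\r` for a carriage return so that the byte survives a round trip through the lexer's escape decoding.

```cpp
#include <string>

// Hypothetical stand-in for OUTPUTFORMATTER::Quoted(), for illustration only.
std::string quoteAtom( const std::string& aWrapee )
{
    static const char quoteThese[] = "\t ()\n\r";

    bool wrap = aWrapee.empty()             // empty string is written as ""
                || aWrapee[0] == '#'        // would otherwise read as a comment
                || aWrapee[0] == '"'        // lexer takes the quoted path anyway
                || aWrapee.find_first_of( quoteThese ) != std::string::npos;

    if( !wrap )
        return aWrapee;                     // bare atom, written as-is

    std::string ret;

    ret.reserve( aWrapee.size() * 2 + 2 );
    ret += '"';

    for( char c : aWrapee )
    {
        switch( c )
        {
        case '\n':  ret += "\\n";   break;  // keeps the token on one line
        case '\r':  ret += "\\r";   break;
        case '\\':  ret += "\\\\";  break;
        case '"':   ret += "\\\"";  break;
        default:    ret += c;
        }
    }

    ret += '"';
    return ret;
}
```

A two-line input therefore comes out as a single-line quoted atom containing the two characters `\` and `n`, which matches the single-line token requirement stated in s-expressions.txt.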



--- a/include/dsnlexer.h
+++ b/include/dsnlexer.h
@@ -89,8 +89,16 @@ protected:

    READER_STACK        readerStack;            ///< all the LINE_READERs by pointer.
    LINE_READER*        reader;                 ///< no ownership. ownership is via readerStack, maybe, if iOwnReaders
-    int                 stringDelimiter;
+
+    bool                specctraMode;           ///< if true, then:
+                                                ///< 1) stringDelimiter can be changed
+                                                ///< 2) Kicad quoting protocol is not in effect
+                                                ///< 3) space_in_quoted_tokens is functional
+                                                ///< else not.
+
+    char                stringDelimiter;
    bool                space_in_quoted_tokens; ///< blank spaces within quoted strings
+
    bool                commentsAreTokens;      ///< true if should return comments as tokens

    int                 prevTok;        ///< curTok from previous NextTok() call.
@@ -205,6 +213,20 @@ public:

    virtual ~DSNLEXER();

+    /**
+     * Function SetSpecctraMode
+     * changes the behavior of this lexer into or out of "specctra mode".  If
+     * specctra mode, then:
+     * 1) stringDelimiter can be changed
+     * 2) Kicad quoting protocol is not in effect
+     * 3) space_in_quoted_tokens is functional
+     * else none of the above are true.  The default mode is non-specctra mode, meaning:
+     * 1) stringDelimiter cannot be changed
+     * 2) Kicad quoting protocol is in effect
+     * 3) space_in_quoted_tokens is not functional
+     */
+    void SetSpecctraMode( bool aMode );
+
    /**
     * Function PushReader
     * manages a stack of LINE_READERs in order to handle nested file inclusion.
@@ -298,10 +320,11 @@ public:
     * @param aStringDelimiter The character in lowest 8 bits.
     * @return int - The old delimiter in the lowest 8 bits.
     */
-    int SetStringDelimiter( int aStringDelimiter )
+    char SetStringDelimiter( char aStringDelimiter )
    {
        int old = stringDelimiter;
-        stringDelimiter = aStringDelimiter;
+        if( specctraMode )
+            stringDelimiter = aStringDelimiter;
        return old;
    }

@@ -314,7 +337,8 @@ public:
    bool SetSpaceInQuotedTokens( bool val )
    {
        bool old = space_in_quoted_tokens;
-        space_in_quoted_tokens = val;
+        if( specctraMode )
+            space_in_quoted_tokens = val;
        return old;
    }
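
Review note on the mode gating in dsnlexer.h: both setters become no-ops outside specctra mode while still returning the previous value. A minimal sketch of that behavior follows, using a hypothetical class `ModeGatedLexer` that keeps only the three fields involved. It is not the real DSNLEXER, and the getter is added purely for the example.

```cpp
// Hypothetical reduction of DSNLEXER to the mode-related state only.
class ModeGatedLexer
{
    bool specctraMode = false;          // default is Kicad (non-specctra) mode
    char stringDelimiter = '"';
    bool spaceInQuotedTokens = false;

public:
    void SetSpecctraMode( bool aMode )
    {
        specctraMode = aMode;

        if( aMode )
        {
            spaceInQuotedTokens = true; // specctra default, may still be changed
        }
        else
        {
            spaceInQuotedTokens = false;
            stringDelimiter = '"';      // Kicad mode always uses a double quote
        }
    }

    // Returns the old delimiter; only takes effect in specctra mode.
    char SetStringDelimiter( char aStringDelimiter )
    {
        char old = stringDelimiter;

        if( specctraMode )
            stringDelimiter = aStringDelimiter;

        return old;
    }

    // Returns the old setting; only takes effect in specctra mode.
    bool SetSpaceInQuotedTokens( bool val )
    {
        bool old = spaceInQuotedTokens;

        if( specctraMode )
            spaceInQuotedTokens = val;

        return old;
    }

    // Accessor added for the example only.
    char GetStringDelimiter() const { return stringDelimiter; }
};
```

One consequence worth noting: a caller that saves the old delimiter, changes it, and later restores it keeps working in either mode, since the restore is simply ignored when the change was.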


--- a/pcbnew/ioascii.cpp
+++ b/pcbnew/ioascii.cpp
@@ -327,7 +327,7 @@ int WinEDA_BasePcbFrame::ReadSetup( LINE_READER* aReader )
            }
            catch( IO_ERROR& e )
            {
-#if 0
+#if 1
                wxString msg;
                msg.Printf( wxT( "Error reading PcbPlotParams from %s:\n%s" ),
                            aReader->GetSource().GetData(),

--- a/pcbnew/specctra.cpp
+++ b/pcbnew/specctra.cpp
@@ -2650,7 +2650,7 @@ void SPECCTRA_DB::doFROMTO( FROMTO* growth ) throw( IO_ERROR )
    // split apart the <pin_reference>s into 3 separate tokens.  Do this by
    // turning off the string delimiter in the lexer.

-    int old = SetStringDelimiter( 0 );
+    char old = SetStringDelimiter( 0 );

    if( !IsSymbol(NextTok() ) )
    {

--- a/pcbnew/specctra.h
+++ b/pcbnew/specctra.h
@@ -3798,6 +3798,8 @@ public:
        session = 0;
        quote_char += '"';
        modulesAreFlipped = false;
+
+        SetSpecctraMode( true );
    }

    virtual ~SPECCTRA_DB()