Character String Types

XSD defines many kinds of character string types including string, normalizedString, and token. All of these XSD types are mapped to an OSXMLSTRING type by default. This internal type represents a UTF-8 character string. The definition of this type in osSysTypes.h is as follows:

   typedef struct OSXMLSTRING {
      OSBOOL cdata;
      const OSUTF8CHAR* value;
   } OSXMLSTRING;

The cdata member of this structure is a flag indicating whether or not the value is to be encoded as an XML CDATA section. The value member is a pointer to the string to be encoded. The underlying C type for the OSUTF8CHAR type is unsigned char. This allows the entire UTF-8 character range to be represented as positive numbers.
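The effect of the unsigned underlying type can be illustrated with a minimal sketch. The OSUTF8CHAR typedef is repeated locally only so the fragment compiles on its own, and countNonAsciiBytes is a hypothetical helper, not a function of the run-time library:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the typedef in osSysTypes.h so the sketch compiles alone. */
typedef unsigned char OSUTF8CHAR;

/* Hypothetical helper (not part of the run-time library): counts the bytes
 * of a null-terminated UTF-8 string that belong to multi-byte sequences.
 * Because OSUTF8CHAR is unsigned, such bytes compare as 0x80..0xFF rather
 * than as negative values, so a plain >= test is enough. */
size_t countNonAsciiBytes(const OSUTF8CHAR* str)
{
   size_t n = 0;
   assert(str != NULL);
   for (; *str != 0; str++) {
      if (*str >= 0x80) n++;
   }
   return n;
}
```

With a plain char buffer on a platform where char is signed, the same bytes would read as negative numbers and the comparison above would never be true.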

If the -static command line parameter is specified, character string types that have a maxLength or fixed length facet are represented as static arrays of OSUTF8CHAR. CDATA is not supported in this case. For example:

   typedef OSUTF8CHAR string8_t[(8 * OSUTF8CHAR_SIZE) + 1];
      

where 8 is the maxLength or fixed length value. The OSUTF8CHAR_SIZE macro is defined as 1 by default in osMacros.h. If the strings can contain characters whose UTF-8 encoding occupies more than one byte, this macro must be defined as the largest character size in bytes.
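The sizing rule can be sketched as follows. The typedef and macro default are repeated locally so the fragment is self-contained, and setString8 is a hypothetical helper written for this illustration; it is not generated by the compiler:

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins so the sketch compiles on its own. */
typedef unsigned char OSUTF8CHAR;
#ifndef OSUTF8CHAR_SIZE
#define OSUTF8CHAR_SIZE 1   /* default value, as described for osMacros.h */
#endif

/* Static array form for maxLength = 8: room for 8 characters of
 * OSUTF8CHAR_SIZE bytes each, plus one byte for the null terminator. */
typedef OSUTF8CHAR string8_t[(8 * OSUTF8CHAR_SIZE) + 1];

/* Hypothetical helper: copies src into dst only if the encoded bytes and
 * the terminator fit in the array. Returns 1 on success, 0 if src is too
 * long. */
int setString8(string8_t dst, const char* src)
{
   size_t len;
   assert(dst != NULL && src != NULL);
   len = strlen(src);
   if (len >= sizeof(string8_t)) return 0;
   memcpy(dst, src, len + 1);   /* len + 1 also copies the terminator */
   return 1;
}
```

With the default of 1, the array holds 9 bytes, which is enough for 8 ASCII characters only. A UTF-8 character can occupy up to 4 bytes, so a schema whose values may contain any Unicode character would need OSUTF8CHAR_SIZE defined as 4, giving a 33-byte array.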

For C++, an XML string class is used:

   class EXTRTCLASS OSXMLStringClass : public OSRTBaseType {
    protected:
      OSUTF8CHAR* value;
      OSBOOL cdata;
      ...

    public:
      /**
       * The default constructor creates an empty string.
       */
      OSXMLStringClass();
      ...
   } ;

This class contains constructors and other methods to allow the member variables to be initialized and manipulated.

If -usestl is used with C++ code generation, the XML STL string class is used instead of the XML string class:

   class EXTRTCLASS OSXMLSTLStringClass : public OSRTBaseType {
    protected:
      std::string* value;
      OSBOOL cdata;
      ...

    public:
      /**
       * The default constructor creates an empty string.
       */
      OSXMLSTLStringClass();
      ...
   } ;
      

The value data member of class OSXMLSTLStringClass is an STL string (the string class from the C++ standard template library). To use the XML STL string class, HAS_STL must be defined.

If -use-qt is used with C++ code generation, QString is used instead of OSXMLStringClass.

The general mapping is as follows:

XSD type:

   <xsd:simpleType name="TypeName">
      <restriction base="xsd:string"/>
   </xsd:simpleType>

Generated C code:

   typedef OSXMLSTRING TypeName;

Generated C++ code:

   class TypeName : public OSXMLStringClass {
      ...
   } ;

or, when -usestl is used:

   class TypeName : public OSXMLSTLStringClass {
      ...
   } ;

or, when -use-qt is used:

   class TypeName : public QString {
      ...
   } ;
      

Here, xsd:string stands for the XSD string base type and all other types derived from it. In C, a variable of this type can be populated by casting a string literal to const OSUTF8CHAR* as follows:

   TypeName strval;
   strval.cdata = FALSE;
   strval.value = (const OSUTF8CHAR*) "my string";
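The assignment above stores only a pointer, so the text must remain valid for as long as the variable is in use. That is automatic for a string literal but not for a temporary buffer. The following is a minimal sketch of one way to store a private copy instead. The type definitions are local stand-ins (the underlying type of OSBOOL is an assumption), setTypeNameCopy and freeTypeName are hypothetical helpers, and plain malloc is used only to keep the sketch self-contained; an application may well use the run-time library's own memory facilities instead:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins so the sketch compiles on its own. */
typedef unsigned char OSUTF8CHAR;
typedef unsigned char OSBOOL;      /* assumption: actual typedef may differ */
#ifndef FALSE
#define FALSE 0
#endif

typedef struct OSXMLSTRING {
   OSBOOL cdata;
   const OSUTF8CHAR* value;
} OSXMLSTRING;

typedef OSXMLSTRING TypeName;

/* Hypothetical helper: stores a heap copy of text in *pvalue so the caller's
 * buffer may be reused afterwards. Returns 1 on success, 0 on allocation
 * failure. */
int setTypeNameCopy(TypeName* pvalue, const char* text, OSBOOL cdata)
{
   size_t size;
   OSUTF8CHAR* copy;
   assert(pvalue != NULL && text != NULL);
   size = strlen(text) + 1;          /* include the null terminator */
   copy = (OSUTF8CHAR*) malloc(size);
   if (copy == NULL) return 0;
   memcpy(copy, text, size);
   pvalue->cdata = cdata;
   pvalue->value = copy;
   return 1;
}

/* Hypothetical helper: releases a copy made by setTypeNameCopy. */
void freeTypeName(TypeName* pvalue)
{
   assert(pvalue != NULL);
   free((void*) pvalue->value);
   pvalue->value = NULL;
}
```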

In C++, the assignment operator provided by the class can be used to set the string value:

   strval = "my string";

This sets the cdata member to false, as in the C example above, and makes a deep copy of the text in the object.

Note that in C++, directly setting the value and cdata members is no longer supported. Use the setValue and setCDATA methods instead. Code that sets these data members directly will not compile against the updated library, even if -compat is specified.

String-based types may be further restricted through the use of facets such as length, minLength, maxLength, and pattern. These have no effect on the generated C or C++ type definitions. Instead, constraint checks are added to the generated encoders and decoders to ensure that values of the type are within the specified constraint bounds.
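The kind of check involved can be illustrated with a minimal sketch. It is not the code the compiler generates: utf8CharCount and checkLengthFacets are hypothetical names, and the 0 / -1 return convention is an assumption made for this example. The sketch relies on the XSD rule that length facets on string types count characters, not bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the typedef in osSysTypes.h so the sketch compiles alone. */
typedef unsigned char OSUTF8CHAR;

/* Hypothetical helper: counts characters (code points) in a null-terminated
 * UTF-8 string by skipping continuation bytes, which have the bit pattern
 * 10xxxxxx. Assumes the input is well-formed UTF-8. */
size_t utf8CharCount(const OSUTF8CHAR* str)
{
   size_t n = 0;
   assert(str != NULL);
   for (; *str != 0; str++) {
      if ((*str & 0xC0) != 0x80) n++;
   }
   return n;
}

/* Hypothetical check: returns 0 if the character count lies within
 * [minLength, maxLength], or -1 if the constraint is violated. */
int checkLengthFacets(const OSUTF8CHAR* str, size_t minLength,
                      size_t maxLength)
{
   size_t n = utf8CharCount(str);
   return (n >= minLength && n <= maxLength) ? 0 : -1;
}
```

A fixed length facet corresponds to calling the check with minLength equal to maxLength.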