Xmlparse Unix Manual Page




NAME

     xmlparse - a validating XML parser


SYNOPSIS

     xmlparse [-c <config filename>] [-C <SGML catalog filename>]
     [-d <dirname>] [-E <max errors>] [-f] [-h] [-l <debug
     level>] [-m <message catalog>] [-n] [-s] [-v] <filenames or
     URIs...>

     xmlparse [-h]


DESCRIPTION

     Xmlparse is a full validating XML parser for use as a  back-
     end  to  Web-based  XML validation systems, or as a general-
     purpose XML validation tool.  It is particularly well-suited
     to  legacy  SGML  documents that are in the process of being
     converted, along with their associated DTDs, to XML.

     Xmlparse knows the difference between SGML and XML, and can
     often pinpoint mistakes that stem from SGML/XML
     incompatibilities (e.g., it reminds users that SDATA
     entities don't exist in XML; it warns users about
     nondeterministic content models, which are illegal in SGML;
     it also flags general problems such as elements in DTDs that
     are declared but not used, or used but not declared).


OPTIONS

     Xmlparse may be invoked with  several  command-line  options
     that  tell  it where to send error output, and where to look
     for catalog, message, and other auxiliary files.

     Normally environment variables and/or compile-time  defaults
     should   provide  reasonable  fallbacks  for  all  of  these
     command-line run-time options.

     -c filename
          Use filename as the configuration file.  See also
          option -d below.  Do not leave this file world-
          writable.

     -C filenames
          Use filenames as the SGML catalog files (if  more  than
          one  is  given, separate them with a colon).  Note that
          if -C is not supplied on the command line, the value of
          the  SGML_CATALOG_FILES  environment  variable  is used
          instead.

     -d directory
          Use  directory  as  the  default  location  for   data,
          library, and configuration files.

     -E max errors
          Print no more than max errors  errors  and/or  warnings
          for every file parsed.

     -f    Force undefined attributes and element names in
          namespaces to validate without error.

     -h    Print a brief help message, then exit.  See also -v
          below.

     -l level
          Set debugging level to level (must be an integer from 0
          to  7;  higher = more information).  Debugging messages
          go to syslog(3) (facility DAEMON, priority DEBUG).  Cf.
          system messages, which go to syslog only when specified
          (see -s below).  This switch only works if  the  system
          administrator left debugging enabled at compile time.

     -m filename
          Use filename as the message file name.  This file  con-
          tains  all error, warning, and parsing messages emitted
          by xmlparse  at  run-time.   Do  not  leave  this  file
          world-writable.

     -n    Resolve only remote http:, urn:, and ftp:  system  ids
          (be  certain  to  use  this  option  if you are running
          xmlparse as a back end to a web-based validator).  Note
          that, even with the -n option, xmlparse will still
          resolve local files if supplied  on  the  command-line.
          It  will  not,  however,  resolve  URIs  given  on  the
          command-line unless they begin  with  http:,  urn:,  or
          ftp:.

     -s    Output system error and warning messages to  syslog(3)
          (facility  DAEMON,  priority  ERR  or  WARNING).  These
          error messages cover things like malformed SGML catalog
          files, missing system files, and so on.  Debugging mes-
          sages (see -l  above)  always  go  to  syslog  (DAEMON,
          DEBUG).  Parsing errors always go to stderr.

     -v    Print version number, then exit.  See also -h above.
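
     The options above can be combined programmatically.  The
     following is a minimal Python sketch, not part of the
     xmlparse distribution, that assembles a command line for
     use behind a web-based validator.  The function name and
     its parameters are hypothetical; only the flags -C, -E, -n,
     and -s come from this manual page.

```python
def build_xmlparse_argv(targets, catalogs=None, max_errors=None,
                        remote_only=False, use_syslog=False,
                        program="xmlparse"):
    """Assemble an xmlparse argument list (hypothetical helper)."""
    argv = [program]
    if catalogs:
        # -C takes a single argument; several catalogs are joined with colons.
        argv += ["-C", ":".join(catalogs)]
    if max_errors is not None:
        # -E caps the errors/warnings printed for each file parsed.
        argv += ["-E", str(max_errors)]
    if remote_only:
        # -n restricts system id resolution to http:, urn:, and ftp:.
        argv.append("-n")
    if use_syslog:
        # -s sends system errors and warnings to syslog.
        argv.append("-s")
    return argv + list(targets)
```

     The resulting list can be handed to subprocess.run() as is,
     which avoids shell quoting problems with URIs.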


CONFIGURATION FILE

     Run-time settings may be supplied not only through
     command-line options, but also through a system-wide
     configuration file (usually installed as
     /usr/local/lib/xmlparse/xmlparse.cfg).  Where they coincide,
     directives supplied in the configuration file override
     command-line options and compile-time defaults.

     Normally the configuration file is  used  only  to  set  the
     external   FPI  and/or  URI  resolution  commands  (used  by
     xmlparse to resolve PUBLIC and SYSTEM identifiers).  It  may
     also  be used, however, to override the command-line options
     -C, -E, -l, -m, -n, and -s.  All configuration  file  direc-
     tives are fully documented in the sample configuration file,
     xmlparse.cfg, included with the base xmlparse source distri-
     bution.
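
     The precedence described in this section can be pictured
     with a short Python sketch.  It is an illustration only: the
     setting names are hypothetical (the real directive names
     are documented in xmlparse.cfg), and ranking environment
     variables above compile-time defaults is an assumption, as
     this page does not state their relative order.

```python
from collections import ChainMap


def effective_settings(config_file, command_line, environment, compiled_in):
    """Merge four setting sources; the earliest mapping that has a key wins."""
    # ChainMap looks keys up left to right, so the configuration file
    # takes priority over the command line, which takes priority over
    # the (assumed) environment and compile-time fallbacks.
    return dict(ChainMap(config_file, command_line, environment, compiled_in))
```

     For example, a max_errors value in the configuration file
     would win over one given with -E on the command line.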


DIAGNOSTICS

     If no validation errors are detected,  xmlparse  exits  with
     status  0.   Warnings  may  be  issued to stderr.  If actual
     errors are detected, xmlparse exits with status 4, and emits
     a list of parsing errors/warnings to stderr.  Fatal system
     errors resulting in early program termination produce other
     non-zero exit statuses.
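
     A calling program can act on these exit statuses.  The
     Python sketch below is one possible wrapper, assuming that
     an executable named xmlparse is on the PATH; the function
     names and the three labels are hypothetical.  Note that a
     status of 0 may still be accompanied by warnings on stderr.

```python
import subprocess


def classify_exit_status(status):
    """Map an xmlparse exit status to a coarse label (labels are made up)."""
    if status == 0:
        return "valid"          # no validation errors; warnings possible
    if status == 4:
        return "invalid"        # validation errors were reported
    return "system-error"       # any other status: fatal system error


def validate(path, program="xmlparse"):
    """Run the parser on one file and return (label, stderr text)."""
    result = subprocess.run(
        [program, path],
        capture_output=True,
        encoding="utf-8",       # messages are emitted as UTF-8
        errors="replace",
    )
    return classify_exit_status(result.returncode), result.stderr
```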

     Xmlparse may emit various diagnostic  messages  at  run-time
     about  missing  files or arguments.  By default, these go to
     stderr.  They  may,  however,  be  redirected  to  syslog(3)
     through the -s command-line switch (on which, see above).

     Xmlparse  is  aggressive  in  reporting  ambiguous   content
     models,  elements that are declared but not used in any con-
     tent model, unresolvable public and system identifiers,  and
     so on.

     Xmlparse also issues warning  messages  that  encourage  DTD
     writers  to  declare things before using them.  For example,
     it reports cases  where  ATTLIST  declarations  name  as-yet
     undeclared  elements; it also flags unparsed entity declara-
     tions that point to as-yet undeclared NOTATIONs.
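
     The declare-before-use check for ATTLIST declarations can
     be approximated in a few lines of Python.  This is a naive
     sketch and not xmlparse's own logic: it scans raw DTD text
     with a regular expression and ignores comments, parameter
     entities, and conditional sections.  The function name is
     hypothetical.

```python
import re

# Matches the keyword and the first name of ELEMENT/ATTLIST declarations.
DECL = re.compile(r"<!(ELEMENT|ATTLIST)\s+([^\s>]+)")


def attlists_before_elements(dtd_text):
    """Return element names whose ATTLIST precedes their ELEMENT declaration."""
    declared = set()
    early = []
    for match in DECL.finditer(dtd_text):
        kind, name = match.groups()
        if kind == "ELEMENT":
            declared.add(name)
        elif name not in declared:
            # ATTLIST names an element that has not been declared yet.
            early.append(name)
    return early
```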


CONFORMANCE

     Xmlparse implements the published (February  1998)  XML  1.0
     standard.   It will also check namespaces (see, however, the
     -f option above).

     Xmlparse deviates from the 1.0 spec in several ways.  First,
     it ignores syntactically meaningless whitespace inside of
     declarations and markup.  The rationale here is that this
     practice not only follows SGML (e.g., Handbook, 65
     [371:16]), but also simplifies processing and renders XML
     more easily manageable using programming tools like flex(1).
     Note that this deviation from the spec  has  nothing  to  do
     with the hotly debated issue of whitespace in actual charac-
     ter data (which the validator maintains internally,  as  per
     the spec).

     Xmlparse also deviates from the strict 1.0 standard  in  its
     early  reporting of malformed entity replacement text (if an
     entity's replacement text would be malformed, xmlparse flags
     it,  whether  or  not  you  actually  use  the entity).  The
     rationale here is that early reporting of  malformed  entity
     replacement text prevents users from declaring entities that
     are at best useless, and  at  worst  harmful  in  that  they
     trigger  DTD-based  errors  in  documents  whose  DTDs  were
     thought to be correct.

     Xmlparse does not prohibit '<'  in  attribute  values.   The
     rationale  in  this  instance is that excluding '<' actually
     complicates processing for validating parsers.   Also,  with
     all  its intricate entity replacement rules and constraints,
     XML is already such a pain to process that this so-called
     DPH ("Desperate Perl Hacker") restriction is just plain silly.
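
     For contrast, a parser that enforces the spec rejects a
     literal '<' in an attribute value.  The Python sketch below
     uses the standard library's expat-based ElementTree purely
     as a stand-in for such a parser; it is unrelated to
     xmlparse, and the helper name is hypothetical.  Documents
     that xmlparse accepts on this point may therefore fail
     elsewhere.

```python
import xml.etree.ElementTree as ET


def accepted_by_strict_parser(document):
    """Return True if ElementTree parses the document without error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# An escaped '<' is fine everywhere; a literal one violates the
# "No < in Attribute Values" well-formedness constraint.
portable = '<item note="a &lt; b"/>'
not_portable = '<item note="a < b"/>'
```

     Writing &lt; in attribute values keeps a document usable
     with both kinds of parser.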

     A final area in which xmlparse deviates  from  the  XML  1.0
     spec  is  that  it  ignores  the encoding types specified by
     external  transfer  protocols,  such  as  HTTP.   Experience
     reveals  that  these  protocols very often provide incorrect
     encoding information (e.g., UTF-8 usually gets sent as  ISO-
     8859-1  or  plain-text  ASCII).   As  a practical necessity,
     therefore, xmlparse relies for encoding information  on  its
     own  internal charset detection facilities and on the encod-
     ing declaration, if the text provides one.
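
     The general idea of trusting the document rather than the
     transport can be sketched in Python as follows.  This is not
     xmlparse's actual detection code: it checks only for a byte
     order mark and an ASCII-compatible encoding declaration, and
     does not handle UTF-16 or UCS-4 input that lacks a byte
     order mark.  All names are hypothetical.

```python
import codecs
import re

# The UTF-32 marks are tested first because the UTF-32LE mark begins
# with the same two bytes as the UTF-16LE mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_encoding(data, transport_charset=None):
    """Guess a document's encoding from its own bytes."""
    # transport_charset (e.g. from an HTTP header) is deliberately
    # ignored, mirroring the policy described above.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    match = ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"  # the default when nothing else is declared
```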


INSTALLATION

     To set up xmlparse, follow the instructions in the INSTALL
     file that came with the source distribution.  These instruc-
     tions cover source code configuration and building, as  well
     as the actual installing.

     Xmlparse has been  coded  specifically  for  platforms  that
     still  lack  support for UCS-2/4, UTF-16, and Unicode (i.e.,
     nearly all stock Unix systems).  It can  also  make  limited
     use  of legacy SGML catalog files (basically it ignores com-
     ments and lines that don't start with PUBLIC).

     Xmlparse compiles using stock GNU tools available for nearly
     all POSIX systems (e.g., (G)CC, Bison, and Flex [patched for
     Unicode support]).


SEE ALSO

     nsgmls(1)


LIMITATIONS, BUGS

     Xmlparse is an ugly, inelegant piece of  software  built  to
     run  on  legacy POSIX systems with C libraries and compilers
     that don't understand Unicode (i.e., nearly all Unix systems
     out there today).

     Xmlparse assumes that all auxiliary files,  other  than  the
     XML  source files and DTDs, are encoded using straight ASCII
     or UTF-8.  This includes the message  catalog,  the  system-
     wide   configuration  file,  and  any  SGML  catalogs  used.
     Xmlparse will parse XML source files and DTDs that use  UTF-
     8,  UTF-16,  UCS-2/4  (big  or little-endian), or any of the
     ISO-8859 standards, although all messages it emits are  con-
     verted  to UTF-8.  Naturally, documents that don't use UTF-8
     should provide an encoding declaration, since xmlparse  will
     otherwise  assume  the  default,  UTF-8  (as  per the spec).
     Documents  using  ISO  8859-x  should  include  an  encoding
     declaration as well.

     Xmlparse handles memory inefficiently.  This inefficiency is
     compounded  by its internal use of the wchar_t data type (if
     available) for character and string operations.

     Xmlparse also emits geeky line-numbered error messages that
     XML/SGML neophytes may find inscrutable.  These messages are
     kept in a simple sprintf catalog that hard-codes argument
     orderings, and will therefore be a pain to port to some
     language environments.


AUTHOR

     Xmlparse was written  by  Richard  Goerwitz  for  the  Brown
     University Scholarly Technology Group.

     Send bug reports to <STG_info@Brown.EDU>.


COPYRIGHT

     Copyright 1998 by Richard Goerwitz and Brown University

     Xmlparse is  free  software.   Use  it  if  you  like  (with
     appropriate  acknowledgments)  and  modify  it  to suit your
     needs.  But don't blame us if it doesn't do what you want or
     expect.   Make  sure  to  check the COPYRIGHT file that came
     with the xmlparse source distribution for a  full  statement
     of copyright and usage conditions.