XML (eXtensible Markup Language) is a markup language designed for storing and transporting structured data in a human-readable and machine-processable format. Unlike HTML, which defines how data should be displayed, XML defines what data is -- it is purely about data structure and semantics.
Because XML is platform-independent and language-independent, it is well suited to data interchange between systems. It is also self-describing: the tag names convey the meaning of the data they contain.
Here is a simple XML document representing a book:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>The Pragmatic Programmer</title>
<authors>
<author>Andrew Hunt</author>
<author>David Thomas</author>
</authors>
<isbn>978-0135957059</isbn>
<published>2019-09-13</published>
<pages>352</pages>
</book>
This XML is self-explanatory: it describes a book with a title, authors, ISBN, publication date, and page count. Any system that understands XML can parse this structure regardless of programming language or platform.
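To make the "any system can parse this" claim concrete, here is a minimal sketch that reads the book document with Python's standard `xml.etree.ElementTree` module. The function name `summarize_book` and the shape of the returned dictionary are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0" encoding="UTF-8"?>
<book>
  <title>The Pragmatic Programmer</title>
  <authors>
    <author>Andrew Hunt</author>
    <author>David Thomas</author>
  </authors>
  <isbn>978-0135957059</isbn>
  <published>2019-09-13</published>
  <pages>352</pages>
</book>
"""


def summarize_book(xml_text: str) -> dict:
    """Parse the book document and pull out a few fields (hypothetical helper)."""
    # Parse from bytes so the encoding named in the XML declaration is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "title": root.findtext("title"),
        # findall returns every matching <author> below <authors>.
        "authors": [author.text for author in root.findall("authors/author")],
        "isbn": root.findtext("isbn"),
        # Element content is always text, so numeric fields are converted explicitly.
        "pages": int(root.findtext("pages")),
    }


book = summarize_book(BOOK_XML)
print(book["title"], "by", ", ".join(book["authors"]))
```

The same structure could be read in much the same way from Java, JavaScript, or any other language with an XML parser.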
For XML to be "well-formed," it must follow strict syntax rules. XML parsers reject documents that violate these rules, making XML highly reliable for data exchange.
Every XML document must have exactly one root element that contains all other elements:
<!-- Valid: single root element -->
<library>
<book>...</book>
<book>...</book>
</library>
<!-- Invalid: multiple root elements -->
<book>...</book>
<book>...</book>
Opening and closing tags must match exactly in case:
<!-- Valid -->
<title>Hello World</title>
<!-- Invalid: case mismatch -->
<title>Hello World</Title>
Every opening tag must have a corresponding closing tag, or use self-closing syntax:
<!-- Valid: closing tag -->
<name>John Doe</name>
<!-- Valid: self-closing -->
<linebreak />
<!-- Invalid: unclosed tag -->
<name>John Doe
Tags must be closed in the reverse order they were opened:
<!-- Valid: proper nesting -->
<parent>
<child>content</child>
</parent>
<!-- Invalid: improper nesting -->
<parent>
<child>content</parent>
</child>
Attribute values must be enclosed in single or double quotes:
<!-- Valid -->
<book isbn="978-0135957059" format='hardcover'>
<!-- Invalid: unquoted attributes -->
<book isbn=978-0135957059 format=hardcover>
Five characters have special meaning in XML and must be escaped as entities:
| Character | Entity | Context |
|---|---|---|
| < | &lt; | Always (starts a tag) |
| > | &gt; | In content (optional, except in the sequence ]]>) |
| & | &amp; | Always (starts an entity) |
| " | &quot; | In double-quoted attributes |
| ' | &apos; | In single-quoted attributes |
<message>Use &lt;tag&gt; for HTML &amp; XML</message>
<quote text="He said &quot;Hello&quot;" />
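Escaping by hand is easy to get wrong when XML is assembled from strings. As a sketch, Python's standard `xml.sax.saxutils` module offers `escape` for element content and `quoteattr` for attribute values; the helper names `build_message` and `build_quote` below are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_message(text: str) -> str:
    """Wrap text in a <message> element, escaping &, < and > (hypothetical helper)."""
    return f"<message>{escape(text)}</message>"


def build_quote(text: str) -> str:
    """Put text into an attribute (hypothetical helper).

    quoteattr escapes the value and also adds the surrounding quotes,
    choosing a quote character that does not clash with the content.
    """
    return f"<quote text={quoteattr(text)} />"


message_xml = build_message("Use <tag> for HTML & XML")
quote_xml = build_quote('He said "Hello"')

# Round trip: parsing the generated XML gives back the original strings.
message_text = ET.fromstring(message_xml).text
quote_text = ET.fromstring(quote_xml).get("text")
print(message_xml)
print(quote_xml)
```

When a document is built through a tree API such as `ElementTree` rather than string formatting, the serializer generally handles this escaping itself.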
The XML declaration specifies the XML version and character encoding:
<?xml version="1.0" encoding="UTF-8"?>
Whitespace between elements is usually insignificant to the applications that consume XML, but human-readable formatting is essential for maintainability. Well-formatted XML follows consistent conventions that make the document structure immediately apparent.
Indent child elements to show hierarchy. Most projects use 2 or 4 spaces:
<!-- 2-space indentation (compact) -->
<order>
  <customer>
    <name>John Smith</name>
    <email>john@example.com</email>
  </customer>
  <items>
    <item quantity="2">Widget</item>
  </items>
</order>
<!-- 4-space indentation (more readable for deep nesting) -->
<order>
    <customer>
        <name>John Smith</name>
        <email>john@example.com</email>
    </customer>
</order>
Each element should typically appear on its own line for clarity:
<!-- Readable: separate lines -->
<person>
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
</person>
<!-- Acceptable for simple elements -->
<person><name>Jane Doe</name></person>
<!-- Unreadable: complex structure on one line -->
<person><firstName>Jane</firstName><lastName>Doe</lastName><age>30</age></person>
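A one-line document like the last example can be re-indented programmatically. The sketch below assumes Python 3.9 or later, where `xml.etree.ElementTree.indent` is available; the function name `pretty_print` and its `spaces` parameter are illustrative choices.

```python
import xml.etree.ElementTree as ET

MINIFIED = (
    "<person><firstName>Jane</firstName>"
    "<lastName>Doe</lastName><age>30</age></person>"
)


def pretty_print(xml_text: str, spaces: int = 2) -> str:
    """Return xml_text re-indented with the given number of spaces per level."""
    root = ET.fromstring(xml_text)
    # indent() rewrites the whitespace between elements in place (Python 3.9+).
    ET.indent(root, space=" " * spaces)
    return ET.tostring(root, encoding="unicode")


formatted = pretty_print(MINIFIED)
print(formatted)
```

Passing `spaces=4` gives the wider style shown earlier. Re-indenting changes whitespace-only text between elements, so it is best avoided for formats where that whitespace carries meaning.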
For elements with many attributes, consider placing each on its own line:
<rect
  x="10"
  y="20"
  width="100"
  height="50"
  fill="#326CE5"
  stroke="#000000"
  stroke-width="2" />
Element names should be descriptive and follow a consistent naming convention:
<!-- Good: descriptive names -->
<customer>
<firstName>John</firstName>
<dateOfBirth>1990-01-15</dateOfBirth>
</customer>
<!-- Bad: cryptic abbreviations -->
<cust>
<fn>John</fn>
<dob>1990-01-15</dob>
</cust>
Use elements for data that has structure or might expand. Use attributes for metadata or simple values:
<!-- Elements for complex data -->
<book isbn="978-0135957059">
<title>The Pragmatic Programmer</title>
<price currency="USD">32.99</price>
</book>
<!-- Attributes for metadata -->
<event date="2026-03-12" time="14:00" timezone="UTC">
<name>Product Launch</name>
</event>
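The choice between elements and attributes also shapes how consuming code reads the data. A small sketch with Python's standard `ElementTree`, using the book example above; the `edition` attribute and its fallback value are hypothetical and included only to show a default.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

BOOK = """<book isbn="978-0135957059">
  <title>The Pragmatic Programmer</title>
  <price currency="USD">32.99</price>
</book>"""

root = ET.fromstring(BOOK)

# Attributes are read with get(); they hold a single flat string each.
isbn = root.get("isbn")
# A missing attribute returns the supplied default instead of raising.
edition = root.get("edition", "unknown")  # "edition" is a hypothetical attribute

# Elements are located with find(); they can carry text, attributes, and children.
price_element = root.find("price")
price = Decimal(price_element.text)
currency = price_element.get("currency")

print(isbn, edition, price, currency)
```

Because `price` is an element, it could later gain children or further attributes without changing how `isbn` is read, which is the expandability argument made above.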
XML validation ensures that a document conforms to a predefined structure and set of rules. While well-formed XML follows syntax rules, valid XML conforms to a schema that defines allowed elements, their order, and data types.
| Aspect | Well-Formed | Valid |
|---|---|---|
| Syntax rules | Must follow XML syntax | Must follow XML syntax + schema |
| Schema required | No | Yes (XSD or DTD) |
| Element names | Any valid names | Only names defined in schema |
| Data types | All text | Typed (string, int, date, etc.) |
| Structure | Any nesting | Must match schema rules |
DTDs are the original XML validation mechanism. They define element structure, attributes, and entities:
<!-- DTD definition -->
<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title, author, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ATTLIST book isbn CDATA #REQUIRED>
]>
<!-- Valid XML document -->
<library>
<book isbn="978-0135957059">
<title>The Pragmatic Programmer</title>
<author>Andrew Hunt</author>
<year>2019</year>
</book>
</library>
DTDs are simple but limited: no namespace support, almost no data typing (element content is just text), and limited reusability.
XSD is the modern, more powerful alternative to DTDs. It supports data types, namespaces, and is written in XML syntax:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author" type="xs:string"/>
              <xs:element name="year" type="xs:integer"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
            <xs:attribute name="isbn" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD provides strong typing (integer, decimal, date, etc.), pattern matching with regex, cardinality constraints (min/maxOccurs), and reusable type definitions.
Most XML parsers support validation. Here is how to validate in different environments:
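# Python with lxml
from lxml import etree
schema = etree.XMLSchema(etree.parse('schema.xsd'))
doc = etree.parse('document.xml')
is_valid = schema.validate(doc)  # True or False; details are in schema.error_log
// JavaScript (Node.js) with libxmljs
const libxml = require('libxmljs');
const xsd = libxml.parseXml(schemaString);
const xml = libxml.parseXml(xmlString);
const isValid = xml.validate(xsd);  // failures are listed in xml.validationErrors
// Java with built-in validation (javax.xml.validation)
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(new File("schema.xsd"));
Validator validator = schema.newValidator();
// validate() throws SAXException if the document does not conform
validator.validate(new StreamSource(new File("document.xml")));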
XML namespaces solve the problem of name conflicts when combining XML vocabularies from different sources. They provide a way to uniquely identify elements and attributes using URI-based identifiers.
Consider this scenario: you want to combine HTML and SVG in the same document. Both have a <title> element with different meanings:
<!-- Ambiguous: which title is which? -->
<document>
<title>Page Title</title> <!-- HTML title -->
<svg>
<title>Chart Title</title> <!-- SVG title -->
</svg>
</document>
Namespaces are declared using the xmlns attribute and are associated with a prefix:
<?xml version="1.0" encoding="UTF-8"?>
<document
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg">
<html:title>Page Title</html:title>
<svg:svg>
<svg:title>Chart Title</svg:title>
<svg:rect x="0" y="0" width="100" height="50"/>
</svg:svg>
</document>
Now the two title elements are unambiguous: one is html:title and the other is svg:title.
You can declare a default namespace (no prefix) for convenience:
<book xmlns="http://example.com/library">
<title>The Pragmatic Programmer</title>
<author>Andrew Hunt</author>
</book>
All unprefixed elements within the scope of the declaration belong to the default namespace. Unprefixed attributes do not: they belong to no namespace.
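Namespaces change how elements are looked up in code, because a parser identifies each element by its namespace URI plus local name, not by the prefix. The sketch below uses Python's standard `ElementTree`; the short prefixes `h` and `s` in the `NS` mapping are arbitrary local choices and need not match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<document
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns:svg="http://www.w3.org/2000/svg">
  <html:title>Page Title</html:title>
  <svg:svg>
    <svg:title>Chart Title</svg:title>
    <svg:rect x="0" y="0" width="100" height="50"/>
  </svg:svg>
</document>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "s": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(DOC)
page_title = root.findtext("h:title", namespaces=NS)
chart_title = root.findtext("s:svg/s:title", namespaces=NS)

# ElementTree stores qualified names in "{uri}local" form.
expanded_tag = root.find("h:title", NS).tag

# With a default namespace, unprefixed elements are still namespaced.
BOOK = """<book xmlns="http://example.com/library">
  <title>The Pragmatic Programmer</title>
</book>"""
book = ET.fromstring(BOOK)
plain_lookup = book.find("title")  # None: the bare name does not match
qualified_title = book.findtext("{http://example.com/library}title")

print(page_title, chart_title, expanded_tag, qualified_title)
```

A lookup that silently returns nothing for an element that is visibly present in the file is very often a missing namespace in the query.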
| Prefix | Namespace URI | Purpose |
|---|---|---|
| xs/xsd | http://www.w3.org/2001/XMLSchema | XML Schema definitions |
| xsl | http://www.w3.org/1999/XSL/Transform | XSLT transformations |
| svg | http://www.w3.org/2000/svg | SVG graphics |
| soap | http://schemas.xmlsoap.org/soap/envelope/ | SOAP web services |
| atom | http://www.w3.org/2005/Atom | Atom feeds |
XML Schema Definition (XSD) is a powerful language for describing the structure, content, and semantics of XML documents. Understanding XSD is essential for working with enterprise XML systems, web services, and configuration files.
XSD distinguishes between simple types (text-only content, like strings and numbers) and complex types (elements with child elements or attributes):
<!-- Simple type: text content only -->
<xs:element name="firstName" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="price" type="xs:decimal"/>
<!-- Complex type: has child elements -->
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
XSD 1.0 provides 44 built-in types (19 primitive and 25 derived), including xs:string, xs:boolean, xs:decimal, xs:integer, xs:date, and xs:dateTime.
Control how many times an element can appear using minOccurs and maxOccurs:
<xs:element name="authors">
<xs:complexType>
<xs:sequence>
<!-- At least 1 author, unlimited maximum -->
<xs:element name="author" type="xs:string"
minOccurs="1" maxOccurs="unbounded"/>
<!-- Optional editor (0 or 1) -->
<xs:element name="editor" type="xs:string"
minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Restrict values using regex patterns:
<xs:simpleType name="EmailType">
<xs:restriction base="xs:string">
<xs:pattern value="[^@]+@[^@]+\.[^@]+"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="PhoneType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{3}-\d{3}-\d{4}"/>
</xs:restriction>
</xs:simpleType>
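An `xs:pattern` must match the entire value, not just part of it. As a rough way to try the two patterns above outside a schema validator, the sketch below uses Python's `re.fullmatch`. This is an approximation: the XSD regular expression dialect differs from Python's in several details, but these two patterns use only syntax common to both. The function name `matches_xsd_pattern` is hypothetical.

```python
import re

EMAIL_PATTERN = r"[^@]+@[^@]+\.[^@]+"
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"


def matches_xsd_pattern(pattern: str, value: str) -> bool:
    """Approximate xs:pattern semantics: the whole value must match."""
    # fullmatch anchors at both ends, mirroring the implicit anchoring in XSD.
    return re.fullmatch(pattern, value) is not None


checks = {
    "john@example.com": matches_xsd_pattern(EMAIL_PATTERN, "john@example.com"),
    "john@example": matches_xsd_pattern(EMAIL_PATTERN, "john@example"),
    "555-123-4567": matches_xsd_pattern(PHONE_PATTERN, "555-123-4567"),
    "555-123-45678": matches_xsd_pattern(PHONE_PATTERN, "555-123-45678"),
}
for value, ok in checks.items():
    print(f"{value}: {'accepted' if ok else 'rejected'}")
```

The last value shows why whole-value matching matters: a prefix match would accept it, while the pattern facet rejects it.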
Define a closed set of allowed values:
<xs:simpleType name="StatusType">
<xs:restriction base="xs:string">
<xs:enumeration value="pending"/>
<xs:enumeration value="approved"/>
<xs:enumeration value="rejected"/>
</xs:restriction>
</xs:simpleType>
XPath and XSLT are powerful technologies for querying and transforming XML documents.
XPath provides a syntax for navigating XML documents and selecting nodes:
<!-- Sample XML -->
<library>
<book id="1">
<title>Book One</title>
<price>29.99</price>
</book>
<book id="2">
<title>Book Two</title>
<price>39.99</price>
</book>
</library>
<!-- XPath examples -->
/library/book # All book elements
/library/book[1] # First book
/library/book[@id='2'] # Book with id=2
/library/book/title # All title elements
//title # All title elements (any depth)
/library/book[price > 30] # Books over $30
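These expressions can be tried from code. Python's standard `ElementTree` implements only a subset of XPath, so the sketch below expresses the paths relative to the root `<library>` element and performs the numeric price comparison in Python instead of in the path. A full XPath engine (for example the one in lxml) would accept the original expressions directly.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library>
  <book id="1">
    <title>Book One</title>
    <price>29.99</price>
  </book>
  <book id="2">
    <title>Book Two</title>
    <price>39.99</price>
  </book>
</library>"""

# fromstring returns the root <library> element, so paths start below it.
root = ET.fromstring(LIBRARY)

all_books = root.findall("book")            # /library/book
first_book = root.find("book[1]")           # /library/book[1]
book_two = root.find("book[@id='2']")       # /library/book[@id='2']
titles = [t.text for t in root.findall(".//title")]  # //title

# ElementTree has no numeric comparison predicates, so filter in Python.
expensive = [
    book.findtext("title")
    for book in all_books
    if float(book.findtext("price")) > 30
]

print(len(all_books), first_book.get("id"), book_two.findtext("title"))
print(titles, expensive)
```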
XSLT transforms XML documents into other formats (HTML, text, different XML):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1>Book Catalog</h1>
        <ul>
          <xsl:for-each select="library/book">
            <li>
              <xsl:value-of select="title"/> -
              $<xsl:value-of select="price"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XML parsers are strict and will reject malformed documents. Here are the most common errors and how to fix them.
<!-- Error: missing closing tag -->
<name>John Doe
<!-- Fix -->
<name>John Doe</name>
<!-- Error: case mismatch between opening and closing tags -->
<firstName>John</FirstName>
<!-- Fix -->
<firstName>John</firstName>
<!-- Error: improperly nested tags -->
<b><i>text</b></i>
<!-- Fix -->
<b><i>text</i></b>
<!-- Error: unescaped special character -->
<message>Price: $50 < $100</message>
<!-- Fix -->
<message>Price: $50 &lt; $100</message>
<!-- Error: multiple root elements -->
<book>...</book>
<book>...</book>
<!-- Fix -->
<library>
<book>...</book>
<book>...</book>
</library>
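All of the errors above are well-formedness errors, so any conforming parser will report them. A minimal sketch of a checker built on Python's standard `ElementTree`; the function name `check_well_formed` and its return shape are illustrative assumptions, and the exact error wording depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "well-formed") or (False, description of the first error)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"
    return True, "well-formed"


broken_samples = [
    "<name>John Doe",                         # missing closing tag
    "<firstName>John</FirstName>",            # case mismatch
    "<b><i>text</b></i>",                     # improper nesting
    "<message>Price: $50 < $100</message>",   # unescaped <
    "<book>...</book>\n<book>...</book>",     # multiple root elements
]
results = [check_well_formed(sample) for sample in broken_samples]
fixed_ok, _ = check_well_formed("<message>Price: $50 &lt; $100</message>")

for sample, (ok, detail) in zip(broken_samples, results):
    print(repr(sample), "->", detail)
```

The parser stops at the first error it meets, so a document with several problems has to be fixed and rechecked one error at a time.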
Follow these guidelines for maintainable, interoperable XML documents.
Always include the XML declaration, stating the version and encoding:
<?xml version="1.0" encoding="UTF-8"?>
Define an XSD schema for any XML format you create. This ensures consistency and catches errors early.
Use attributes for metadata, elements for data that has structure or might expand in the future.
Namespaces prevent conflicts and allow different XML vocabularies to coexist.
Use proper indentation and meaningful element names. Minified XML is hard to debug.
Excessive nesting makes XML hard to read and process. Consider flattening structures or using references.
Our free XML Formatter & Validator tool lets you format and validate XML directly in your browser. No data is sent to any server -- all processing happens locally on your machine.
Paste unformatted or minified XML and get beautifully indented output with proper line breaks and spacing. Perfect for reading XML from APIs, logs, or minified sources.
Instantly check if your XML is well-formed. The tool identifies syntax errors with line and column numbers, making it easy to locate and fix issues.