For over 25 years, GDSII has been the industry-standard database for IC layout. While other formats have been proposed to replace it (and one, OASIS, seems to be gaining some traction), GDSII remains by far the main way of describing the physical layout for the masks used to build a chip.
This is despite the fact that GDSII is not an open industry standard: it was developed by Calma in the 1980s, and ownership of the specification passed from Calma to GE to Valid to Cadence over the years. If you ask the right person, you might even be able to get a copy of the GDSII specification from Cadence, though I have been working with a copy of the spec that I Xeroxed® back in 1989.
Improvements and Enhancements since 1989?
To my knowledge, there has been no official update of the GDSII specification since 1989. As you can imagine, computers and processors have come a long way since then. The specification has some constraints that were probably based both on the limitations of 1980s-era computers and on the developers' belief that chips would remain at roughly the same complexity for the foreseeable future.
Fortunately, the way the database was designed let users extend many of the de jure limitations while maintaining de facto compatibility with the actual architectural underpinnings.
To some extent, GDSII's long life is due to both its elegant architecture and its simplicity. The architecture lets it support today's chips with their billions of polygons, while the simplicity let programmers write code that manipulates GDSII and does things its developers could not have imagined. This enormous base of legacy code is probably what is slowing the transition to alternatives (in particular, to OASIS).
If you had to describe GDSII in only a few words these might be good ones to use:
GDSII is an integer database. The basic unit of measurement is typically a nanometer (10⁻⁹ meter); the actual database unit is declared in the file's UNITS record.
Since four-byte signed (two's complement) integers are used to describe a coordinate, integer coordinates can range from -2³¹ to +2³¹-1.
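The arithmetic behind that range is easy to check. The short Python sketch below assumes a 1 nm database unit, as the text does. In a real file the unit comes from the UNITS record. The helper name coordinate_range_m is hypothetical.

```python
# Limits of a four-byte two's complement signed integer.
INT32_MIN = -2**31        # -2,147,483,648
INT32_MAX = 2**31 - 1     #  2,147,483,647

def coordinate_range_m(db_unit_m: float = 1e-9):
    """Return the (min, max) representable coordinate in meters.

    Hypothetical helper: db_unit_m is the size of one database unit in
    meters (1 nm assumed by default; real files declare it in UNITS).
    """
    return INT32_MIN * db_unit_m, INT32_MAX * db_unit_m

lo, hi = coordinate_range_m()
print(f"{lo:.3f} m to {hi:.3f} m")
```

With a 1 nm unit, a coordinate can span roughly ±2.1 meters, which is far larger than any die. This is why the 32-bit limit has rarely been a practical problem.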
GDSII is organized hierarchically. That is, a number of elements are grouped into a cell (or structure), and that structure is then referenced (or instanced, or placed) multiple times. Since digital ICs are extremely repetitive, this hierarchy matches the physical layout very efficiently.
Cells can be nested with no limit on how deep the nesting goes (though I have yet to see nesting more than 9 levels deep).
It is this nesting and hierarchy that allow one to describe an IC with one billion polygons using a database on the order of 5 GBytes ...
Unfortunately, when one needs to compute the actual position of each polygon, one must "reverse" (flatten) this nesting; for large databases this turns out to be a difficult computation to do quickly.
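Here is a minimal Python sketch of that reversal. It recursively expands every reference and maps each child's coordinates into the parent's coordinate system. The Cell and Ref classes and the flatten function are hypothetical, written only for illustration. To stay short, the sketch handles only translation, reflection about the X axis, and rotations in multiples of 90 degrees. It ignores magnification and AREF arrays. It follows the GDSII convention of reflecting first, then rotating, then translating.

```python
from dataclasses import dataclass, field

@dataclass
class Ref:
    """Hypothetical SREF-like placement of another cell."""
    cell: str              # name of the referenced cell
    x: int                 # placement origin, in database units
    y: int
    angle: int = 0         # counter-clockwise rotation; multiples of 90 only here
    reflect: bool = False  # reflect about the X axis before rotating

@dataclass
class Cell:
    """Hypothetical cell: its own polygons plus references to other cells."""
    polygons: list = field(default_factory=list)  # each polygon is [(x, y), ...]
    refs: list = field(default_factory=list)

def transform(pt, ref):
    """Map a point from the child's coordinates into the parent's."""
    if ref.angle % 90:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    x, y = pt
    if ref.reflect:
        y = -y                       # reflection is applied first
    for _ in range((ref.angle // 90) % 4):
        x, y = -y, x                 # rotate 90 degrees counter-clockwise
    return (x + ref.x, y + ref.y)    # translation is applied last

def flatten(lib, name):
    """Return every polygon of cell `name` in top-level coordinates."""
    cell = lib[name]
    out = [list(p) for p in cell.polygons]
    for ref in cell.refs:
        # Flatten the child first, then place each of its polygons.
        for poly in flatten(lib, ref.cell):
            out.append([transform(pt, ref) for pt in poly])
    return out

lib = {
    "UNIT": Cell(polygons=[[(0, 0), (10, 0), (10, 10), (0, 10)]]),
    "TOP": Cell(refs=[Ref("UNIT", 0, 0), Ref("UNIT", 100, 0, angle=90)]),
}
print(flatten(lib, "TOP"))
```

Note that every placement is expanded separately. A cell instanced a million times therefore yields a million transformed copies of its polygons, which is exactly why flattening a large hierarchical database is expensive.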
The database is binary for compactness. This means that any software for reading or writing GDSII has to be able to extract each byte and interpret the bits.
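As a small example of what "interpreting the bits" means, GDSII stores multi-byte values most-significant byte first (big-endian), and coordinates are four-byte two's complement integers. The sketch below decodes the payload of an XY record into coordinate pairs using Python's standard struct module. The function name decode_xy is hypothetical.

```python
import struct

def decode_xy(payload: bytes):
    """Decode an XY record payload into a list of (x, y) integer pairs.

    Each coordinate is a 4-byte big-endian two's complement integer,
    so every (x, y) pair occupies 8 bytes.
    """
    if len(payload) % 8:
        raise ValueError("XY payload must contain whole (x, y) pairs")
    values = struct.unpack(f">{len(payload) // 4}i", payload)
    return list(zip(values[0::2], values[1::2]))

# 0xFFFFFFFF is -1 in two's complement; 0x0000000A is 10.
print(decode_xy(bytes.fromhex("ffffffff0000000a")))
```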
There is no official ASCII equivalent to the binary format. Various companies (including Artwork) have developed their own GDSII binary-to-ASCII converters for those who wish to use tools such as Perl, awk, or Python to manipulate GDSII data.
GDSII is divided into "records." Only a few record types make up the great majority of the GDSII data. I've listed the essential ones here:
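Every record starts with the same four-byte header. The first two bytes hold the record's total length, including the header, as a big-endian unsigned integer. One byte follows for the record type and one byte for the data type. The payload comes after that. The sketch below walks a stream record by record and prints a toy ASCII dump. It is a simplified illustration, not Artwork's converter or any other product. The record-code table covers only common records. Eight-byte reals, which UNITS, MAG, and ANGLE use, and bit arrays such as STRANS are left undecoded. The names read_records and dump are hypothetical.

```python
import struct

# Record-type codes for common GDSII records (record number -> name).
RECORD_NAMES = {
    0x00: "HEADER", 0x01: "BGNLIB", 0x02: "LIBNAME", 0x03: "UNITS",
    0x04: "ENDLIB", 0x05: "BGNSTR", 0x06: "STRNAME", 0x07: "ENDSTR",
    0x08: "BOUNDARY", 0x09: "PATH", 0x0A: "SREF", 0x0B: "AREF",
    0x0C: "TEXT", 0x0D: "LAYER", 0x0E: "DATATYPE", 0x0F: "WIDTH",
    0x10: "XY", 0x11: "ENDEL", 0x12: "SNAME", 0x13: "COLROW",
    0x16: "TEXTTYPE", 0x17: "PRESENTATION", 0x19: "STRING",
    0x1A: "STRANS", 0x1B: "MAG", 0x1C: "ANGLE", 0x21: "PATHTYPE",
}

def read_records(data: bytes):
    """Yield (record_name, data_type, payload) for each record in the stream."""
    pos = 0
    while pos + 4 <= len(data):
        # Header: total length (including these 4 bytes), record type, data type.
        length, rtype, dtype = struct.unpack_from(">HBB", data, pos)
        if length < 4:
            break  # zero padding after ENDLIB, or a corrupt record
        payload = data[pos + 4 : pos + length]
        yield RECORD_NAMES.get(rtype, f"0x{rtype:02X}"), dtype, payload
        pos += length

def dump(data: bytes):
    """Render each record as one line of ASCII text."""
    lines = []
    for name, dtype, payload in read_records(data):
        if dtype == 2:    # 2-byte signed integers
            values = struct.unpack(f">{len(payload) // 2}h", payload)
        elif dtype == 3:  # 4-byte signed integers
            values = struct.unpack(f">{len(payload) // 4}i", payload)
        elif dtype == 6:  # ASCII string, NUL-padded to an even length
            values = (payload.rstrip(b"\0").decode("ascii"),)
        else:             # no data, bit arrays, 8-byte reals: not decoded here
            values = ()
        lines.append(" ".join([name, *map(str, values)]))
    return lines
```

A real converter would also decode the excess-64 eight-byte reals and STRANS flags, but the header-then-payload loop stays the same.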
Records used once at the beginning of a GDSII file. The record number is shown in brackets.
Records used to begin/end a structure
Records of Entities Within a Structure
BOUNDARY [LAYER, DATATYPE, XY]
PATH [LAYER, DATATYPE, PATHTYPE, WIDTH, XY]
TEXT [LAYER, TEXTTYPE, PRESENTATION, STRING, XY]
Records of References Within a Structure
SREF [SNAME, STRANS, MAG, ANGLE, XY]
AREF [SNAME, STRANS, MAG, ANGLE, COLROW, XY]
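Going in the other direction, the sketch below writes a single BOUNDARY element as bytes. In the binary stream, each element is opened by its element record (here BOUNDARY), followed by its constituent records, and closed by an ENDEL record. A boundary's XY list must end on its starting point, so the sketch closes the polygon if the caller did not. The names record and boundary are hypothetical helpers. The sketch does not write the surrounding library and structure records, so its output is a fragment, not a complete GDSII file.

```python
import struct

def record(rtype: int, dtype: int, payload: bytes = b"") -> bytes:
    """Prefix a payload with the 4-byte header: length, record type, data type."""
    return struct.pack(">HBB", 4 + len(payload), rtype, dtype) + payload

def boundary(layer: int, datatype: int, points) -> bytes:
    """Encode one BOUNDARY element: BOUNDARY, LAYER, DATATYPE, XY, ENDEL."""
    pts = list(points)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # the XY list must return to its first point
    flat = [c for pt in pts for c in pt]
    return (
        record(0x08, 0x00)                                          # BOUNDARY
        + record(0x0D, 0x02, struct.pack(">h", layer))              # LAYER (2-byte int)
        + record(0x0E, 0x02, struct.pack(">h", datatype))           # DATATYPE (2-byte int)
        + record(0x10, 0x03, struct.pack(f">{len(flat)}i", *flat))  # XY (4-byte ints)
        + record(0x11, 0x00)                                        # ENDEL
    )

# A 1 µm square on layer 1, assuming 1 nm database units.
square = boundary(1, 0, [(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
print(len(square), square[:4].hex())
```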