Clip and Bitmap Extraction from Very Large GDSII Files
Artwork is regularly asked to estimate how "fast" one of our programs can process large GDSII (or OASIS) files for jobs such as extracting thousands of small windows from a layout file and either rasterizing them or extracting the polygons. Sometimes the output is left in memory for another application to use; other times it is written to disk.
Can't Get the Real Files
Unfortunately, because of the secrecy surrounding modern chip design, we are not provided with representative files in order to make direct measurements. We generally have to either synthesize a dummy file or use some files that are so old that nobody worries about letting them out of the foundry. A file that is 5-10 years old does not reflect the density, complexity and size of the real files our clients want to process.
Don't Have the Big Iron
Further, our clients often utilize hardware that is much larger than what we have in house. For example, our development machines typically have 4 or 6 cores with up to 32 GB of RAM. Our clients' machines have anywhere from 16 to 64 CPUs with up to 256 GB of RAM.
Nevertheless, one can run a number of judicious tests using smaller layout files on lesser hardware and, with some detailed understanding of how various computations scale, come up with a reasonably good estimate of how our software will perform on larger files running on bigger machines.
The Software - NexGenRaster
The software used for window extraction is called NexGenRaster (or sometimes NexGen RIP). It calls and manages three internal libraries.
QISLIB - scans the GDSII or OASIS file. Creates a quad tree which enables fast access to the data based on its location in the layout. Imports the entity data and saves it in memory in a more efficient format than native GDSII. These are all "one time" computations.
It also includes the exploder and polygon extractor that responds to a request for window data by returning polygons to the call back function (specified by NexGenRaster).
ACSRASTERLIB - Artwork's third generation rasterizer. Multi-threaded. Reads the polygons from the buffer, rasterizes them and leaves the raw bitmap results in memory.
RSTRFMTLIB - reads the raw bitmap from memory and compresses/formats it as specified. It can then write the bitmap to disk.
So How Fast Can You Extract ...
The question we want to answer today:
Q. From a multi-layered hierarchical GDSII file (not OPC corrected) which has a maximum density of 2000 polygons per square um, how fast can you extract a large number of 10 x 10 um bitmaps? The GDSII files range in size from 50GB to 500GB. We have a machine with up to 20 CPUs available and up to 192GB of RAM with large SSD for I/O.
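As a quick sanity check on the workload implied by this question, the worst-case polygon count per window follows directly from the stated density and window size. The sketch below is plain arithmetic; the function name is illustrative, and polygons that merely cross the window edge are ignored.

```python
def max_polygons_per_window(density_per_um2, width_um, height_um):
    """Upper bound on polygons in a window, assuming peak density fills it."""
    return density_per_um2 * width_um * height_um

# Figures from the question: 2000 polygons per square um, 10 x 10 um windows.
worst_case = max_polygons_per_window(2000, 10, 10)
print(f"{worst_case:,} polygons per window at peak density")
```

So each window may hold up to 200,000 polygons, which is the load the extractor and rasterizer must handle per request.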
Our Test Files
We don't have any "real" 50GB GDSII files on hand and what files we do have are older technology with densities far below 2000 polygons per square um. Here's what we do have:
We will measure these files directly and then try to identify a relationship between the size/complexity of the file and the processing time in order to extrapolate performance to much larger files.
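The simplest relationship one might look for is a straight line between file size and processing time. The sketch below fits such a line by ordinary least squares and extrapolates it. The function names are illustrative and the input numbers are placeholders chosen for the example, not measurements; linear scaling is itself an assumption that each processing step has to be checked against.

```python
def fit_line(sizes_gb, times_s):
    """Ordinary least-squares fit of time = slope * size + intercept."""
    n = len(sizes_gb)
    mean_x = sum(sizes_gb) / n
    mean_y = sum(times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_gb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_gb, times_s))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(slope, intercept, size_gb):
    """Extrapolate the fitted line to a file size outside the measured range."""
    return slope * size_gb + intercept

# PLACEHOLDER values for illustration only; these are not measurements.
placeholder_sizes_gb = [1.0, 2.0, 4.0]
placeholder_times_s = [10.0, 20.0, 40.0]

slope, intercept = fit_line(placeholder_sizes_gb, placeholder_times_s)
print(f"slope = {slope:.2f} s/GB, intercept = {intercept:.2f} s")
print(f"extrapolated time for 50 GB: {predict(slope, intercept, 50.0):.1f} s")
```

Extrapolating an order of magnitude beyond the measured range is only as good as the scaling assumption, which is why the individual steps are examined separately.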
Breaking Down the Timing Contributions
Since we will be extrapolating from our measured times, we need to understand each internal processing block and how it contributes to the overall time to extract and rasterize a window. Here's a step by step breakdown starting with opening the file, then extracting a window, and finally rasterizing it.
Step 1 - Scan the File
When directed to open a file, the QISLIB module must initially scan the entire file and build an internal database containing a table of cell definitions, the extents of each cell, structure references and array references, hierarchy, layers with data ... Since GDSII files are not ordered, one cannot complete the hierarchy structure or even compute cell extents until reaching the end of the file.
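To illustrate what a scan pass involves, here is a minimal sketch of a sequential reader for the GDSII stream format, in which every record starts with a 2-byte big-endian length (including the 4-byte header), a 1-byte record type and a 1-byte data type. This is not QISLIB's implementation; it only tallies elements and references per cell, and the helper names (`iter_records`, `scan_cells`, `make_record`) are made up for the example.

```python
import io
import struct
from collections import Counter

# Record type codes from the GDSII stream format.
ENDLIB, BGNSTR, STRNAME, ENDSTR = 0x04, 0x05, 0x06, 0x07
BOUNDARY, PATH, SREF, AREF, ENDEL = 0x08, 0x09, 0x0A, 0x0B, 0x11

def iter_records(stream):
    """Yield (record_type, payload) for each record until ENDLIB or EOF."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        length, rec_type, _data_type = struct.unpack(">HBB", header)
        if length < 4:
            return  # trailing padding or a malformed record
        yield rec_type, stream.read(length - 4)
        if rec_type == ENDLIB:
            return

def scan_cells(stream):
    """Single sequential pass: count elements and references in each cell."""
    cells, current = {}, None
    for rec_type, payload in iter_records(stream):
        if rec_type == STRNAME:
            current = payload.rstrip(b"\0").decode("ascii")
            cells[current] = Counter()
        elif rec_type == ENDSTR:
            current = None
        elif current is not None and rec_type in (BOUNDARY, PATH, SREF, AREF):
            cells[current][rec_type] += 1
    return cells

def make_record(rec_type, data_type=0, payload=b""):
    """Build one record for the synthetic test stream below."""
    return struct.pack(">HBB", 4 + len(payload), rec_type, data_type) + payload

# Tiny synthetic stream: one cell "TOP" with one boundary and one SREF.
# LAYER, XY and similar records are omitted; the scanner skips them anyway.
data = b"".join([
    make_record(BGNSTR, 2, bytes(24)),
    make_record(STRNAME, 6, b"TOP\0"),
    make_record(BOUNDARY), make_record(ENDEL),
    make_record(SREF), make_record(ENDEL),
    make_record(ENDSTR),
    make_record(ENDLIB),
])
summary = scan_cells(io.BytesIO(data))
print(summary)
```

Because a cell may reference another cell defined later in the stream, anything that depends on the full hierarchy (such as cell extents) can only be resolved once the loop has reached the end of the file.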
The results of the scan are placed into memory. Generally the amount of memory needed is not a significant percent of the computer's available RAM.
Scanning is single threaded and therefore has two main limitations - disk I/O and the CPU's clock speed. Disk I/O can be maximized by using high performance SSDs and by connecting them through modern high-bandwidth interfaces (PCIe instead of SATA).
The scanning step is only performed once, but its speed is affected by all layers that are present -- even if not all of them are going to be used.
Step 2 - Quad Tree and DB Load
A quad tree is a way of accelerating the access to geometries based on their physical location. It takes some time to compute a quad tree (in our case we are computing an array of quad trees) but the time invested up front gets paid back when multiple windows are extracted.
There is a tradeoff between the granularity of a quad tree and extraction performance. Creating a highly granular quad tree takes more computation and requires more memory to store, but the extractor will get the needed polygons in a small window much faster.
A user-set parameter controls how granular the quad tree will be. Generally, the larger the GDSII file, the less granularity can be requested due to memory limitations.
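The granularity tradeoff can be seen in a toy quad tree. The sketch below is a single, simplified tree (not the array of quad trees used by QISLIB), and `max_depth` is a stand-in for the granularity parameter; all names are illustrative. It reports how many nodes the tree needs and how many items a small window query has to test.

```python
def overlaps(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

class QuadTree:
    def __init__(self, bounds, max_depth, depth=0):
        self.bounds, self.max_depth, self.depth = bounds, max_depth, depth
        self.items = []       # (bbox, payload) pairs stored at this node
        self.children = None

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        quads = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
        self.children = [QuadTree(q, self.max_depth, self.depth + 1) for q in quads]

    def insert(self, bbox, payload):
        if self.depth < self.max_depth:
            if self.children is None:
                self._split()
            for child in self.children:
                if contains(child.bounds, bbox):
                    child.insert(bbox, payload)
                    return
        self.items.append((bbox, payload))  # at max depth, or straddles a split

    def query(self, window, hits, stats):
        stats["tested"] += len(self.items)
        hits.extend(p for b, p in self.items if overlaps(b, window))
        for child in self.children or []:
            if overlaps(child.bounds, window):
                child.query(window, hits, stats)

    def node_count(self):
        return 1 + sum(c.node_count() for c in self.children or [])

def build(max_depth):
    """Index a 32 x 32 grid of small squares, one per unit cell."""
    tree = QuadTree((0, 0, 32, 32), max_depth)
    for i in range(32):
        for j in range(32):
            tree.insert((i + 0.25, j + 0.25, i + 0.75, j + 0.75), (i, j))
    return tree

results = {}
for depth in (2, 5):
    tree, hits, stats = build(depth), [], {"tested": 0}
    tree.query((10.1, 10.1, 10.9, 10.9), hits, stats)
    results[depth] = (tree.node_count(), stats["tested"], hits)
    print(f"depth {depth}: {tree.node_count()} nodes, "
          f"{stats['tested']} items tested, hits = {hits}")
```

Both trees return the same single hit, but the deeper tree tests far fewer items per query at the cost of many more nodes to build and store.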
While in principle one could store the quad tree on disk, this would hurt performance. Therefore our assumption will be that the quad tree will always be stored in memory.
The quad tree's size in memory cannot be accurately estimated in advance. We've found that it tends to range from 20% to 35% of the GDSII file's size.
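Applying the 20% to 35% range to the file sizes and RAM quoted in the question gives a rough memory budget. This is only arithmetic on figures already stated; the function and variable names are illustrative.

```python
def quad_tree_memory_gb(file_gb, low_fraction=0.20, high_fraction=0.35):
    """Return the (low, high) quad tree size estimate in GB."""
    return file_gb * low_fraction, file_gb * high_fraction

RAM_GB = 192  # upper limit quoted in the question

estimates = {}
for file_gb in (50, 500):
    low, high = quad_tree_memory_gb(file_gb)
    estimates[file_gb] = (low, high)
    print(f"{file_gb} GB file: quad tree {low:.1f} to {high:.1f} GB, "
          f"leaving at least {RAM_GB - high:.1f} GB of {RAM_GB} GB")
```

For a 500 GB file the upper estimate consumes most of the 192 GB, leaving little room for anything else in memory, which is consistent with having to request less granularity for the largest files.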
Rather than "point" directly into the GDSII file when retrieving geometry data, it is much more effective to create a more compact, tightly packed database. We call this process Load to Memory and it is done during the same IO pass as the building of the quad tree. The resulting database is called DBLoad and the original GDSII is no longer needed.
If there is enough available memory, the DBLoad database should be held in memory. This will enable the extractor function to get the desired boundary/path data as fast as possible. For very large input files there may not be enough memory to accommodate the geometry database. In that case, it can be placed on disk; disk IO will then affect the rate at which the extractor can collect the needed geometric entities.
Both the quad tree computations and DBLoad operations are single threaded; since they require a second read of the GDSII file from disk, this step is both IO limited and CPU clock speed limited. The number of computations and the memory footprint are dependent on how many layers are imported - the user can greatly reduce the computations of this step if importing only a single layer.
Step 3 - Polygon Explosion/Extraction
Our first two steps are one time operations. The polygon exploder/extractor is called once for every window needed. If 10,000 windows need to be extracted the bulk of the time is spent in this module.
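A simple cost model shows how the one-time steps are amortized. The symbols are introduced here for illustration only: $T_{\text{scan}}$ and $T_{\text{load}}$ are the one-time costs of Steps 1 and 2, $t_{\text{extract}}$ is the average extraction time per window, and $N$ is the number of windows.

```latex
T_{\text{total}}(N) = T_{\text{scan}} + T_{\text{load}} + N \, t_{\text{extract}},
\qquad
\frac{T_{\text{total}}(N)}{N} = \frac{T_{\text{scan}} + T_{\text{load}}}{N} + t_{\text{extract}}
```

As $N$ grows, the first term of the per-window cost shrinks towards zero and the amortized cost approaches $t_{\text{extract}}$. Rasterizing and formatting, where used, add further per-window terms of the same form.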
The extractor receives a request for the data crossing a window. It uses the quad tree to get "pointers" to the SREFs, AREFs and geometry database, then "explodes" the hierarchy to reach the polygons crossing the window. Each flattened polygon is returned to the callback function, which adds it to the polygon buffer for that particular window.
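The explode-and-callback flow can be sketched as a recursive walk over the hierarchy. This is a simplified illustration, not the QISLIB extractor: references carry only a translation (no rotation, mirroring or AREF arrays), pruning uses precomputed cell extents rather than a quad tree, every cell is assumed to be non-empty, and the cell dictionary layout is invented for the example.

```python
def bbox(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def shift(box, dx, dy):
    return box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy

def cell_extent(cells, name, cache):
    """Extent of a cell including everything it references (memoized)."""
    if name not in cache:
        boxes = [bbox(p) for p in cells[name]["polygons"]]
        for child, cx, cy in cells[name]["refs"]:
            boxes.append(shift(cell_extent(cells, child, cache), cx, cy))
        cache[name] = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                       max(b[2] for b in boxes), max(b[3] for b in boxes))
    return cache[name]

def explode(cells, name, window, callback, extents, dx=0, dy=0):
    """Flatten the hierarchy under `name`, reporting polygons that cross window."""
    if not overlaps(shift(extents[name], dx, dy), window):
        return  # nothing under this reference can touch the window
    for poly in cells[name]["polygons"]:
        moved = [(x + dx, y + dy) for x, y in poly]
        if overlaps(bbox(moved), window):
            callback(moved)
    for child, cx, cy in cells[name]["refs"]:
        explode(cells, child, window, callback, extents, dx + cx, dy + cy)

# Hypothetical three-level layout: TOP places two ROWs, each ROW places four VIAs.
cells = {
    "VIA": {"polygons": [[(0, 0), (1, 0), (1, 1), (0, 1)]], "refs": []},
    "ROW": {"polygons": [],
            "refs": [("VIA", 0, 0), ("VIA", 2, 0), ("VIA", 4, 0), ("VIA", 6, 0)]},
    "TOP": {"polygons": [[(0, 10), (8, 10), (8, 11), (0, 11)]],
            "refs": [("ROW", 0, 0), ("ROW", 0, 5)]},
}
extents = {}
cell_extent(cells, "TOP", extents)

polygon_buffer = []                      # one buffer per requested window
explode(cells, "TOP", (3.5, 4.0, 6.5, 7.0), polygon_buffer.append, extents)
print(polygon_buffer)
```

The lower ROW is rejected by a single extent test, so none of its VIAs are visited; this kind of early rejection is what keeps per-window work small when the hierarchy is deep.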
This is currently a single threaded process; in the optimal case, there is no disk IO so the main limitation is CPU performance. [Artwork is working on a second generation exploder/extractor that is multi-threaded.]
Step 4 - Rasterize
At this point we have a chunk of memory holding all the polygons in our requested window. If the goal is to provide a bitmap then the rasterizer library reads the polygons and produces a bitmap in memory.
ACSRASTERLIB is multi-threaded, so more than one instance can work concurrently on the same set of input polygons. This makes sense if the window and DPI are large. However, if the windows are relatively small, it may be more efficient to use only a single rasterizer instance.
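For readers unfamiliar with the rasterizing step, the sketch below turns a polygon buffer into a bitmap using a basic even-odd scanline fill sampled at pixel centers. This is a textbook technique chosen for illustration, single threaded, and not ACSRASTERLIB's algorithm; the function name and the `pixels_per_um` parameter are assumptions of the example.

```python
import math

def rasterize(polygons, window, pixels_per_um):
    """Return rows of 0/1 pixels (row 0 at the bottom edge of the window)."""
    x0, y0, x1, y1 = window
    width = round((x1 - x0) * pixels_per_um)
    height = round((y1 - y0) * pixels_per_um)
    bitmap = [bytearray(width) for _ in range(height)]
    for poly in polygons:
        n = len(poly)
        for row in range(height):
            y = y0 + (row + 0.5) / pixels_per_um      # scanline through pixel centers
            crossings = []
            for i in range(n):
                (ax, ay), (bx, by) = poly[i], poly[(i + 1) % n]
                if (ay <= y) != (by <= y):            # edge crosses this scanline
                    crossings.append(ax + (y - ay) * (bx - ax) / (by - ay))
            crossings.sort()
            # Fill between successive pairs of crossings (even-odd rule).
            for left, right in zip(crossings[0::2], crossings[1::2]):
                first = max(0, math.ceil((left - x0) * pixels_per_um - 0.5))
                last = min(width, math.ceil((right - x0) * pixels_per_um - 0.5))
                for col in range(first, last):
                    bitmap[row][col] = 1              # overlapping polygons are OR-ed
    return bitmap

# A 2 x 2 um square centered in a 4 x 4 um window at 2 pixels per um.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
bitmap = rasterize([square], (0, 0, 4, 4), 2)
for row in reversed(bitmap):                          # print with the top row first
    print("".join("#" if px else "." for px in row))
```

Each scanline is independent of the others, which is one reason rasterization lends itself to being split across threads when the bitmap is large.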
The throughput here depends both on CPU performance and possibly the number of concurrent CPUs assigned to the task.
Step 5 - Format and Compress
The raster library places a raw bitmap in memory. The formatter/compressor's function is to take this raw bitmap, compress it (if requested), format it per the desired output format and write to disk. This library is also multi-threaded.
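As a rough picture of this step, the sketch below packs a raw one-value-per-pixel bitmap into bits, compresses it and optionally writes it out. The formats supported by RSTRFMTLIB are not described here, so zlib (from the Python standard library) is used purely as a stand-in, and the function names are illustrative.

```python
import zlib

def pack_rows(bitmap):
    """Pack rows of 0/1 pixels into bytes, 8 pixels per byte, MSB first."""
    packed = bytearray()
    for row in bitmap:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for pixel in chunk:
                byte = (byte << 1) | (1 if pixel else 0)
            byte <<= 8 - len(chunk)          # zero-pad the last byte of a row
            packed.append(byte)
    return bytes(packed)

def compress_bitmap(bitmap, level=6):
    """Return (packed, compressed) byte strings for a raw bitmap."""
    packed = pack_rows(bitmap)
    return packed, zlib.compress(packed, level)

def write_bitmap(path, compressed):
    """Write the compressed bytes to disk; this is where disk I/O enters."""
    with open(path, "wb") as handle:
        handle.write(compressed)

# An 8 x 8 raw bitmap with a 4 x 4 block of set pixels in the middle.
raw_bitmap = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)]
              for r in range(8)]
packed, compressed = compress_bitmap(raw_bitmap)
print(packed.hex(), len(compressed))
```

A caller that consumes the raw bitmap directly from memory would skip all three functions, which is why this step can disappear from the timing entirely.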
Depending on our client's application, this part of the NexGenRIP may not even be needed. Many calling applications will take the raw bitmap directly from the rasterizer's buffer and act on it.
However if the client's application is expecting a large number of files on disk, then this module takes care of that.
If a large number of files are written to disk then disk I/O can easily become the limiting factor in this step.
OK, now that we understand the internal steps we can start reporting some timing.