

Opening Large Files

Some GDSII files are so large that they take a long time to open. Here we discuss the likely reasons and the options available to reduce the scan and load time.


192 GB GDSII File

A user wanted to process a 192 GB GDSII file. After running TIFFExtract he reported the following information:

<Pre OpenGDSII>

QisLib Version: v2.91 (Aug 13, 2014)
SetInuptLayerMap: All-NULL,All-All           <-- not filtering any layers!
SetLoadMemory: 0                             <-- not loading entity data into RAM!
SetTextMode: 0
SetArrayMode: 2
QIS_QT_NUMPERQ: 1024                         <-- minimizes memory footprint of quad tree


<Success On Open>

User Unit: um
Database Unit: 10000 (per um)
Number of Structures: 155                    <-- very few structures.
Structure Reference: 309                     <-- very few structure placements.
Array Reference: 41                          <-- very few AREFs.
Boundaries: 3223829551 (3217024929,320016,4) <-- very many boundaries relative to the number of structures
Boxes: 0 (0,0,0) 
Paths: 0 (0,0)
Vertices: 16133798677
Texts: 0 (--)
Estimated Memory for Load Data: 60.13GB (64559661412 bytes)
Total Memory for Load Data: -- (Load Mode is Off)
Data Dropped: -- (Load Mode is Off)
Scan Time: 42min 23sec (elapsed time)         <-- this is what seems to be the problem. 
Load Time: 38min 12sec (elapsed time)         <-- this is what seems to be the problem. 

The user wanted to know why scanning and loading took a total of about 80 minutes!


Input/Output Limited?

We do not have full information on the user's hardware, but we assume he is using a machine with 4-8 cores, 32 GB of RAM and a solid state drive (SSD).


Two Reads Required

To process a GDSII file, QISLIB has to read it from disk twice. The first pass (scan) extracts information about the structure definitions, references, arrays, which layers contain data, and so on. During the second pass (load) the program builds a quad tree in memory, which lets it quickly access the data in a given region. It may also load the actual entity data into memory, but in this case there is clearly not enough RAM for that: the log estimates 60.13 GB of load data, and SetLoadMemory is 0.
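The scan pass can be pictured as a walk over the GDSII records. The sketch below is a minimal illustration, not QISLIB's implementation: it relies only on the GDSII Stream Format record header (a 2-byte big-endian record length that includes the header, a 1-byte record type and a 1-byte data type) and counts a few record types, similar in spirit to the statistics in the log. `scan_pass` and `make_record` are hypothetical names chosen for this example.

```python
import io
import struct
from collections import Counter
from typing import BinaryIO

# GDSII record-type codes for the records counted in this sketch.
RECORD_NAMES = {
    0x05: "BGNSTR",    # start of a structure definition
    0x08: "BOUNDARY",
    0x09: "PATH",
    0x0A: "SREF",      # single structure reference
    0x0B: "AREF",      # array reference
    0x0C: "TEXT",
    0x2D: "BOX",
}
ENDLIB = 0x04  # end-of-library record


def make_record(rec_type: int, payload: bytes = b"", data_type: int = 0) -> bytes:
    """Build one GDSII record: 2-byte length, 1-byte type, 1-byte data type, payload."""
    return struct.pack(">HBB", 4 + len(payload), rec_type, data_type) + payload


def scan_pass(stream: BinaryIO) -> Counter:
    """First pass (hypothetical): walk the record headers and count record types.

    Payloads are skipped rather than parsed, but the scan still has to walk
    the whole file from start to finish.
    """
    counts: Counter = Counter()
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of stream (or a truncated file)
        length, rec_type, _data_type = struct.unpack(">HBB", header)
        if length < 4:
            raise ValueError(f"corrupt record length {length}")
        if rec_type == ENDLIB:
            break
        name = RECORD_NAMES.get(rec_type)
        if name is not None:
            counts[name] += 1
        stream.seek(length - 4, io.SEEK_CUR)  # skip the payload
    return counts


if __name__ == "__main__":
    demo = (make_record(0x05, bytes(24))                # BGNSTR with 12 dummy date fields
            + make_record(0x08) + make_record(0x11)     # BOUNDARY ... ENDEL
            + make_record(0x0A) + make_record(0x11)     # SREF ... ENDEL
            + make_record(0x07)                         # ENDSTR
            + make_record(ENDLIB))
    print(dict(scan_pass(io.BytesIO(demo))))
```

A real scanner would also record structure names and reference targets; this sketch only shows why the scan cost grows with file size.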


Disk IO Limiting Factor

Even if the CPU were infinitely fast, the time needed to read the file from disk would still be significant, so where the file is stored matters.

For maximum IO throughput you want the file on an SSD directly attached to the computer. This gives much faster IO than reading the file from a shared network drive.

Compare These Two Read Rates


Drive Type       Read Rate       Time to Read the File Once
Network Drive    100MB/sec       192GB/100MB/sec = 32.0 minutes
SSD RAID0        500MB/sec       192GB/500MB/sec =  6.4 minutes
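The arithmetic in the table can be written as a small helper. This is a sketch assuming purely sequential reads at a constant rate and no help from the OS cache, so both the scan and the load pass pay the full read time; the function names are illustrative. Under these assumptions two passes over the 192 GB file cost about 64 minutes of IO at 100 MB/sec, versus about 12.8 minutes at 500 MB/sec.

```python
def read_time_minutes(file_gb: float, rate_mb_per_s: float) -> float:
    """Minutes for one sequential read of the file (decimal units: 1 GB = 1000 MB)."""
    return file_gb * 1000.0 / rate_mb_per_s / 60.0


def io_time_minutes(file_gb: float, rate_mb_per_s: float, passes: int = 2) -> float:
    """Minutes of pure disk IO for `passes` full reads with no OS caching."""
    return passes * read_time_minutes(file_gb, rate_mb_per_s)


if __name__ == "__main__":
    for drive, rate in [("Network Drive", 100.0), ("SSD RAID0", 500.0)]:
        one = read_time_minutes(192, rate)
        both = io_time_minutes(192, rate)
        print(f"{drive:14s} one pass {one:5.1f} min, scan + load {both:5.1f} min")
```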


Additional Conflicts with OS Caching

You can also have a conflict between reading the file from the SSD and operating system caching if both use the same SSD. The cheapest solution these days is to buy two additional 250 GB SSDs and configure them as RAID 0, which roughly doubles the sustained read rate.

Usually the second read of the file goes much faster because the operating system has cached much of the data during the first read, so the second read is served partially or completely from memory. With a 192 GB input file, however, a machine with only 32 GB of RAM can cache only a small fraction of the file, so the second pass does not get the speedup you would see with smaller files.
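The effect of file size on the second pass can be estimated with a crude model. This sketch assumes that whatever part of the file fits in free RAM is still cached after the scan and costs nothing to re-read; `ram_in_use_gb` is a hypothetical parameter standing in for memory taken by the quad tree and other processes, and the 32 GB figure is the assumed machine from above. Treat the result as an optimistic bound: with a strict least-recently-used page cache and a file larger than RAM, a second sequential pass from the start of the file can get almost no cache hits at all.

```python
def second_pass_minutes(file_gb: float, ram_gb: float, rate_mb_per_s: float,
                        ram_in_use_gb: float = 0.0) -> float:
    """Optimistic estimate of the load-pass read time with OS caching.

    Assumes (crudely) that the part of the file fitting in free RAM is still
    cached after the scan pass and is free to read again; the remainder is
    read from disk at rate_mb_per_s (decimal units: 1 GB = 1000 MB).
    """
    free_gb = max(0.0, ram_gb - ram_in_use_gb)
    cached_gb = min(file_gb, free_gb)
    uncached_gb = file_gb - cached_gb
    return uncached_gb * 1000.0 / rate_mb_per_s / 60.0


if __name__ == "__main__":
    for size_gb in (8, 192):
        t = second_pass_minutes(size_gb, ram_gb=32, rate_mb_per_s=500, ram_in_use_gb=8)
        print(f"{size_gb:4d} GB file: second pass ~ {t:.1f} min from disk")
```

For a small file the model predicts a nearly free second pass; for the 192 GB file most of the second pass still comes from disk.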

What to Check?

Check the sustained read rate from the disk where you are storing the GDSII file.

Check whether the disk holding the GDSII file is also used for the operating system's swap space.
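A rough way to check the sustained read rate is to time a large sequential read of the GDSII file itself. This is a minimal Python sketch, not a QISLIB or vendor tool; the function name and the 64 MB chunk size are arbitrary choices. Run it on a file that has not been read recently (ideally one larger than RAM), otherwise the OS cache will make the disk look faster than it is. For the swap question, on Linux `swapon --show` lists the active swap devices and files.

```python
import os
import time


def sustained_read_rate_mb_s(path: str, chunk_mb: int = 64, max_gb: float = 4.0) -> float:
    """Read up to max_gb of a file sequentially and return the rate in MB/sec.

    Uses decimal MB (1 MB = 1,000,000 bytes) to match the read rates above.
    Data already in the OS cache comes from RAM and inflates the result.
    """
    limit = min(os.path.getsize(path), int(max_gb * 1e9))
    chunk = chunk_mb * 1024 * 1024
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one large read() per chunk
        while total < limit:
            data = f.read(min(chunk, limit - total))
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    import sys
    print(f"{sustained_read_rate_mb_s(sys.argv[1]):.0f} MB/sec")
```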


Other Possibilities

We also noted that all layers were loaded, as set by the layer filter string:
SetInuptLayerMap: All-NULL,All-All

We can't tell from the log file whether this GDSII file has one layer or many. But if you don't need all the layers, don't load them all. Filtering layers won't reduce the time for the first pass (scan), but it will reduce the time for the second pass (load).
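The effect of a layer filter on the load pass can be sketched as follows. This is a hypothetical illustration, not how QISLIB's SetInuptLayerMap works (its filter syntax is not described here): it reads the GDSII records of BOUNDARY elements, picks up each element's LAYER record, and keeps only elements on the wanted layers. The bytes of unwanted elements are still read from disk; what the filter saves is the work and memory of adding those elements to the quad tree. `load_pass` and `make_record` are illustrative names.

```python
import io
import struct
from collections import Counter
from typing import BinaryIO, Iterable

ENDLIB, BOUNDARY, LAYER, ENDEL = 0x04, 0x08, 0x0D, 0x11  # GDSII record types


def make_record(rec_type: int, payload: bytes = b"", data_type: int = 0) -> bytes:
    """Build one GDSII record: 2-byte length, 1-byte type, 1-byte data type, payload."""
    return struct.pack(">HBB", 4 + len(payload), rec_type, data_type) + payload


def load_pass(stream: BinaryIO, wanted_layers: Iterable[int]) -> Counter:
    """Count the BOUNDARY elements kept per layer when only wanted_layers are loaded."""
    wanted = set(wanted_layers)
    kept: Counter = Counter()
    in_boundary, layer = False, None
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break
        length, rec_type, _data_type = struct.unpack(">HBB", header)
        if length < 4:
            raise ValueError(f"corrupt record length {length}")
        payload = stream.read(length - 4)  # unwanted data is still read from disk
        if rec_type == ENDLIB:
            break
        if rec_type == BOUNDARY:
            in_boundary, layer = True, None
        elif rec_type == LAYER and in_boundary:
            (layer,) = struct.unpack(">h", payload[:2])  # 2-byte signed layer number
        elif rec_type == ENDEL and in_boundary:
            if layer in wanted:
                kept[layer] += 1  # a real loader would insert the polygon into its quad tree here
            in_boundary = False
    return kept


if __name__ == "__main__":
    def boundary(layer: int) -> bytes:
        return (make_record(BOUNDARY) + make_record(LAYER, struct.pack(">h", layer), 2)
                + make_record(ENDEL))

    data = boundary(1) + boundary(2) + boundary(1) + make_record(ENDLIB)
    print(dict(load_pass(io.BytesIO(data), wanted_layers={1})))
```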