Memory Usage for Large GDSII Files

Why Don't We Always Load Entity Data into Memory?

Given that reading the entity data from RAM is much faster than reading it from disk, one might ask, "Why would we not always load the entity data into RAM? Who doesn't want to go faster?"

The answer is: when you don't have enough RAM. In that case, loading the entity data into memory will actually make you go slower!

In the 40 GB example, the scan/quad data fills about 7.5 GB of RAM and the entity data uses up another 12 GB of RAM (after compression and optimization). This totals 19.5 GB of RAM. If we have a workstation with 32 GB of RAM then everything works fine. What happens if we try to cram 19.5 GB of data into memory on a workstation equipped with only 16 GB of RAM?
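The arithmetic above can be written as a quick check. This is only an illustrative sketch: the function name and the optional headroom reserved for the operating system and other programs are assumptions made for the example, not part of QISLIB.

```python
def fits_in_ram(scan_quad_gb, entity_gb, installed_gb, reserved_gb=0.0):
    """Return True if scan/quad data plus entity data fit in installed RAM.

    reserved_gb is a hypothetical allowance for the OS and other programs.
    """
    needed_gb = scan_quad_gb + entity_gb
    return needed_gb <= installed_gb - reserved_gb


# The 40 GB example from the text: 7.5 GB scan/quad + 12 GB entity data.
print(fits_in_ram(7.5, 12.0, installed_gb=32.0))  # True
print(fits_in_ram(7.5, 12.0, installed_gb=16.0))  # False
```

In practice some headroom should be left (reserved_gb above), since the operating system and other programs need memory as well.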

Swapping Memory from RAM to Disk

The good news is that modern operating systems don't crash if you try to use more RAM than is installed. In fact, they are designed to deal with multiple programs running concurrently whose total memory requirements exceed the installed RAM. A section of the disk drive is allocated to hold images of the memory, and the operating system moves blocks of memory in and out of RAM as it judges best for the running programs.

If you have a bunch of programs running simultaneously that are not too memory intensive, this swapping works fairly well. You might notice a small hesitation when switching applications, as the memory you need gets brought back into RAM, but overall the experience is satisfactory.

But for a large, memory-intensive and compute-intensive program like QISLIB, swapping slows things down enormously. First, the operating system doesn't necessarily know which chunks of memory that QISLIB uses should be swapped and which should never be swapped.

Second, swapping works best when there are a lot of idle programs that are not going to request blocks of memory that have been swapped to disk. But during an explode, QISLIB is traversing all over the memory space, and swapping results in a lot of inefficient I/O (also known as disk thrashing).

Rule - if loading entity data into memory will cause swapping, don't do it!

How Big a File Can I Load Before Swapping?

OK, everybody agrees that if loading entity data triggers swapping, it should not be done. But can I know how big a GDSII file (for a given workstation) will trigger swapping? That way I could load entity data for files smaller than the critical size and not load it for larger files.

Unfortunately, it is very hard to get a definitive answer here, for a couple of reasons:

a) Other programs need memory too. Unless you are in total control of your machine, other programs may be running, or may start up, and will request memory. So it is possible that you are not swapping at first and then suddenly end up swapping due to the actions of others.

b) The size of the quad tree varies quite a lot depending on the nature of the data in the GDSII file. As mentioned earlier, we have measured values from 10% to 20% of the size of the file. So a 40 GB file might need 4 GB for the quad tree or it might need 8 GB. There is no way to know this in advance, and the extra gigabytes might move you from a no-swap to a swap condition. The same uncertainty applies to the size of the entity data in RAM. While we compress and optimize the GDSII data so that it takes up only about 30% of its disk size in RAM, the amount of compression varies from GDSII file to GDSII file. Again, this might account for several gigabytes for a large GDSII file.
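To make the uncertainty concrete, the ratios quoted above can be turned into a rough low/high estimate. This is a sketch under stated assumptions: the function names are hypothetical, the 10% to 20% and 30% figures are simply the approximate values mentioned in the text, and the entity data ratio itself varies from file to file, so the result is a guide and not a guarantee.

```python
def estimate_footprint_gb(file_gb, quad_frac=(0.10, 0.20), entity_frac=0.30):
    """Return a (low, high) estimate in GB of quad tree plus entity data.

    quad_frac and entity_frac are the approximate ratios quoted in the
    text; real files can fall outside them.
    """
    entity_gb = file_gb * entity_frac
    low_gb = file_gb * quad_frac[0] + entity_gb
    high_gb = file_gb * quad_frac[1] + entity_gb
    return low_gb, high_gb


def should_load_entity_data(file_gb, available_ram_gb):
    """Conservative choice: load only if the high estimate still fits."""
    _, high_gb = estimate_footprint_gb(file_gb)
    return high_gb <= available_ram_gb


low, high = estimate_footprint_gb(40.0)
print(f"estimated footprint: {low:.1f} to {high:.1f} GB")
print("load entity data with 32 GB:", should_load_entity_data(40.0, 32.0))
print("load entity data with 16 GB:", should_load_entity_data(40.0, 16.0))
```

Deciding on the high estimate is deliberately cautious, because a wrong guess means a full reload of a very large file. available_ram_gb should reflect what is actually free on the machine, not just what is installed.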

What's worse is that we are dealing with pretty big files that take quite a while to load. If you start with Set_Load_Memory_On and find yourself swapping, you have to quit and reload with Set_Load_Memory_Off. There is no way to jettison the unwanted entity data on the fly.
