Benchmark with Logical Layer Operations

March 28, 2017

This benchmark provides timing data for a client who wishes to merge multiple single-layer GDSII files into a single file, perform logical layer operations on the merged layers, and then clip out windows from the "derived" layer.


Creating the Single Layer GDSII Files

We start with a multi-layer 1.4 GByte GDSII file, caw_test1.gds, and extract 3 single layer files (using GDSFILT):


GDSII File   Data on Layer   Text Filter   Keep Hierarchy   File Size
6.GDS        6               Y             Y                42.7 MB
17.GDS       17              Y             Y                30.6 MB
30.GDS       30              Y             Y                64.9 MB

We then merge 6.GDS, 17.GDS and 30.GDS into a single file called caw_test1_mod.gds using the Overlay program. This file has data on 3 layers.


Layer Logical Operations

For this benchmark we are supposing that our client has asked us to perform the following layer operations prior to extracting a large number of clips:


6 [OR] 17 = TEMP
TEMP [MINUS] 30 = 100

The [OR] operation can be done in two ways. The fastest, requiring no computation, is to simply change the layer attribute of every polygon on both input layers to a single layer attribute. No unionization of overlapping polygons is performed. Why? Because the next operation, [MINUS], requires a UNION operation anyway, so doing one now would be redundant.
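The "no-computation OR" can be illustrated with a small sketch: pooling two layers is just retagging each polygon's layer attribute, with overlaps left untouched. This is a hedged illustration only; the layer numbers and polygon data below are made up, and the actual tool uses its own internal representation.

```python
# Sketch of the "no-computation OR": 6 [OR] 17 = TEMP is just a retag of
# the layer attribute; no geometric union is computed at this step.
polygons = [
    {"layer": 6,  "pts": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"layer": 17, "pts": [(2, 2), (6, 2), (6, 6), (2, 6)]},  # overlaps layer 6
    {"layer": 30, "pts": [(1, 1), (3, 1), (3, 3), (1, 3)]},  # later subtracted
]

TEMP = 9999  # scratch layer number for the aggregated data (arbitrary choice)

for p in polygons:
    if p["layer"] in (6, 17):
        p["layer"] = TEMP  # retag; overlapping polygons are NOT unioned here

on_temp = sum(p["layer"] == TEMP for p in polygons)
print(on_temp)  # -> 2 polygons now carry the TEMP layer attribute
```

The geometric union is deferred to the [MINUS] step, which must union its first operand internally before subtracting layer 30.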

The Hardware

The machine used to run this benchmark:

CPU:   Intel Core i7-3930K @ 3.2 GHz
RAM:   32 GB
Disk:  Intel SSD
Cores: 6 physical cores (hyper-threading available)
OS:    Windows 7 SP1, 64-bit


The ClipExtract Command Line

clipextract64.exe Executable (Windows)
+input:caw_test1_mod.gds Our input file, generated by merging three single-layer GDSII files. It contains data only on layers 6, 17 and 30. We ignore any text and load the file to memory.
+outdir:out Output directory where the clips will be written
+format:POLYS This option extracts polygons defined by the clipping window, but we don't write them to disk in order to remove IO time from our measurements.
-thrnum:<thread-scheme> Controls threads allocated to the exploder and to the Boolean. We will vary the allocation in order to determine the "best" distribution of available threads.
@window:CWH:lsynth:wins.txt 1000 20 x 20 um windows listed in a file called wins.txt, using each window's center point as the reference coordinates.
"-lsynth:100:0=6,17-30" Synthesize 100:0 by aggregating layers 6 and 17 (no union) and then subtracting layer 30.
-clip Clip to the window extents.
-log:lsynth<thread-scheme>.log Generate a log file.
-silent No messages on console (stdout/stderr)
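The options above assemble into a single command line. The sketch below builds it in Python; the 12,1 thread scheme is filled in for the placeholder because it gave the best total time in the timing section that follows. (Launching the executable itself is of course Windows-specific and is not attempted here.)

```python
# Assemble the ClipExtract command line described in the table above.
threads = "12,1"  # best scheme from the timing table (exploder,Boolean)
cmd = [
    "clipextract64.exe",
    "+input:caw_test1_mod.gds",
    "+outdir:out",
    "+format:POLYS",
    f"-thrnum:{threads}",
    "@window:CWH:lsynth:wins.txt",
    "-lsynth:100:0=6,17-30",   # quoted on the shell; no quotes needed in a list
    "-clip",
    f"-log:lsynth{threads}.log",
    "-silent",
]
cmd_line = " ".join(cmd)
print(cmd_line)
# On Windows this would be launched with subprocess.run(cmd).
```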

Timing

We ran the benchmark repeatedly with different thread allocations. The two main computational operations are Explosion (Extract) and Boolean. Explosion identifies and extracts all the polygons that cross a given clipping window; the Boolean performs the layer logical operations and the clipping. By varying the allocation between exploder threads and Boolean threads we can find the optimal distribution.

Thread Scheme
(exploder,Boolean)   1,1      1,6      1,12     2,6       4,4       6,6       6,1       12,1      12,12
Strategy             Single   Boolean  Boolean  Balanced  Balanced  Balanced  Exploder  Exploder  Full
                     Thread   Heavy    Heavy                                  Heavy     Heavy
File Load (s)        1.53     1.61     1.53     1.62      1.54      1.54      1.61      1.58      1.53
Extract Time (s)     176.09   138.04   149.77   76.88     45.00     38.33     40.50     29.62     43.09
Total Time (s)       177.86   139.88   151.53   78.73     46.77     40.10     42.34     31.45     44.87
Peak Memory (MB)     182      231      277      235       236       260       217       242       321

It is clear that for this particular GDSII file (and window size) we are best off allocating as many threads to the Exploder as possible. That is likely because the number of polygons extracted in each window is small enough that a single Boolean thread handles it quite quickly. We are "extraction" limited in this example.
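The scale of the win is easy to quantify from the Total Time row of the table above:

```python
# Speedup of the best thread scheme (12,1) over single-threaded (1,1),
# using the Total Time figures from the table above.
total_1_1 = 177.86    # seconds, scheme 1,1
total_12_1 = 31.45    # seconds, scheme 12,1
speedup = total_1_1 / total_12_1
print(f"{speedup:.1f}x")  # -> 5.7x on 6 physical cores (12 hyper-threads)
```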


Polygon Window Counts

The balance between exploder threads and Boolean threads depends, to some extent, on how many polygons/vertices are present in the extracted window. The table below summarizes the counts.

Polygons, Vertices per Window

          Min     Average         Max             Total
Input     0, 0    1,167, 9,837    4,048, 25,813   1,167,367, 9,837,111
Output    0, 0    57, 9,777       276, 25,338     57,197, 9,776,623

The ratio of vertices to polygons is much higher for the output because the MINUS operation cuts many small holes into the output polygons, which produces higher vertex counts per polygon.
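The effect is stark when computed from the Total column of the table above:

```python
# Average vertices per polygon, input vs. output, from the table totals.
in_polys, in_verts = 1_167_367, 9_837_111
out_polys, out_verts = 57_197, 9_776_623
in_ratio = round(in_verts / in_polys, 1)
out_ratio = round(out_verts / out_polys, 1)
print(in_ratio)   # -> 8.4 vertices per input polygon
print(out_ratio)  # -> 170.9 vertices per output polygon
```

Roughly the same total vertex count is carried by twenty times fewer, far more complex, polygons.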



Benchmark 2

In order to measure the effect of file size and to test on a different computer we ran the following benchmark on 3 files:

File                Size     Num of Cell Definitions   Layers              Logical Operations
caw_test1_mod.gds   144 MB   14,674                    6,17,30             (6,17)-30
pan8.gds            765 MB   54,435                    12,13,25,26,43,44   (12,13,43,44)-(25,26)
pan9.gds            3.1 GB   67,125                    12,13,25,26,43,44   (12,13,43,44)-(25,26)

The Hardware

The machine used to run this benchmark:

CPU:   Intel Core i7-4770K @ 3.5 GHz
RAM:   32 GB
Disk:  Intel SSD
Cores: 4 physical cores (hyper-threading available)
OS:    Windows 8.1 Pro, 64-bit



Timing

Poly (Vertices) per Window

            Min           Average           Max
caw_test1   21 (105)      1,166 (41,966)    4,234 (9,842)
pan8        0 (0)         6,950 (41,996)    17,568 (102,504)
pan9        429 (2,145)   11,833 (70,873)   17,568 (102,504)

Time (s) to Extract 1000 20 x 20 um Windows, by Exploder,Boolean Threads

            1,1        1,8      4,4      8,1      8,2      8,8
caw_test1   115.19     92.05    31.36    26.14    25.09    27.31
pan8        497.95     370.84   185.61   121.42   132.47   290.79
pan9        1,547.09   335.03   196.06   341.71   233.39   183.33
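Picking the fastest scheme per file from the timing data shows how the optimum shifts as the windows get denser; small windows favor exploder threads, while the heavy pan9 windows reward more Boolean threads:

```python
# Best thread scheme per file, from the Benchmark 2 timing data above.
times = {
    "caw_test1": {"1,1": 115.19, "1,8": 92.05, "4,4": 31.36,
                  "8,1": 26.14, "8,2": 25.09, "8,8": 27.31},
    "pan8":      {"1,1": 497.95, "1,8": 370.84, "4,4": 185.61,
                  "8,1": 121.42, "8,2": 132.47, "8,8": 290.79},
    "pan9":      {"1,1": 1547.09, "1,8": 335.03, "4,4": 196.06,
                  "8,1": 341.71, "8,2": 233.39, "8,8": 183.33},
}
best = {name: min(t, key=t.get) for name, t in times.items()}
print(best)  # -> {'caw_test1': '8,2', 'pan8': '8,1', 'pan9': '8,8'}
```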


Flat vs. Hierarchical Input Data

Depending on the data source, the input to ClipExtract may be essentially flat. This might result from data on a given layer having been processed through some sort of OPC (Optical Proximity Correction) routine.

In order to determine whether large flat files are problematic, we took the hierarchical data we used in the first Benchmark and exploded it so that data in each layer is flat.

caw_test1_mod.gds (Hierarchical, Layers 6,17,30, 144 MB)

Threads
(Exploder,Boolean)   1,1      1,8     4,4     8,1     8,2     8,8
Extract Time (s)     115.16   91.88   31.20   26.09   25.03   27.39
Load Time (s)        1.72     1.67    1.67    1.68    1.67    1.69
Peak Memory (MB)     169      209     204     208     211     236

caw_test1_mod.gds (Flat, Layers 6,17,30, 6,040 MB)

Threads
(Exploder,Boolean)   1,1      1,8     4,4     8,1     8,2     8,8
Extract Time (s)     114.87   91.69   31.07   26.05   24.96   27.37
Load Time (s)        26.67    26.73   26.77   26.81   27.03   26.75
Peak Memory (MB)     3433     3471    3466    3471    3474    3501

Observations

The time needed to extract and Booleanize the 1000 windows is essentially unchanged whether the input data is hierarchical or flat. However, the loading time (a one-time occurrence) is greatly increased, as is the memory footprint.
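The size of that one-time penalty follows directly from the 1,1 columns of the two tables:

```python
# Cost of flat input relative to hierarchical input, from the two tables
# above (1,1 thread scheme columns).
hier_load, flat_load = 1.72, 26.67   # load time, seconds
hier_mem, flat_mem = 169, 3433       # peak memory, MB
load_ratio = round(flat_load / hier_load, 1)
mem_ratio = round(flat_mem / hier_mem, 1)
print(load_ratio)  # -> 15.5x longer load
print(mem_ratio)   # -> 20.3x more memory
```

For repeated clip runs against the same file the flat penalty is paid once per load, but the memory footprint persists for the life of the run.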