ClipExtract - Benchmark with Logical Layer Operations

March 28, 2017

For this benchmark we set out to provide timing data for a client who wants to merge multiple single-layer GDSII files into a single file, perform logical operations on the layers, and then clip windows out of the resulting "derived" layer.

Creating the Single Layer GDSII Files

We start with a multi-layer 1.4 GB GDSII file, caw_test1.gds, and use GDSFILT to extract three single-layer files:

GDSII File	Data on Layer	Text Filter	Keep Hierarchy	File Size
6.GDS	6	Y	Y	42.7 MB
17.GDS	17	Y	Y	30.6 MB
30.GDS	30	Y	Y	64.9 MB

We then merge 6.GDS, 17.GDS and 30.GDS into a single file, caw_test1_mod.gds, using the Overlay program. The merged file contains data on three layers: 6, 17 and 30.

Layer Logical Operations

For this benchmark we assume that the client has asked us to perform the following layer operations before extracting a large number of clips:


6 [OR] 17 = TEMP
TEMP [MINUS] 30 = 100

The [OR] operation can be done in two ways: as a true Boolean union of the two layers, or by simply changing the layer attribute of every polygon on both input layers to a single layer. The second method is the fastest because it requires no computation: overlapping polygons are not unioned. This is acceptable here because the next operation, [MINUS], requires a UNION operation anyway, so performing one now would be redundant.
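The argument that an up-front union is redundant can be checked on a toy model. This is a minimal sketch, not ClipExtract's implementation: it assumes shapes are axis-aligned rectangles on an integer grid and represents a layer's geometry as the set of unit cells it covers. `or_by_relabel` only concatenates shapes (overlaps kept), while `minus` unions each operand before subtracting, so the result matches doing the union first.

```python
from typing import List, Set, Tuple

# Toy model (assumption): rectangles on an integer grid; geometry = set of covered unit cells.
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open
Cell = Tuple[int, int]

def cells(r: Rect) -> Set[Cell]:
    x0, y0, x1, y1 = r
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def or_by_relabel(*layers: List[Rect]) -> List[Rect]:
    """[OR] with no computation: put every shape on one layer; overlaps are kept."""
    return [r for layer in layers for r in layer]

def union(layer: List[Rect]) -> Set[Cell]:
    covered: Set[Cell] = set()
    for r in layer:
        covered |= cells(r)
    return covered

def minus(a: List[Rect], b: List[Rect]) -> Set[Cell]:
    """[MINUS] unions each operand first, then subtracts b from a."""
    return union(a) - union(b)

layer6 = [(0, 0, 4, 4)]
layer17 = [(2, 2, 6, 6)]   # overlaps layer 6
layer30 = [(3, 3, 5, 5)]

temp = or_by_relabel(layer6, layer17)      # 2 shapes, overlap not resolved
layer100 = minus(temp, layer30)
eager = (union(layer6) | union(layer17)) - union(layer30)  # union done up front
assert layer100 == eager
print(len(temp), len(layer100))  # 2 24
```

Real layout tools work on polygon edges rather than grid cells; the cell sets are only a convenient way to make the set algebra visible.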

The Hardware

The machine used to run this benchmark:

CPU Intel Core i7-3930K @3.2GHz
RAM 32 GB
Disk Intel SSD
Cores 6 physical cores (12 logical threads with hyperthreading)
OS Windows 7 SP1 64 bit

The ClipExtract Command Line

clipextract64.exe	Executable (Windows)
+input:caw_test1_mod.gds	Our input file, generated by merging the three single-layer GDSII files. It contains data only on layers 6, 17 and 30. Text is ignored and the file is loaded into memory.
+outdir:out	Output directory where the clips will be written
+format:POLYS	This option extracts the polygons defined by the clipping window; we do not write them to disk, so that I/O time is excluded from our measurements.
-thrnum:<thread-scheme>	Controls threads allocated to the exploder and to the Boolean. We will vary the allocation in order to determine the "best" distribution of available threads.
@window:CWH:lsynth:wins.txt	1000 20 x 20 um windows listed in the file wins.txt, using each window's center point as the reference coordinate.
"-lsynth:100:0=6,17-30"	Synthesize 100:0 by aggregating layers 6 and 17 (no union) and then subtracting layer 30.
-clip	Clip to the window extents.
-log:lsynth<thread-scheme>.log	Generate a log file.
-silent	No messages on console (stdout/stderr)
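A thread sweep like the one timed below is easier to repeat from a small driver script. The sketch below only assembles the command lines from the options listed above; the one assumption (flagged in the code) is that the `<thread-scheme>` placeholder is written as `exploder,Boolean`, mirroring the table headings, so check the ClipExtract documentation for the actual `-thrnum` syntax before running it.

```python
import subprocess
from typing import List

# Exploder,Boolean allocations used in the timing table.
SCHEMES = [(1, 1), (1, 6), (1, 12), (2, 6), (4, 4), (6, 6), (6, 1), (12, 1), (12, 12)]

def build_command(exploder: int, boolean: int) -> List[str]:
    # ASSUMPTION: <thread-scheme> is written "exploder,Boolean" (hypothetical syntax).
    scheme = f"{exploder},{boolean}"
    # Options copied from the table above, in the same order.
    return [
        "clipextract64.exe",
        "+input:caw_test1_mod.gds",
        "+outdir:out",
        "+format:POLYS",
        f"-thrnum:{scheme}",
        "@window:CWH:lsynth:wins.txt",
        "-lsynth:100:0=6,17-30",
        "-clip",
        f"-log:lsynth{scheme}.log",
        "-silent",
    ]

if __name__ == "__main__":
    for exploder, boolean in SCHEMES:
        cmd = build_command(exploder, boolean)
        print(subprocess.list2cmdline(cmd))   # the equivalent single command line
        # subprocess.run(cmd, check=True)     # uncomment to actually run each scheme
```

Passing the arguments as a list bypasses the command shell, so the quotes around the `-lsynth` option in the table should not be needed in this form.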

Timing

We ran the benchmark repeatedly with different thread allocations. The two main computational operations are Explosion (Extract) and Boolean. Explosion identifies and extracts all the polygons that cross a given clipping window; the Boolean performs the layer logical operations and the clipping. By varying the split between exploder threads and Boolean threads we can find the best distribution for this data.
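The exploder/Boolean split can be pictured as a two-stage producer/consumer pipeline. The sketch below is a hypothetical model of that structure, not ClipExtract's code: `explode` and `boolean` are placeholder callables, the window tuple layout is assumed, and the bounded hand-off queue keeps exploder threads from running arbitrarily far ahead of the Boolean threads.

```python
import queue
import threading
from typing import Any, Callable, Dict, List, Tuple

Window = Tuple[float, float, float, float]  # assumed (center x, center y, width, height)

def run_pipeline(
    windows: List[Window],
    explode: Callable[[Window], List[Any]],
    boolean: Callable[[Window, List[Any]], Any],
    n_exploders: int,
    n_booleans: int,
) -> List[Any]:
    """Hypothetical exploder -> Boolean pipeline; returns one result per window, in order."""
    todo: "queue.Queue[Tuple[int, Window]]" = queue.Queue()
    handoff: "queue.Queue[Any]" = queue.Queue(maxsize=64)  # bounded hand-off
    results: Dict[int, Any] = {}
    lock = threading.Lock()
    for i, w in enumerate(windows):
        todo.put((i, w))

    def exploder() -> None:
        while True:
            try:
                i, w = todo.get_nowait()
            except queue.Empty:
                return  # no windows left
            handoff.put((i, w, explode(w)))  # polygons crossing window w

    def booleaner() -> None:
        while True:
            item = handoff.get()
            if item is None:  # sentinel: every window has been exploded
                return
            i, w, polys = item
            out = boolean(w, polys)  # layer operations + clip to window
            with lock:
                results[i] = out

    ex = [threading.Thread(target=exploder) for _ in range(n_exploders)]
    bo = [threading.Thread(target=booleaner) for _ in range(n_booleans)]
    for t in ex + bo:
        t.start()
    for t in ex:
        t.join()
    for _ in bo:
        handoff.put(None)  # one sentinel per Boolean thread
    for t in bo:
        t.join()
    return [results[i] for i in range(len(windows))]
```

In CPython the global interpreter lock keeps pure-Python stages from running truly in parallel, so this shows the shape of the pipeline rather than its speedup. If Boolean work per window is light, as the timings below suggest for this file, extra Boolean threads mostly sit idle waiting on the hand-off queue.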

Thread Scheme (Exploder,Boolean)	1,1	1,6	1,12	2,6	4,4	6,6	6,1	12,1	12,12
Strategy	Single Thread	Boolean Heavy	Boolean Heavy	Boolean Heavy	Balanced	Balanced	Exploder Heavy	Exploder Heavy	Full
File Load (s)	1.53	1.61	1.53	1.62	1.54	1.54	1.61	1.58	1.53
Extract Time (s)	176.09	138.04	149.77	76.88	45.00	38.33	40.50	29.62	43.09
Total Time (s)	177.86	139.88	151.53	78.73	46.77	40.10	42.34	31.45	44.87
Peak Memory (MB)	182	231	277	235	236	260	217	242	321

For this particular GDSII file and window size, we are clearly best off allocating as many threads as possible to the Exploder: the 12,1 scheme is fastest at 31.45 s total, compared with 177.86 s single-threaded. This is likely because each window yields few enough polygons that a single Boolean thread processes them quickly. In this example we are extraction limited.
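The comparison can be made explicit by computing each scheme's speedup over the single-threaded run, using speedup = (1,1 total time) / (scheme total time). The values below are the Total Time row from the table; nothing else is assumed.

```python
# Total Time (s) row from the table above, keyed by "exploder,Boolean".
total_s = {
    "1,1": 177.86, "1,6": 139.88, "1,12": 151.53, "2,6": 78.73, "4,4": 46.77,
    "6,6": 40.10, "6,1": 42.34, "12,1": 31.45, "12,12": 44.87,
}

baseline = total_s["1,1"]  # single-threaded run
speedup = {scheme: baseline / t for scheme, t in total_s.items()}
best = min(total_s, key=total_s.get)  # scheme with the lowest total time

for scheme in sorted(speedup, key=speedup.get, reverse=True):
    print(f"{scheme:>6}: {speedup[scheme]:.2f}x")
print("best:", best)
```

The best scheme, 12,1, is about 5.7 times faster than 1,1, while giving the Boolean twelve threads as well (12,12) is slower than giving it one.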

Polygon Window Counts

The balance between exploder threads and Boolean threads depends, to some extent, on how many polygons/vertices are present in the extracted window. The table below summarizes the counts.

Polygons / Vertices	Min per Window	Average per Window	Max per Window	Total (1000 Windows)
Input	0 / 0	1,167 / 9,837	4,048 / 25,813	1,167,367 / 9,837,111
Output	0 / 0	57 / 9,777	276 / 25,338	57,197 / 9,776,623

The ratio of vertices to polygons is much higher for the output (about 171 vertices per polygon, versus about 8.4 for the input) because the MINUS operation leaves many small holes in the output polygons, and each hole adds vertices to the polygon that contains it.
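GDSII BOUNDARY elements cannot contain holes, so a polygon with a hole is normally written as a single boundary that reaches the hole through a cut line (a "keyhole"). Assuming the output polygons are written this way, each hole contributes its own vertices plus two more for the cut. The sketch below builds such a boundary for one hole; the function names are illustrative, and it assumes the straight cut from `outer[0]` to `hole[0]` crosses no other edge.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def keyhole(outer: List[Point], hole: List[Point]) -> List[Point]:
    """Merge an outer ring (CCW) and one hole (CW) into a single hole-free boundary.

    Rings are open vertex lists (first vertex not repeated). The cut runs from
    outer[0] to hole[0]; the implicit closing edge retraces it back to outer[0].
    """
    ring = list(outer) + [outer[0]]   # walk the outer ring back to the cut start
    ring += list(hole) + [hole[0]]    # cross the cut, walk the hole, return to hole[0]
    return ring

def area(ring: List[Point]) -> float:
    """Shoelace area; the two cut edges cancel, so the hole is subtracted."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2

outer = [(0, 0), (10, 0), (10, 10), (0, 10)]   # counter-clockwise
hole = [(4, 4), (4, 6), (6, 6), (6, 4)]        # clockwise
ring = keyhole(outer, hole)
print(len(ring), area(ring))  # 10 96.0
```

A plain square needs 4 vertices; the same square with one square hole needs 10, and every further hole adds its own vertices plus 2.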

Benchmark 2

To measure the effect of file size, and to test on a different computer, we ran the following benchmark on three files:

File	Size	Cell Definitions	Layers	Logical Operation
caw_test1_mod.gds	144 MB	14,674	6,17,30	(6,17)-30
pan8.gds	765 MB	54,435	12,13,25,26,43,44	(12,13,43,44)-(25,26)
pan9.gds	3.1 GB	67,125	12,13,25,26,43,44	(12,13,43,44)-(25,26)

The Hardware

The machine used to run this benchmark:

CPU Intel Core i7-4770K @3.5GHz
RAM 32 GB
Disk Intel SSD
Cores 4 physical cores (8 logical threads with hyperthreading)
OS Windows 8.1 Pro 64 bit

Timing

Polygons (Vertices) per Window	Min	Average	Max
caw_test1	21 (105)	1,166 (41,966)	4,234 (9,842)
pan8	0 (0)	6,950 (41,996)	17,568 (102,504)
pan9	429 (2,145)	11,833 (70,873)	17,568 (102,504)

Time (s) to Extract 1000 20 x 20 um Windows, by Exploder,Boolean Threads
File	1,1	1,8	4,4	8,1	8,2	8,8
caw_test1	115.19	92.05	31.36	26.14	25.09	27.31
pan8	497.95	370.84	185.61	121.42	132.47	290.79
pan9	1,547.09	335.03	196.06	341.71	233.39	183.33
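The Benchmark 2 timings show the best allocation shifting with the data. Picking the fastest scheme per file makes this visible; the times are copied from the table, with the pan8 1,8 entry read as 370.84 s.

```python
# Extract times (s) from the Benchmark 2 table; columns are "exploder,Boolean".
schemes = ["1,1", "1,8", "4,4", "8,1", "8,2", "8,8"]
times_s = {
    "caw_test1": [115.19, 92.05, 31.36, 26.14, 25.09, 27.31],
    "pan8": [497.95, 370.84, 185.61, 121.42, 132.47, 290.79],
    "pan9": [1547.09, 335.03, 196.06, 341.71, 233.39, 183.33],
}

best = {}
for name, row in times_s.items():
    i = min(range(len(row)), key=row.__getitem__)  # index of the fastest run
    best[name] = schemes[i]
    print(f"{name}: fastest {schemes[i]} ({row[i]} s), {row[0] / row[i]:.1f}x vs 1,1")
```

For caw_test1 and pan8 the exploder-heavy schemes (8,2 and 8,1) win, while for pan9, with the most polygons per window, the full 8,8 allocation is fastest and 1,8 beats 8,1.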

Flat vs. Hierarchical Input Data

Depending on the data source, the input to ClipExtract may be essentially flat. This can happen, for example, when the data on a layer has been processed by an OPC (Optical Proximity Correction) routine.

To determine whether large flat files are a problem, we took the hierarchical data used in the first benchmark (caw_test1_mod.gds) and flattened it so that the data on each layer is flat.
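Flattening inflates a file because every placement of a cell must repeat that cell's polygons, whereas a hierarchical file stores each cell once. The sketch below counts stored versus flattened polygons for a made-up three-level hierarchy; the cell names and counts are hypothetical, and an array reference is treated simply as a placement count.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical hierarchy: cell -> (polygons drawn directly in the cell,
#                                  [(child cell, number of placements), ...]).
CELLS: Dict[str, Tuple[int, List[Tuple[str, int]]]] = {
    "TOP": (10, [("BLOCK", 4)]),
    "BLOCK": (50, [("STDCELL", 100)]),
    "STDCELL": (12, []),
}

@lru_cache(maxsize=None)
def flat_polygons(cell: str) -> int:
    """Polygons the cell contributes once every reference is expanded (flattened)."""
    own, refs = CELLS[cell]
    return own + sum(n * flat_polygons(child) for child, n in refs)

stored = sum(own for own, _ in CELLS.values())  # polygons stored hierarchically
print(stored, flat_polygons("TOP"))  # 72 5010
```

Here 72 stored polygons become 5,010 after flattening; in real layouts with deep, heavily repeated hierarchies the multiplier can be far larger.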

caw_test1_mod.gds Hierarchical Layers 6,17,30 144 MB

Threads Exploder,Boolean	1,1	1,8	4,4	8,1	8,2	8,8
Extract Time (sec)	115.16	91.88	31.20	26.09	25.03	27.39
Load Time (sec)	1.72	1.67	1.67	1.68	1.67	1.69
Peak Memory MB	169	209	204	208	211	236

caw_test1_mod.gds Flat Layers 6,17,30 6040 MB

Threads Exploder,Boolean	1,1	1,8	4,4	8,1	8,2	8,8
Extract Time (sec)	114.87	91.69	31.07	26.05	24.96	27.37
Load Time (sec)	26.67	26.73	26.77	26.81	27.03	26.75
Peak Memory MB	3433	3471	3466	3471	3474	3501

Observations

The time needed to extract and Booleanize the 1000 windows is essentially the same whether the input data is hierarchical or flat (within 0.3 s for every thread scheme). However, the load time (a one-time cost) rises from about 1.7 s to about 27 s, and peak memory rises from about 170-240 MB to about 3,430-3,500 MB.