Generating UUIDs (GUIDs)

This document provides recommendations for generating UUIDs (universally unique identifiers), also known as GUIDs (globally unique identifiers).

UUID Expression Design

When the generated UUID will be used with interconnected systems (e.g. integrations), there are standards to follow. For details, see https://en.wikipedia.org/wiki/Universally_unique_identifier. Even when the UUID is only used internally (a single system produces and consumes it), the design is important: every random bit that is not used properly (for example, through incorrect use of rand() in a v4 UUID) roughly doubles the probability of generating a duplicate UUID, and avoiding duplicates is the main reason UUIDs exist.

The de-facto industry standard today (2022) is UUID v4, Variant: 2 (ISO/IEC), the variant whose two leading variant bits are 10. A v4 UUID is 128 bits long: 6 bits are set to fixed values to indicate the version and variant, and the remaining 122 bits are random. With that much randomness, you would have to generate about one billion UUIDs per second for roughly 85 years before the probability of even a single collision (the same UUID generated more than once) reaches about 50%.
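The 85-year figure, and the earlier claim that each lost random bit doubles the collision probability, can be checked with the birthday-bound approximation. The following is a minimal Python sketch run outside the platform; the function names are illustrative, and the formula is an approximation that assumes all 122 bits are uniformly random.

```python
import math

RANDOM_BITS = 122  # a v4 UUID has 128 bits, 6 of which are fixed
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def collision_probability(count, random_bits=RANDOM_BITS):
    """Birthday-bound approximation: p ~= 1 - exp(-count^2 / (2 * 2^bits))."""
    space = 2.0 ** random_bits
    return -math.expm1(-(count ** 2) / (2.0 * space))


def count_for_probability(p, random_bits=RANDOM_BITS):
    """Approximate number of UUIDs needed to reach collision probability p."""
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


# UUIDs needed for a 50% chance of at least one collision,
# expressed as years of generating one billion UUIDs per second.
n_half = count_for_probability(0.5)
years_at_billion_per_second = n_half / 1e9 / SECONDS_PER_YEAR

# Same number of UUIDs, one random bit fewer: the probability roughly doubles.
ratio = collision_probability(1e12, 121) / collision_probability(1e12, 122)

print(f"{n_half:.3e} UUIDs, {years_at_billion_per_second:.1f} years, ratio {ratio:.3f}")
```

The doubling holds while the probability is small, which is the practical range for any realistic number of generated IDs.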

For example, to generate a UUID v4, Variant: 2, you could use the following expression:

/* spec and validity checker in https://www.uuidtools.com/decode */
fn!lower(joinarray(
  {
    dec2hex(tointeger(rand(4) * 256), 2),   /* 8 hex digits */
    "-",
    dec2hex(tointeger(rand(2) * 256), 2),   /* 4 hex digits */
    "-4",                                   /* version 4 = completely random */
    dec2hex(tointeger(rand(3) * 16), 1),    /* 3 hex digits: one 4-bit value (0-15) each */
    "-",
    bin2hex("10" & dec2bin(tointeger(rand(1) * 4), 2), 1), /* 1 hex digit = Variant: first 2 bits = 10 [ISO/IEC] + 2 random bits */
    dec2hex(tointeger(rand(3) * 16), 1),    /* 3 hex digits: one 4-bit value (0-15) each */
    "-",
    dec2hex(tointeger(rand(6) * 256), 2)    /* 12 hex digits */
  }
))
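To check that values produced by an expression like this have the right shape, a small validator can help alongside the online decoder. The following is a Python sketch run outside the platform; the name is_canonical_uuid_v4 is hypothetical, and it assumes the UUIDs are available as plain strings.

```python
import re
import uuid

# Canonical lowercase v4 layout: 8-4-4-4-12 hex digits, version digit 4,
# and a variant digit of 8, 9, a or b (binary 10xx).
UUID_V4_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
)


def is_canonical_uuid_v4(text):
    """Return True if text is a lowercase, correctly shaped v4 UUID."""
    if UUID_V4_PATTERN.fullmatch(text) is None:
        return False
    parsed = uuid.UUID(text)
    return parsed.version == 4 and parsed.variant == uuid.RFC_4122


print(is_canonical_uuid_v4(str(uuid.uuid4())))
```

The pattern rejects uppercase input on purpose, since the expression is expected to emit lowercase.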

The main building blocks are described below: 

  1. rand(n) generates a List of Number (Floating Point) with n elements, each in the range 0-1.
  2. tointeger() drops everything after the decimal point and keeps only the integer part, so the random fraction must first be moved into the integer part. Multiplying by a power of two does this (effectively a bitwise shift left). E.g. tointeger(rand(1) * 4) gives a 2-bit random integer (0-3), * 8 gives 3 bits, and * 256 gives 8 bits, or 2 hexadecimal digits.
  3. dec2hex converts the number to hexadecimal format, as the standard requires. One hexadecimal digit is 4 bits: 0-15 in decimal, or 0-f in hexadecimal. The 2nd parameter of dec2hex specifies how many digits to generate (it prepends 0s to reach that length).
  4. As some specific bits must be set to follow the standard, we need to use bin2hex and work at the binary level, but the principle is the same as with dec2hex.
  5. The dashes (-) are inserted at specific positions to conform to the standard 8-4-4-4-12 formatting.
  6. joinarray() concatenates all 23 elements of the array into one string.
  7. The resulting string is converted to lowercase to follow the standard.
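The steps above can be mirrored one-to-one in Python, which may make the bit arithmetic easier to follow. This is only an illustrative sketch: random_hex and uuid_v4_like are hypothetical names, and Python's random module stands in for rand() (it is not a cryptographically secure source).

```python
import random


def random_hex(count, bits):
    """Return `count` zero-padded hex strings, each carrying `bits` random bits.

    Mirrors the pattern dec2hex(tointeger(rand(count) * 2^bits), bits / 4).
    """
    width = bits // 4  # one hex digit holds 4 bits
    return [
        format(int(random.random() * 2 ** bits), f"0{width}x")
        for _ in range(count)
    ]


def uuid_v4_like():
    # Binary "10" followed by 2 random bits, rendered as one hex digit (8, 9, a or b).
    variant_digit = format(0b1000 | int(random.random() * 4), "x")
    parts = (
        random_hex(4, 8)          # 8 hex digits
        + ["-"]
        + random_hex(2, 8)        # 4 hex digits
        + ["-4"]                  # version 4
        + random_hex(3, 4)        # 3 hex digits
        + ["-", variant_digit]
        + random_hex(3, 4)        # 3 hex digits
        + ["-"]
        + random_hex(6, 8)        # 12 hex digits
    )
    # 23 elements, joined without a separator and lowercased.
    return "".join(parts).lower()


example = uuid_v4_like()
print(example)
```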

Applying Design to Requirements

There can be varying requirements for generating unique IDs and there may be cases where it is appropriate to deviate from the above design patterns. If you use another design, be wary of:

  • Performance
    • Where possible, avoid looping functions like foreach() or apply(), and avoid calling functions multiple times unnecessarily
  • Ensuring that the functions you use in your expression will re-evaluate when called multiple times
    • Many functions will not re-evaluate when called multiple times in the same instance without a new input. rand() is an exception, while now() is expected to return the same date-time within an instance of the expression.
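One way to catch a design whose functions do not re-evaluate is to generate a batch of IDs and look for repeats. The following Python sketch assumes the generated IDs have been exported as a list of strings; find_duplicates is a hypothetical helper name.

```python
from collections import Counter


def find_duplicates(ids):
    """Return the IDs that occur more than once, mapped to their counts."""
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample batch: the first and third values repeat.
batch = ["id-001", "id-002", "id-001"]
print(find_duplicates(batch))
```

An empty result on a small batch does not prove the design is sound, but any repeat is a clear sign that part of the expression is being evaluated only once.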

Fun Fact

If you wonder why the format is the strange 8-4-4-4-12, look at the v1 UUID definition from the 1980s: 8 = time_low, 4 = time_mid, 4 = time_hi_and_version, 4 = clock_seq_hi_and_res + clock_seq_low, 12 = node (MAC address). These fields have since been completely repurposed, but the format stayed for compatibility and reusability. The collision likelihood of v1 and v4 is also massively different, mostly because v4 relies on probability theory (which is very unintuitive) rather than on timestamps and node addresses.
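The v1 field layout is still visible in Python's standard uuid module, which exposes the six fields directly. The sketch below uses a sample version-1 style UUID purely for illustration; note that Python names the fourth field clock_seq_hi_variant.

```python
import uuid

# Sample version-1 style UUID, used only to illustrate the field layout.
sample = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = sample.fields

# Rebuild the 8-4-4-4-12 text form from the individual fields.
layout = "-".join([
    f"{time_low:08x}",                                  # 8 hex digits
    f"{time_mid:04x}",                                  # 4 hex digits
    f"{time_hi_version:04x}",                           # 4 hex digits
    f"{clock_seq_hi_variant:02x}{clock_seq_low:02x}",   # 2 + 2 hex digits
    f"{node:012x}",                                     # 12 hex digits
])
print(layout)
```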