The Looseleaf Papers

Entropy of 22-digit USPS tracking numbers

Created

Modified

Published

USPS tracking numbers are 22 digits long. Here are some examples from the USPS website:

Service Sample Number
USPS Tracking® 9400 1000 0000 0000 0000 00
Priority Mail® 9205 5000 0000 0000 0000 00
Certified Mail® 9407 3000 0000 0000 0000 00

That’s about log2(10^22) ≈ 73 bits of entropy, although this is almost certainly an overestimate due to check digits and other restrictions. All the tracking numbers I’ve seen start with nine, for example.

It would be nice if the tracking number could be assigned without any central coordination, so that there is no need to check a central database before assigning a number.

I suspect these are assigned sequentially, with a different prefix for each location and a range of allowed serial numbers. But could they be assigned randomly without fear of collision?

For a version 4 UUID with 122 bits of entropy, we can generate on the order of 1e17 while keeping the probability of collision under 1%:

You have: sqrt(2*2^122 * ln(1/(1-.5)))
You want:
        Definition: 2.7149227e+18
You have: sqrt(2*2^122 * ln(1/(1-.01)))
You want:
        Definition: 3.2691513e+17

That’s using this approximation:

\begin{equation*} \sqrt{\left(2 \cdot 2^{122}\right) \ln\left(\frac{1}{1-p}\right)} \end{equation*}

But even for an optimistic 73 bits, collisions become a problem much sooner:

You have: sqrt(2*2^73 * ln(1/(1-.5)))
You want: billion
    * 114.42543
    / 0.0087393157
You have: sqrt(2*2^73 * ln(1/(1-.01)))
You want: billion
    * 13.778442
    / 0.072577144

Tens of billions may seem like a lot, but the US Postal Service processed 153.9 billion pieces of mail in 2016, though some of it didn’t require a tracking number, such as post cards, letters, and junk mail.

Perhaps, though, it would be OK to assign duplicate numbers as long as the old package had finished being delivered. Most packages take less than a month to be delivered, so this would probably be OK, as long as no more than ten billion or so packages were in circulation at a time. However, sequential or semi-sequential numbering would still be more robust, besides being easier to validate.

This illustrates an example of the difficulty of decentralized systems; centralized coordination, while imperfect, can be significantly more efficient.

But how do USPS tracking numbers actually work?

The closest I’ve been able to find is USPS2000508, Barcode Package Intelligent Mail Specification.

Data Field Field Length
“95” Channel Application Identifier 2 digits
Service Type Code 3 digits
Channel Identifier (to note POS or APC) 1 digit
Device ID 6 digits
Julian Date in YDDD format 4 digits
Serial # 5 digits
Mod 10 Check Digit 1 digit
TOTAL 22 digits

https://postalpro.usps.com/mnt/glusterfs/2018-02/BarcodePackageIMSpec.pdf

Or maybe this?

Data Field Field Length
“92” Channel Application Identifier 2 digits
Service Type Code 3 digits
Mailer ID 9 digits
Serial # 7 digits
Mod 10 Check Digit 1 digit
TOTAL 22 digits

https://postalpro.usps.com/mnt/glusterfs/2018-02/BarcodePackageIMSpec.pdf