PGP Web of Trust Statistics

This is an analysis of the web of trust among users of the market leader in the world of secure email communications, Pretty Good Privacy (PGP). See the PGP Frequently Asked Questions for more information on PGP itself and other related tools.

The PGP community has its own Public Key Infrastructure. It consists of a set of public key servers around the world which allow PGP users to register their keys and publicly sign each other's keys via email and WWW interfaces. This analysis is based on the public keyring obtained from ftp://ftp.uit.no/pub/crypto/pgp/keys/pubring.pgp on 6 Dec 1996. That source is updated daily. A similar analysis of the keyring as of Jan 1 1996 is also available.

Note that many keys and signatures have never been submitted to the public keyservers and are thus not included in this analysis.

It is also important to note that the word "trust" is used very loosly in the phrase "web of trust". Signing another key usually indicates nothing more than a level of confidence in the correlation between the key and the email address of someone who controls the private part of the key. It does not necessarily indicate a level of trust in the individual. Therefore we use, e.g., the term "central signee" rather than "central trustee" below to avoid implying too much.

Overall Statistics

Public keys submitted ('pub'):		34671
Signatures ('sig'):			54764
Total number of unique keys referenced:	37854
Revoked keys:				1442

Self-signed keyIDs:			16499
Other unique keyID-signs-keyID pairs:	31985
Note that the program used for this analysis (pgpstat.java) so far only deals with keyID-keyID relationships, rather than dealing separately with each keyID/UserID pair. This means that if by chance two distinct keys with the same keyID were in the keyring, they would be treated as one, though there are no current examples of that. It does properly ignore revoked keys.

I tried pgp -kc on the keyring (with MIT PGP 2.6.2, for almost 2 days...) and the number of signatures dropped from 28031 to 25810, presumably because some old signatures were thrown out. But the analysis here was done with the original keyring and treats all of the signatures as valid, which means that all versions of PGP will not recognize all the signatures which are accepted here. Feedback on this issue is welcomed.

Note that less than half of the keys are signed by themselves (though the percentage has increased from 38% to 49% in the last 11 months). People should always sign their own UserIDs on their own keys! Hackers can surreptitiously add or change unsigned email addresses in order to encourage correspondents to send email to the wrong place, so never trust a UserID which isn't signed.

(As of January:) Only about 1/3 of the keys are signed by at least one another key, and only 1/6 have 2 or more signatures.

To be a "good PGP-citizen", you should

Strongly-Connected Components

A 'strongly-connected' set of keys is defined as a set in which every key leads to every other key via some chain of signatures (aka signature path). Note that we are not incorporating any PGP-specific rules for establishing trust (e.g. the default CERT_DEPTH of 4, the default requirement for two 'marginally trusted' introducers to establish trust, etc.).
Size of 'strong': largest strongly-connected component:	2047
Size of 'signees': keys signed by 'strong':		4233
Size of 'signers': keys which sign 'strong':		3239
Explanation: the largest strongly-connected component of this keyring (the set names 'strong') has 2047 keys in it. The next-largest strongly-connected component has only 16 keys in it. There are 2186 keys which are directly or indirectly signed by all the keys in 'strong' but which do not sign any key in 'strong' and are thus not in the strongly-connected component, for a total of 4233 keys in the 'signees' set that can be reached from the 'strong' set.

Similarly, there are a total of 3239 keys in the 'signers' set which directly or indirectly sign all the keys in the strong component.

Shortest-Path Distances

Using a breadth-first search of the keyspace, we can calculate the shortest path from one keyID to other keyIDs it has directly or indirectly signed.
Mean distance from strong keyIDs to signees:	5.98188   6.41189
Mean distance to strong keyIDs from signers:	6.25915   6.70961

Maximum shortest-path distance:			21
First, for each key in 'strong', we compute the mean length of the shortest path necessary to reach each key in 'signees': 5.98188. Next we do the converse, following paths of signatures into the strong component rather than out of it. The mean distances are different because a different set of keys is involved: signers vs signees. These average distances are about 7% lower than on 1 Jan of 1996.

Finally, we note that there are several pairs of keys which have a shortest path distance of 21 between them. Here is the example path, between these two keys:

58F36FAD Frederick Fisher Jr. <fredfjr@cybrtyme.com>
104395F1 Arno Hueber <a.hueber@rman2.gun.de>
Anyone who is along this path could improve the tightness of the web of trust by finding someone they know further along the path and carefully signing their key. The further away the better. Signing a key which preceeds yours on the path won't necessarily help, but getting them to sign yours will.

Centers of "Trust": the Central Signee and Central Signer

By examining the shortest-path data more closely we can identify the keys which are closest to the 'center' of the web of trust. The 'central signee' is the key which is signed most directly by others; more specifically the key which has the shortest mean distance from all of the 'signers'. Here is the current "top 10", along with the mean and maximum distances to them from all keys in the 'signers' set:
Mean	Max	KeyID		UserID
3.96203	12	C1B06AF1	Derek Atkins <warlord@MIT.EDU>
4.02624	12	F82CEA91	Simon Cooper <sc@sgi.com>
4.10219	12	53AAF259	Klaus-Peter Kossakowski, DFN-CERT <kpk@cert.dfn.de>
4.11053	13	DA0EDC81	Phil Karn <karn@unix.ka9q.ampr.org>
4.11084	12	8E0A49D1	Wolfgang Ley, DFN-CERT <ley@cert.dfn.de>
4.13152	13	F1A37611	Theodore Y. Ts'o [ENCRYPTION] <tytso@mit.edu>
4.13461	12	D410B7F5	DFN-CERT <dfncert@cert.dfn.de>
4.13739	13	466B4289	Theodore Ts'o [SIGNATURE] <tytso@mit.edu>
4.13924	11	CE766B1F	Paul C. Leyland <pcl@foo.oucs.ox.ac.uk>
4.17752	13	0DBF906D	Jeffrey I. Schiller <jis@mit.edu>
You can also get the full list by mean distance.

And here is the distance to the cental signee from each of the signers along with info on how many keys sign and are signed by each key. This file also has partial information on the 'productivity' of each signature. Finding signatures which are far from the center and have lots of productivity can help identify widely separated communities which could be joined more tightly together. In fact, if the productivity is multiplied by the distance and divided by the size of 'signers', we get a lower bound on how much the mean distance from the central key that would be reduced.

The converse of this is the 'central signer', the key which signs other keys most directly:

Mean	Max	KeyID		UserID
3.54442	9	76EA7391	Sascha 'Master' Lenz <master@wuerzburg.de>
3.55647	10	8E0A49D1	Wolfgang Ley, DFN-CERT <ley@cert.dfn.de>
3.71385	11	CE766B1F	Paul C. Leyland <pcl@foo.oucs.ox.ac.uk>
3.79041	11	666D0051	Assar Westerlund <assar@pdc.kth.se>
3.80411	11	FC0C02D5	Eugene H. Spafford <spaf@cs.purdue.edu>
3.81096	12	0DBF906D	Jeffrey I. Schiller <jis@mit.edu>
3.83247	12	DA0EDC81	Phil Karn <karn@unix.ka9q.ampr.org>
3.8457	11	7B7AE5E1	Germano Caronni <caronni@tik.ee.ethz.ch>
3.85444	11	4413B691	Thomas Lenggenhager <lenggenhager@switch.ch>
3.86271	10	32DD98D9	Vesselin V. Bontchev <bontchev@complex.is>
You can also get the full list by mean distance.

Here is the distance to each of the 'signees' from the central signer, along with info on productivity and how many keys sign and are signed by each key.

Further Questions

Many other aspects of the web of trust could be explored.

Tools

The analysis tool, pgpstat.java, is an application written in the very nice new language Java (Perl just doesn't have any decent hierarchical data structure support...). Source code will probably be made available after some cleaning-up for others who want to explore different keyrings or other avenues of analysis.

Performance note: the time needed to run most of the algorithms used here is proportional to the sum of the numbers of keys and signatures. In particular this is true for the code that finds the strongly-connected components (thanks to a favorite professor of mine, Bob Sedgewick, and his "Algorithms" book! (Addison-Wesley).

The one notable exception is finding the centers of trust and the longest shortest-path, which is quadratic in the size of the connected set, but doesn't have to be computed nearly as often, as a practical matter. The full analysis took less than 30 minutes using one processor on a Sun Sparc 1000 (50 Mhz?). There are lots of opportunities for optimization, and hopefully non-quadratic algorithms for at least approximating the center/longest path problems.

Please let me know if you have any feedback on this analysis. Thanks to Piete Brooks <pb@cl.cam.ac.uk> for very helpful feedback.


Neal McBurnett <nealmcb@bell-labs.com>