PGP Web of Trust Statistics

I did the first analysis in January 1996 of the Web of Trust among users of Pretty Good Privacy (PGP), the market leader in the world of secure email communications. See The comp.security.pgp FAQ, The PKI page, the International PGP Home Page, and the GNU Privacy Guard for more information on PGP, GPG and other related tools.

For more modern analysis software, see analysis of the strong set in the PGP web of trust by Henk P. Penning. Other notable analyses over time have been done by keyanalyze - Analysis of a "full" PGP/GPG Keyring by Drew Streib, and in the keyanalyze reports by Jason Harris.

For another useful PGP-related service, which can tell you about (and show you graphically!) possible paths between specific keys, see Henk's PGP pathfinder & key statistics. It seems that AT&T PathServer PathServer and the Experimental PGP Pathfinder are not running now.

Neal McBurnett


1997 Analysis

The PGP community has its own Public Key Infrastructure. It consists of a set of public key servers around the world which allow PGP users to register their keys and publicly sign each other's keys via a variety of interfaces. This analysis is based on a public keyring obtained from Peter N. Wan on 4 Dec 1997 which is much larger than the one I used for previous studies: 29298735 bytes. The previous source I used, ftp://ftp.uit.no/pub/crypto/pgp/keys/pubring.pgp, gave me a keyring that was smaller than a year ago, unfortunately, for unknown reasons.

Note, The PGP web has grown far larger since this analysis thanks to better key distribution software in recent versions of PGP. As of 9 Dec 1998, at ftp://ftp.ch.pgp.net/pub/pgp/keys/ the keyring was 502 MB long, and there were 493923 keys: 92567 RSA, and 401356 ElGamal! I haven't tried to analyze them yet, and fear that I will need to modify the software and get more memory to handle it....

An older analysis of that keyring as of Dec 5 1996 or Jan 1 1996 is also available. Some of the increase in that year is clearly the result of having a better source of keys, rather than an increase in popularity.

There are reportedly a very large number of new DSS keys for use with PGP 5.0, but this analysis does not include any PGP 5.0 keys for two reasons. First, I hear there is as yet no way to get a full keyring out of the servers, apparantly due to memory constraints of the current implementation. Second, I haven't made the changes necessary to parse the new verbose listing format, nor decided how to deal with the issue of certification between DSS and RSA keys.

Note also that many keys and signatures have never been submitted to the public keyservers and are thus not included in this analysis.

It is also important to note that the word "trust" is used very loosly in the phrase "web of trust". Signing another key usually indicates nothing more than a level of confidence in the correlation between the key and the email address of someone who controls the private part of the key. It does not necessarily indicate a level of trust in the individual. Therefore we use, e.g., the term "central signee" rather than "central trustee" below to avoid implying too much.

Data for 4 Dec 1997

Overall Statistics

Public keys submitted ('pub'):		57582
Signatures ('sig'):			104059
Total number of unique keys referenced:	64587
Revoked keys:				2516

Self-signed keyIDs:			32912
Other unique keyID-signs-keyID pairs:	56376
Note that the program used for this analysis (pgpstat.java) so far only deals with keyID-keyID relationships, rather than dealing separately with each keyID/UserID pair. This means that if by chance two distinct keys with the same keyID were in the keyring, they would be treated as one, though there are no current examples of that. It does properly ignore revoked keys.

Two years ago I tried pgp -kc on the keyring (with MIT PGP 2.6.2, for almost 2 days...) and the number of signatures dropped from 28031 to 25810, presumably because some old signatures were thrown out. But the analysis here was done with the original keyring and treats all of the signatures as valid, which means that all versions of PGP will not recognize all the signatures which are accepted here. Feedback on this issue is welcomed.

Note that only 57% of the keys are signed by themselves (though the percentage has increased from 49% as of a year ago). People should always sign their own UserIDs on their own keys! Hackers can surreptitiously add or change unsigned email addresses in order to encourage correspondents to send email to the wrong place, so never trust a UserID which isn't signed.

As of January 1996, only about 1/3 of the keys were signed by at least one another key, and only 1/6 had 2 or more signatures.

To be a "good PGP-citizen", you should

Strongly-Connected Components

A 'strongly-connected' set of keys is defined as a set in which every key leads to every other key via some chain of signatures (aka signature path). Note that we are not incorporating any PGP-specific rules for establishing trust (e.g. the default CERT_DEPTH of 4, the default requirement for two 'marginally trusted' introducers to establish trust, etc.).
Size of 'strong': largest strongly-connected component:	3100 (was 2047)
Size of 'signees': keys signed by 'strong':		6662 (was 4233)
Size of 'signers': keys which sign 'strong':		5690 (was 3239)
Explanation: the largest strongly-connected component of this keyring (the set names 'strong') has 3100 keys in it. The next-largest strongly-connected component has only 29 keys in it. There are 3562 keys which are directly or indirectly signed by all the keys in 'strong' but which do not sign any key in 'strong' and are thus not in the strongly-connected component, for a total of 6662 keys in the 'signees' set that can be reached from the 'strong' set.

Similarly, there are a total of 5690 keys in the 'signers' set which directly or indirectly sign all the keys in the strong component, or are in 'strong' itself.

Shortest-Path Distances

Using a breadth-first search of the keyspace, we can calculate the shortest path from one keyID to other keyIDs it has directly or indirectly signed.
Mean distance from strong keyIDs to signees:	6.061
Mean distance to strong keyIDs from signers:	6.246

Maximum shortest-path distance:			21
First, for each key in 'strong', we compute the mean length of the shortest path necessary to reach each key in 'signees': 6.061 Next we do the converse, following paths of signatures into the strong component rather than out of it. The mean distances are different because a different set of keys is involved: signers vs signees. These average distances are very close to the values a year ago.

Finally, we note that there are several pairs of keys which have a shortest path distance of 21 between them. Here is the example path, between these two keys:

75464AF1 Spider  Boardman <spider@Empire.Net>
25233335 Hans-Joachim Martens <Hans-Joachim@LOGO-BOX.domino.de>
Anyone who is along this path could improve the tightness of the web of trust by finding someone they know further along the path and carefully signing their key. The further away the better. Signing a key which preceeds yours on the path won't necessarily help, but getting them to sign yours will.

Centers of "Trust": the Central Signee and Central Signer

By examining the shortest-path data more closely we can identify the keys which are closest to the 'center' of the web of trust. The 'central signee' is the key which is signed most directly by others; more specifically the key which has the shortest mean distance from all of the 'signers'. Here is the current "top 10", along with the mean and maximum distances to them from all keys in the 'signers' set:
Mean		KeyID		UserID

3.985413	C1B06AF1	Derek Atkins <warlord@MIT.EDU>
4.0240774	8E0A49D1	Wolfgang Ley, DFN-CERT <ley@cert.dfn.de>
4.121617	C7A966DD	Philip R. Zimmermann <prz@acm.org>
4.134622	F1A37611	Theodore Y. Ts'o [ENCRYPTION] <tytso@mit.edu>
4.151142	466B4289	Theodore Ts'o [SIGNATURE] <tytso@mit.edu>
4.154833	53AAF259	Klaus-Peter Kossakowski, DFN-CERT <kpk@cert.dfn.de>
4.171705	0DBF906D	Jeffrey I. Schiller <jis@mit.edu>
4.1940246	DA0EDC81	Phil Karn <karn@unix.ka9q.ampr.org>
4.196309	D410B7F5	DFN-CERT <dfncert@cert.dfn.de>
4.207557	F82CEA91	Simon Cooper <sc@sgi.com>
You can also get the full list by mean distance.

And here is the distance to the central signee from each of the signers along with info on how many keys sign and are signed by each key. This file also has partial information on the 'productivity' of each signature. Finding signatures which are far from the center and have lots of productivity can help identify widely separated communities which could be joined more tightly together. In fact, if the productivity is multiplied by the distance and divided by the size of 'signers', we get a lower bound on how much the mean distance from the central key would be reduced.

The converse of this is the 'central signer', the key which signs other keys most directly:

Mean		KeyID		UserID

3.5523868	8E0A49D1	Wolfgang Ley, DFN-CERT <ley@cert.dfn.de>
3.7434704	CE766B1F	Paul C. Leyland <pcl@foo.oucs.ox.ac.uk>
3.7979586	76EA7391	Sascha 'Master' Lenz <master@wuerzburg.de>
3.8441908	3F0D98DD	Stefan Kelm, DFN-PCA <kelm@pca.dfn.de>
3.8668568	0DBF906D	Jeffrey I. Schiller <jis@mit.edu>
3.8872712	0679ED91	teun.nijssen@kub.nl
3.9025817	666D0051	Assar Westerlund <assar@sics.se>
3.9067848	4413B691	Thomas Lenggenhager <lenggenhager@switch.ch>
3.9079857	F081195D	Matthias Bauer <matthiasb@acm.org>
3.9132392	FC0C02D5	Eugene H. Spafford <spaf@cs.purdue.edu>
You can also get the full list by mean distance.

Here is the distance to each of the 'signees' from the central signer, along with info on productivity and how many keys sign and are signed by each key.

Further Questions

Many other aspects of the web of trust could be explored.

Tools

The analysis tool, pgpstat.java, is an application written in Java (Perl just doesn't have any decent hierarchical data structure support...). Source code will probably be made available after some cleaning-up for others who want to explore different keyrings or other avenues of analysis.

Performance note: the time needed to run most of the algorithms used here is proportional to the sum of the numbers of keys and signatures. In particular this is true for the code that finds the strongly-connected components (thanks to a favorite professor of mine, Bob Sedgewick, and his "Algorithms" book! (Addison-Wesley).

The one notable exception is finding the centers of trust and the longest shortest-path, which is quadratic in the size of the connected set, but doesn't have to be computed nearly as often, as a practical matter. The full analysis took less than 30 minutes using one processor on a Sun Sparc 1000 (50 Mhz?). There are lots of opportunities for optimization, and hopefully non-quadratic algorithms for at least approximating the center/longest path problems.

Please let me know if you have any feedback on this analysis. Thanks to Peter N. Wan for providing access to the largest keyring I've seen yet, and to Piete Brooks for very helpful feedback.


Neal McBurnett Last modified: 2011-03-27