For more modern analysis software, see analysis of the strong set in the PGP web of trust by Henk P. Penning. Other notable analyses over time have been done by keyanalyze - Analysis of a "full" PGP/GPG Keyring by Drew Streib, and in the keyanalyze reports by Jason Harris.

For another useful PGP-related service, which can tell you about (and show you graphically!) possible paths between specific keys, see Henk's PGP pathfinder & key statistics. It seems that AT&T PathServer PathServer and the Experimental PGP Pathfinder are not running now.

The PGP community has its own Public Key Infrastructure. It consists of a set of public key servers around the world which allow PGP users to register their keys and publicly sign each other's keys via a variety of interfaces. This analysis is based on a public keyring obtained from Peter N. Wan on 4 Dec 1997 which is much larger than the one I used for previous studies: 29298735 bytes. The previous source I used, ftp://ftp.uit.no/pub/crypto/pgp/keys/pubring.pgp, gave me a keyring that was smaller than a year ago, unfortunately, for unknown reasons.

**Note**,
*The PGP web has grown far larger since this analysis thanks to better
key distribution software in recent versions of PGP. As of 9 Dec 1998,
at
ftp://ftp.ch.pgp.net/pub/pgp/keys/ the keyring was
502 MB long, and
there were 493923 keys: 92567 RSA, and 401356 ElGamal! I haven't
tried to analyze them yet, and fear that I will need to
modify the software and get more memory to handle it....*

An older analysis of that keyring as of
**Dec 5 1996** or
**Jan 1 1996** is also available.
Some of the increase in that year is clearly the result of
having a better source of keys, rather than an increase in
popularity.

There are reportedly a very large number of new DSS keys for use with PGP 5.0, but this analysis does not include any PGP 5.0 keys for two reasons. First, I hear there is as yet no way to get a full keyring out of the servers, apparantly due to memory constraints of the current implementation. Second, I haven't made the changes necessary to parse the new verbose listing format, nor decided how to deal with the issue of certification between DSS and RSA keys.

Note also that many keys and signatures have never been submitted to the public keyservers and are thus not included in this analysis.

It is also important to note that the word "trust" is used very loosly in the phrase "web of trust". Signing another key usually indicates nothing more than a level of confidence in the correlation between the key and the email address of someone who controls the private part of the key. It does not necessarily indicate a level of trust in the individual. Therefore we use, e.g., the term "central signee" rather than "central trustee" below to avoid implying too much.

Public keys submitted ('pub'): 57582 Signatures ('sig'): 104059 Total number of unique keys referenced: 64587 Revoked keys: 2516 Self-signed keyIDs: 32912 Other unique keyID-signs-keyID pairs: 56376Note that the program used for this analysis (pgpstat.java) so far only deals with keyID-keyID relationships, rather than dealing separately with each keyID/UserID pair. This means that if by chance two distinct keys with the same keyID were in the keyring, they would be treated as one, though there are no current examples of that. It does properly ignore revoked keys.

Two years ago I tried pgp -kc on the keyring (with MIT PGP 2.6.2, for almost 2 days...) and the number of signatures dropped from 28031 to 25810, presumably because some old signatures were thrown out. But the analysis here was done with the original keyring and treats all of the signatures as valid, which means that all versions of PGP will not recognize all the signatures which are accepted here. Feedback on this issue is welcomed.

Note that only 57% of the keys are signed by themselves (though
the percentage has increased from 49% as of a year ago).
People should **always sign their own UserIDs** on their own
keys!
Hackers can surreptitiously add or change unsigned email addresses in
order to encourage correspondents to send email to the wrong place, so
never trust a UserID which isn't signed.

As of January 1996, only about 1/3 of the keys were signed by at least one another key, and only 1/6 had 2 or more signatures.

To be a "good PGP-citizen", you should

- Be very careful about signing keys. Have first-hand knowledge based on hard-to-forge communications that the key's fingerprint (pgp -kvc) in your keyring matches the user's real fingerprint.
- Exchange signatures with at least two other people, at least one of whom is in the 'strong' set described below.
- Extra credit: Sponsor a key-signing party. But emphasize 'quality' of signatures (e.g. trying to form a bridge between different communities) rather than 'quantity'.

Size of 'strong': largest strongly-connected component: 3100 (was 2047) Size of 'signees': keys signed by 'strong': 6662 (was 4233) Size of 'signers': keys which sign 'strong': 5690 (was 3239)Explanation: the largest strongly-connected component of this keyring (the set names 'strong') has 3100 keys in it. The next-largest strongly-connected component has only 29 keys in it. There are 3562 keys which are directly or indirectly signed by all the keys in 'strong' but which do not sign any key in 'strong' and are thus not in the strongly-connected component, for a total of 6662 keys in the 'signees' set that can be reached from the 'strong' set.

Similarly, there are a total of 5690 keys in the 'signers' set which directly or indirectly sign all the keys in the strong component, or are in 'strong' itself.

Mean distance from strong keyIDs to signees: 6.061 Mean distance to strong keyIDs from signers: 6.246 Maximum shortest-path distance: 21First, for each key in 'strong', we compute the mean length of the shortest path necessary to reach each key in 'signees': 6.061 Next we do the converse, following paths of signatures into the strong component rather than out of it. The mean distances are different because a different set of keys is involved: signers vs signees. These average distances are very close to the values a year ago.

Finally, we note that there are several pairs of keys which have a shortest path distance of 21 between them. Here is the example path, between these two keys:

75464AF1 Spider Boardman <spider@Empire.Net> 25233335 Hans-Joachim Martens <Hans-Joachim@LOGO-BOX.domino.de>Anyone who is along this path could improve the tightness of the web of trust by finding someone they know further along the path and carefully signing their key. The further away the better. Signing a key which preceeds yours on the path won't necessarily help, but getting them to sign yours will.

Mean KeyID UserID 3.985413 C1B06AF1 Derek Atkins <warlord@MIT.EDU> 4.0240774 8E0A49D1 Wolfgang Ley, DFN-CERT <ley@cert.dfn.de> 4.121617 C7A966DD Philip R. Zimmermann <prz@acm.org> 4.134622 F1A37611 Theodore Y. Ts'o [ENCRYPTION] <tytso@mit.edu> 4.151142 466B4289 Theodore Ts'o [SIGNATURE] <tytso@mit.edu> 4.154833 53AAF259 Klaus-Peter Kossakowski, DFN-CERT <kpk@cert.dfn.de> 4.171705 0DBF906D Jeffrey I. Schiller <jis@mit.edu> 4.1940246 DA0EDC81 Phil Karn <karn@unix.ka9q.ampr.org> 4.196309 D410B7F5 DFN-CERT <dfncert@cert.dfn.de> 4.207557 F82CEA91 Simon Cooper <sc@sgi.com>You can also get the full list by mean distance.

And here is the distance to the central signee from each of the signers along with info on how many keys sign and are signed by each key. This file also has partial information on the 'productivity' of each signature. Finding signatures which are far from the center and have lots of productivity can help identify widely separated communities which could be joined more tightly together. In fact, if the productivity is multiplied by the distance and divided by the size of 'signers', we get a lower bound on how much the mean distance from the central key would be reduced.

The converse of this is the 'central signer', the key which signs other keys most directly:

Mean KeyID UserID 3.5523868 8E0A49D1 Wolfgang Ley, DFN-CERT <ley@cert.dfn.de> 3.7434704 CE766B1F Paul C. Leyland <pcl@foo.oucs.ox.ac.uk> 3.7979586 76EA7391 Sascha 'Master' Lenz <master@wuerzburg.de> 3.8441908 3F0D98DD Stefan Kelm, DFN-PCA <kelm@pca.dfn.de> 3.8668568 0DBF906D Jeffrey I. Schiller <jis@mit.edu> 3.8872712 0679ED91 teun.nijssen@kub.nl 3.9025817 666D0051 Assar Westerlund <assar@sics.se> 3.9067848 4413B691 Thomas Lenggenhager <lenggenhager@switch.ch> 3.9079857 F081195D Matthias Bauer <matthiasb@acm.org> 3.9132392 FC0C02D5 Eugene H. Spafford <spaf@cs.purdue.edu>You can also get the full list by mean distance.

Here is the distance to each of the 'signees' from the central signer, along with info on productivity and how many keys sign and are signed by each key.

- It is important that the web be multiply-connected. Good software to do bi-connectivity (or tri-connectivity, etc.) for directed graphs would be useful, especially if it identifies the most significant articulation points (keys which are critical for the connection of big pieces of the web).
- Identification of large cliques or near-cliques (sets of keys which all sign each other, related to coloring problem, very hard.) We want to interconnect these cliques. We don't really want to encourage clique formation, since it adds lots of signatures which make the web harder to manage without adding much trust.
- Software to generate a high-level graphical view would be useful. For example, a directed-acyclic-graph of the connectivity of the larger strongly-connected components would be interesting.
- It shouldn't be too hard to make pgpstat.java into a server which could answer custom queries (shortest path from x to y, size of component that x is in, suggestions for who you might know who could sign your key (e.g. other keys from the strongly-connected component which are in your Internet domain), etc.)

Performance note: the time needed to run most of the algorithms used here is proportional to the sum of the numbers of keys and signatures. In particular this is true for the code that finds the strongly-connected components (thanks to a favorite professor of mine, Bob Sedgewick, and his "Algorithms" book! (Addison-Wesley).

The one notable exception is finding the centers of trust and the longest shortest-path, which is quadratic in the size of the connected set, but doesn't have to be computed nearly as often, as a practical matter. The full analysis took less than 30 minutes using one processor on a Sun Sparc 1000 (50 Mhz?). There are lots of opportunities for optimization, and hopefully non-quadratic algorithms for at least approximating the center/longest path problems.

Please let me know if you have any feedback on this analysis. Thanks to Peter N. Wan for providing access to the largest keyring I've seen yet, and to Piete Brooks for very helpful feedback.

Neal McBurnett Last modified: 2011-03-27