The PGP community has its own
Public Key Infrastructure.
It consists of a set of public key servers around
the world which allow PGP users to register their keys and publicly
sign
each other's keys via email and
WWW interfaces.
This analysis is based on the public keyring obtained from
ftp://ftp.uit.no/pub/crypto/pgp/keys/pubring.pgp
on 6 Dec 1996. That source is updated daily. A similar analysis
of the keyring as of
**Jan 1 1996** is also available.

Note that many keys and signatures have never been submitted to the public keyservers and are thus not included in this analysis.

It is also important to note that the word "trust" is used very loosly in the phrase "web of trust". Signing another key usually indicates nothing more than a level of confidence in the correlation between the key and the email address of someone who controls the private part of the key. It does not necessarily indicate a level of trust in the individual. Therefore we use, e.g., the term "central signee" rather than "central trustee" below to avoid implying too much.

Public keys submitted ('pub'): 34671 Signatures ('sig'): 54764 Total number of unique keys referenced: 37854 Revoked keys: 1442 Self-signed keyIDs: 16499 Other unique keyID-signs-keyID pairs: 31985Note that the program used for this analysis (pgpstat.java) so far only deals with keyID-keyID relationships, rather than dealing separately with each keyID/UserID pair. This means that if by chance two distinct keys with the same keyID were in the keyring, they would be treated as one, though there are no current examples of that. It does properly ignore revoked keys.

I tried pgp -kc on the keyring (with MIT PGP 2.6.2, for almost 2 days...) and the number of signatures dropped from 28031 to 25810, presumably because some old signatures were thrown out. But the analysis here was done with the original keyring and treats all of the signatures as valid, which means that all versions of PGP will not recognize all the signatures which are accepted here. Feedback on this issue is welcomed.

Note that less than half of the keys are signed by themselves (though
the percentage has increased from 38% to 49% in the last 11 months).
People should **always sign their own UserIDs** on their own
keys!
Hackers can surreptitiously add or change unsigned email addresses in
order to encourage correspondents to send email to the wrong place, so
never trust a UserID which isn't signed.

(*As of January:*) Only about 1/3 of the keys are signed by
at least one another key, and only 1/6 have 2 or more signatures.

To be a "good PGP-citizen", you should

- Be very careful about signing keys. Have first-hand knowledge based on hard-to-forge communications that the key's fingerprint (pgp -kvc) in your keyring matches the user's real fingerprint.
- Exchange signatures with at least two other people, at least one of whom is in the 'strong' set described below.
- Extra credit: Sponsor a key-signing party. But emphasize 'quality' of signatures (e.g. trying to form a bridge between different communities) rather than 'quantity'.

Size of 'strong': largest strongly-connected component: 2047 Size of 'signees': keys signed by 'strong': 4233 Size of 'signers': keys which sign 'strong': 3239Explanation: the largest strongly-connected component of this keyring (the set names 'strong') has 2047 keys in it. The next-largest strongly-connected component has only 16 keys in it. There are 2186 keys which are directly or indirectly signed by all the keys in 'strong' but which do not sign any key in 'strong' and are thus not in the strongly-connected component, for a total of 4233 keys in the 'signees' set that can be reached from the 'strong' set.

Similarly, there are a total of 3239 keys in the 'signers' set which directly or indirectly sign all the keys in the strong component.

Mean distance from strong keyIDs to signees: 5.98188 6.41189 Mean distance to strong keyIDs from signers: 6.25915 6.70961 Maximum shortest-path distance: 21First, for each key in 'strong', we compute the mean length of the shortest path necessary to reach each key in 'signees': 5.98188. Next we do the converse, following paths of signatures into the strong component rather than out of it. The mean distances are different because a different set of keys is involved: signers vs signees. These average distances are about 7% lower than on 1 Jan of 1996.

Finally, we note that there are several pairs of keys which have a shortest path distance of 21 between them. Here is the example path, between these two keys:

58F36FAD Frederick Fisher Jr. <fredfjr@cybrtyme.com> 104395F1 Arno Hueber <a.hueber@rman2.gun.de>Anyone who is along this path could improve the tightness of the web of trust by finding someone they know further along the path and carefully signing their key. The further away the better. Signing a key which preceeds yours on the path won't necessarily help, but getting them to sign yours will.

Mean Max KeyID UserID 3.96203 12 C1B06AF1 Derek Atkins <warlord@MIT.EDU> 4.02624 12 F82CEA91 Simon Cooper <sc@sgi.com> 4.10219 12 53AAF259 Klaus-Peter Kossakowski, DFN-CERT <kpk@cert.dfn.de> 4.11053 13 DA0EDC81 Phil Karn <karn@unix.ka9q.ampr.org> 4.11084 12 8E0A49D1 Wolfgang Ley, DFN-CERT <ley@cert.dfn.de> 4.13152 13 F1A37611 Theodore Y. Ts'o [ENCRYPTION] <tytso@mit.edu> 4.13461 12 D410B7F5 DFN-CERT <dfncert@cert.dfn.de> 4.13739 13 466B4289 Theodore Ts'o [SIGNATURE] <tytso@mit.edu> 4.13924 11 CE766B1F Paul C. Leyland <pcl@foo.oucs.ox.ac.uk> 4.17752 13 0DBF906D Jeffrey I. Schiller <jis@mit.edu>You can also get the full list by mean distance.

And here is the distance to the cental signee from each of the signers along with info on how many keys sign and are signed by each key. This file also has partial information on the 'productivity' of each signature. Finding signatures which are far from the center and have lots of productivity can help identify widely separated communities which could be joined more tightly together. In fact, if the productivity is multiplied by the distance and divided by the size of 'signers', we get a lower bound on how much the mean distance from the central key that would be reduced.

The converse of this is the 'central signer', the key which signs other keys most directly:

Mean Max KeyID UserID 3.54442 9 76EA7391 Sascha 'Master' Lenz <master@wuerzburg.de> 3.55647 10 8E0A49D1 Wolfgang Ley, DFN-CERT <ley@cert.dfn.de> 3.71385 11 CE766B1F Paul C. Leyland <pcl@foo.oucs.ox.ac.uk> 3.79041 11 666D0051 Assar Westerlund <assar@pdc.kth.se> 3.80411 11 FC0C02D5 Eugene H. Spafford <spaf@cs.purdue.edu> 3.81096 12 0DBF906D Jeffrey I. Schiller <jis@mit.edu> 3.83247 12 DA0EDC81 Phil Karn <karn@unix.ka9q.ampr.org> 3.8457 11 7B7AE5E1 Germano Caronni <caronni@tik.ee.ethz.ch> 3.85444 11 4413B691 Thomas Lenggenhager <lenggenhager@switch.ch> 3.86271 10 32DD98D9 Vesselin V. Bontchev <bontchev@complex.is>You can also get the full list by mean distance.

Here is the distance to each of the 'signees' from the central signer, along with info on productivity and how many keys sign and are signed by each key.

- It is important that the web be multiply-connected. Good software to do bi-connectivity (or tri-connectivity, etc.) for directed graphs would be useful, especially if it identifies the most significant articulation points (keys which are critical for the connection of big pieces of the web).
- Identification of large cliques or near-cliques (sets of keys which all sign each other, related to coloring problem, very hard.) We want to interconnect these cliques. We don't really want to encourage clique formation, since it adds logs of signatures which make the web harder to manage without adding much trust.
- Software to generate a high-level graphical view would be useful. For example, a directed-acyclic-graph of the connectivity of the larger strongly-connected components would be interesting.
- It shouldn't be too hard to make pgpstat.java into a server which could answer custom queries (shortest path from x to y, size of component that x is in, suggestions for who you might know who could sign your key (e.g. other keys from the strongly-connected component which are in your Internet domain), etc.)

Performance note: the time needed to run most of the algorithms used here is proportional to the sum of the numbers of keys and signatures. In particular this is true for the code that finds the strongly-connected components (thanks to a favorite professor of mine, Bob Sedgewick, and his "Algorithms" book! (Addison-Wesley).

The one notable exception is finding the centers of trust and the longest shortest-path, which is quadratic in the size of the connected set, but doesn't have to be computed nearly as often, as a practical matter. The full analysis took less than 30 minutes using one processor on a Sun Sparc 1000 (50 Mhz?). There are lots of opportunities for optimization, and hopefully non-quadratic algorithms for at least approximating the center/longest path problems.

Please let me know if you have any feedback on this analysis. Thanks to Piete Brooks <pb@cl.cam.ac.uk> for very helpful feedback.

Neal McBurnett <nealmcb@bell-labs.com>