Botan News

Algorithm Benchmarking and Provider Selection in Botan 1.8

A major driver for Botan's performance in the last couple of years has been from its use in Monotone, a distributed revision control system. There were two major design decisions made by Monotone's developers which caused Botan to become a bottleneck in Monotone's performance. This post is about those design decisions, and changes made in the last two years during the 1.7 development process intended to improve Monotone's performance.

Monotone uses a form of content based addressing - many objects within Monotone are named by hashing the contents with SHA-1. Naming things by their cryptographic hash is one way to get to the "secure and decentralized" corner of Zooko's Triangle, which is exactly where one wants to be for this problem.

For instance the full name of the revision corresponding to the most recent release of Botan (1.7.22) is "3ebbcf939bfb693a1b97a3d32082807f797cbe82". In many cases Monotone will allow the use of unambiguous prefixes of a revision name, though it's important to remember that what is ambiguous or not can change as new (more or less randomly named) revisions are added to the database. Right now in my particular database, "3ebb" is long enough to be an unambiguous prefix for the 1.7.22 revision, but "3eb" is not (it conflicts with another revision, which was created this spring). The problem is that the output of the SHA-1 hash is pseudo-random and cannot be easily controlled, so for all I know, the next revision added to the database will have a leading 24 bits of 0x3ebbcf, meaning I would in the future have to use 3ebbcf9, or some longer prefix, to unambiguously refer to the revision 3ebbcf939bfb693a1b97a3d32082807f797cbe82. Though as an alternative I can always use the selector syntax "t:1.7.22", since that is how that particular revision was tagged.

A Monotone revision binds some metadata (timestamp, branch name, author name, and some cryptographic signatures) with the identifiers of each file included in that revision. Each file is in turn named by the SHA-1 hash of its contents. Monotone stores all of its data in a SQLite database file, so it is easy to verify exactly how Monotone stores revisions and files:

(motoko ~)$ sqlite3 ~/var/mtn/botan.mtn
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> .schema files
CREATE TABLE files
      (
      id primary key,   -- strong hash of file contents
      data not null     -- compressed contents of a file
      );
sqlite> .schema revisions
CREATE TABLE revisions
      (
      id primary key,      -- SHA1(text of revision)
      data not null        -- compressed, encoded contents of a revision
      );

The correspondence between file name and SHA-1 hash is also easy to demonstrate using sha1sum from GNU coreutils and Monotone's automate interface.

(motoko ~)$ sha1sum ~/projects/botan/mainline/readme.txt
d233d919f2b5358520b725f427c4f35b5d866fbf  projects/botan/mainline/readme.txt

(motoko ~)$ mtn -d ~/var/mtn/botan.mtn automate get_file d233d919f2b5358520b725f427c4f35b5d866fbf
Botan 1.7.23-pre ????-??-??
http://botan.randombit.net/

This release of Botan is the second release candidate for the 1.8.0
release. It should be stable, though some APIs have changed
incompatibly since the 1.6 release.

You can file bugs at http://www.randombit.net/bugzilla or by sending
mail to the botan-devel mailing list.

For more information, see info.txt, the build instructions, the API
reference manual, and the tutorial, all of which can be found in the
doc/ directory.

Another important design point in Monotone is that each node maintains a complete (or as complete as limitations to local knowledge allow) copy of the project history in the local SQLite database. For instance, if you were to check out Monotone's sources, the first thing that would happen is a complete copy of all revisions would be copied to your local database. Naturally, your Monotone client does not trust the machine it is receiving the revisions from, so upon receipt of a new revision it checks the hashes and RSA signatures associated with each one. This means you can pull revisions from a completely untrusted, or even actively malicious, source, and be assured that Monotone will reject any corrupted revisions or contents as invalid (as long as the developer's RSA keys were not compromised, at least).

The combination of pervasive hashing and the use of total history means that Monotone will perform a very large number of SHA-1 computations, especially on initial checkout, when each revision of each file will be hashed on the local machine to verify it against the content hashes included in the revision data. Optimizing this particular operation is pretty important, because it is one of the very first things a developer new to Monotone will probably do, and first impressions can be pretty important. In fact slowness on initial pull, especially for large projects, is one of the most common complaints I've seen about Monotone.

How much does all this content hashing cost? For profiling questions of this sort, my preferred tools are to use valgrind for data collection and visualize it with kcachegrind. I profiled Monotone pulling about 1000 revisions over the loopback interface, and found that the (already pretty well optimized) SHA-1 compression function was using over 15% of the total CPU time, which roughly coorelates with previous results reported on the Monotone mailing list.

Botan includes a feature allowing applications to get an algorithm implementation using a string representation. Botan adopted the symbology created for SCAN, which is a standardized form for the string representations of cryptographic algorithms. It was inspired by and mostly used by JCE providers like BouncyCastle, since Sun's JCE uses getInstance for acquiring new objects from dynamically chosen providers using a string representation. And Botan, like JCE, supports the notion of a provider which allows multiple implementations of an algorithm to be available at once. However until recently applications using Botan had only limited and very coarse-grained over which implementation would be chosen.

The baseline implementation of SHA-1 in Botan is written in C++, though a few utility functions like byte swapping are implemented using inline assembly on x86 and x86-64 processors. There are also implementations written in x86 and x86-64 assembly, one that hooks OpenSSL's implementation, and one that uses the SSE2 implementation written by Dean Gaudet.

Botan 1.8 allows multiple providers of an algorithm to be enabled at once. The exact providers available for a particular algorithm can be found by calling Algorithm_Factory::providers_of, which accepts a string specifying the algorithm and returns a (possibly empty) vector of strings naming the providers. An application can then choose which implementation to use and enable it as the default one using Algorithm_Factory::set_preferred_provider.

It is perhaps not immediately clear how to choose the best provider! But most of the time, the criteria in question is going to be performance, which is the reason most of the current alternative providers of algorithms in Botan are included. However in the future it may be useful to add a provider which adds implementations of algorithms like AES implemented in a manner resistant to timing attacks; constant-time implementations of cryptographic algorithms tend to be slower than normal, but resistance to such attacks can be important in some contexts.

For choosing the fastest algorithm, Botan now provides a convenience function which benchmarks all providers of an algorithm for a fixed amount of time, and returns the speed of each provider's implementation in MiB per second:

std::map<std::string, double>
algorithm_benchmark(const std::string& name,
                    u32bit milliseconds,
                    Timer& timer,
                    RandomNumberGenerator& rng,
                    Algorithm_Factory& af);

With this function, an application can easily find the fastest provider at runtime and then enable that provider as the default for future use, as seen in this snippet of the example hash_quickly.cpp:

   Botan::AutoSeeded_RNG rng;
   Botan::Default_Benchmark_Timer timer;

   /*
    global_state() maintains an Algorithm_Factory for backwards compat, but
    there is nothing stopping you from creating your own to use instead.
   */
   Botan::Algorithm_Factory& af = Botan::global_state().algorithm_factory();

   double milliseconds = 300;

   /*
   * algorithm_benchmark() will try to run for arg milliseconds total
   * time, regardless of the # of providers
   */
   std::map<std::string, double> results =
      Botan::algorithm_benchmark(algo, milliseconds, timer, rng, af);

   std::string fastest_provider = "";
   double best_res = 0;

   for(std::map<std::string, double>::iterator r = results.begin();
       r != results.end(); ++r)
      {
      std::cout << r->first << " @ " << r->second << " MiB/sec\n";

      if(fastest_provider == "" || r->second > best_res)
         {
         fastest_provider = r->first;
         best_res = r->second;
         }
      }

   std::cout << "Using " << fastest_provider << "\n";

   /* After this call returns, future calls to af will return this
      provider's implementation
   */
   af.set_preferred_provider(algo, fastest_provider);

On my 2.4 GHz Core2 Q6600, using the POSIX clock_gettime clock for execution timing, algorithm_benchmark returned these results for SHA-1:

The 60% increase between the C++ implementation and the x86-64 assembly version of SHA-1 included in OpenSSL makes quite a difference in Monotone's overall performance! The very tiny uptick seen going from C++ to x86-64 assembly suggests that the assembler code could do better (or that the assembly is fine, but the C++ is nearly completely optimal for GCC on x86-64, which seems pretty unlikely, especially considering the OpenSSL time). Using the much faster OpenSSL or SSE2 variants reduces Monotone's total time spent on SHA-1 compressions during a large pull to only about 10% of the total runtime, a substantial reduction all in all.

Cryptographic design and implementation remains highly performance driven, with performance comparisons and new speed records being considered publishable results in some journals. During the AES competition, performance on many platforms as a major criteria in choosing both the finalists, and ultimately in choosing the winner Rijndael over alternatives like Serpent. I was lucky enough to be in attendance at the first NIST Hash Workshop, where the idea of a public competition to choose the successor to SHA-2 was first publicly discussed. One of the major reasons mentioned at the time for adopting a new hash function was precisely because SHA-2's performance is so terrible on many modern processors compared to previous designs like SHA-1 and RIPEMD-160, and it is likely that the hash finally chosen as SHA-3 will be highly optimized for modern superscalar multicore CPUs. Some, like Skein and MD6, use structures that allow highly parallel operations, while others, like Boole instead focus on the best possible single-core performance, on the assumption that in a multicore context is is likely that most applications will have more than one independent stream of data to process anyway. In a multicore router, for instance, it does not matter that MD6 can accelerate a single computation across cores, because each packet will be smaller than the 1500 byte MTU, but of course there will usually be many packets to process, each of which can be looked at completely in parallel with little or no synchronization required. Similarly, Monotone could easily hash many different files in parallel, using a thread pool or other abstraction to farm out the work, even if the hash function used was not itself very parallelizable, and probably would be able to saturate the processing resources of even the largest multicore systems.

Posted 2008/11/22 in devnotes; no comments

< Botan In Feature Freeze for 1.8.0 | Botan 1.7.23 aka 1.8.0 RC2 Released >

Name:


E-mail:


URL:


Comment: