Notes for New Contributors

Source Code Layout

Under src there are directories

  • lib is the library itself, more on that below

  • cli is the command line application botan

  • tests contains what you would expect. Input files go under tests/data.

  • python/botan3.py is the Python ctypes wrapper

  • bogo_shim contains the shim binary and configuration for BoringSSL’s TLS test suite

  • fuzzer contains fuzz targets for various modules of the library

  • build-data contains files read by the configure script. For example build-data/cc/gcc.txt describes various gcc options.

  • examples contains usage examples used in the documentation.

  • scripts contains misc scripts: install, distribution, various codegen things. Scripts controlling CI go under scripts/ci.

  • configs contains configuration files for tools like pylint

  • editors contains configuration files for editors like vscode and emacs

Under doc one finds the sources of this documentation

Library Layout

Under src/lib are several directories

  • asn1 is the DER encoder/decoder

  • base defines some high level types

  • block contains the block cipher implementations

  • codec has hex, base64, base32, base58

  • compat is a (partial) compatibility layer for the libsodium API

  • compression has the compression wrappers (zlib, bzip2, lzma)

  • entropy has various entropy sources used by some of the RNGs

  • ffi is the C99 API

  • filters is a filter/pipe API for data transforms

  • hash contains the hash function implementations

  • kdf contains the key derivation functions

  • mac contains the message authentication codes

  • math is the big integer math library. It is divided into three parts: mp, which contains the low level algorithms; bigint, which is a C++ wrapper around mp; and numbertheory, which contains higher level algorithms like primality testing and exponentiation

  • misc contains odds and ends: format preserving encryption, SRP, threshold secret sharing, all or nothing transform, and others

  • modes contains block cipher modes (CBC, GCM, etc)

  • passhash contains password hashing algorithms for authentication

  • pbkdf contains password hashing algorithms for key derivation

  • pk_pad contains padding schemes for public key algorithms

  • prov contains bindings to external libraries such as PKCS #11

  • psk_db contains a generic interface for a Pre-Shared-Key database

  • pubkey contains the public key algorithms

  • rng contains the random number generators

  • stream contains the stream ciphers

  • tls contains the TLS implementation

  • utils contains various utility functions and types

  • x509 is X.509 certificates, PKCS #10 requests, OCSP

Each of these folders can contain subfolders which are treated as modules if they contain an info.txt file. These submodules have an implicit dependency on their parent module. The chapter Understanding configure.py contains more information on Botan’s module architecture.

Sending patches

All contributions should be submitted as pull requests via GitHub (https://github.com/randombit/botan). If you are planning a large change, open a discussion ticket on GitHub before starting out to make sure you are on the right path. Once you have something written, even if it is not complete or ready to go, feel free to open a draft PR for early review and comment.

If possible please sign your git commits using a PGP key. See https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work for instructions on how to set this up.

Depending on what your change is, your PR should probably also include an update to news.rst with a note explaining the change. If your change is a simple bug fix, a one sentence description is perhaps sufficient. If there is an existing ticket on GitHub with discussion or other information, reference it in your change note as ‘GH #000’.

Update doc/credits.txt with your information so people know what you did!

If you are interested in contributing but don’t know where to start, check out doc/dev_ref/todo.rst for some ideas. These are changes we would almost certainly accept once they’ve passed code review.

Also, try building and testing the library on whatever hardware you have handy, especially unusual platforms, or using C++ compilers other than the regularly tested GCC, Clang, and Visual Studio.

FFI Additions

If you add a new function declaration to ffi.h, the same PR must also add the matching declaration to the Python binding botan3.py. In addition, the new functionality must be exposed through the Python API and covered by a test written in Python.

Git Usage

Do NOT merge master into your topic branch; this creates needless commits and noise in the history. Instead, as needed, rebase your branch against master (git rebase -i master) and force push the branch to update the PR. If the GitHub PR page does not report any merge conflicts and nobody asks you to rebase, you don’t need to rebase.

Try to keep your history clean and use rebase to squash your commits as needed. If your diff is less than roughly 100 lines, it should probably be a single commit. Only split commits as needed to help with review/understanding of the change.

Python

Scripts should be in Python 3 whenever possible.

For configure.py (and helper scripts install.py, cleanup.py and build_docs.py) the target is stock (no modules outside the standard library) CPython 3.x. Support for PyPy, etc is great when viable (in the sense of not causing problems for 3.x, and not requiring huge blocks of version dependent code). As running this program successfully is required for a working build, making it as portable as possible is considered key.

The Python wrapper botan3.py targets CPython 3.x and the latest PyPy. Note that a single file is used to avoid dealing with any of Python’s various crazy module distribution issues.

For random scripts not typically run by an end-user (codegen, visualization, and so on) there isn’t any need to worry about platform independence. Here it’s fine to depend on any useful modules such as graphviz or matplotlib, regardless of whether they are available from a stock CPython install.

Build Tools and Hints

If you don’t already use it for all your C/C++ development, install ccache (or on Windows, sccache) right now, and configure a large cache on a fast disk. It allows for very quick rebuilds by caching the compiler output.

Use the --enable-sanitizers= flag to enable various sanitizer checks. Supported values include “address” and “undefined” for GCC and Clang. GCC also supports “iterator” (checked iterators), and Clang supports “memory” (MSan) and “coverage” (for fuzzing).

On Linux if you have the lcov and gcov tools installed, then running ./src/scripts/ci_build.py coverage will produce a coverage enabled build, run the tests, test the fuzzers against a corpus, and produce an HTML report of total coverage. This coverage build requires the development headers for zlib, bzip2, liblzma, TrouSerS (libtspi), and Sqlite3.

Style Conventions

When writing your code remember the need for it to be easily understood by reviewers and auditors, both at the time of the patch submission and in the future.

Avoid complicated template metaprogramming where possible. It has its places but should be used judiciously.

When designing a new API (for use either by library users or just internally) try writing out the calling code first. That is, write out some code calling your idealized API, then just implement that API. This can often help avoid cut-and-paste by creating the correct abstractions needed to solve the problem at hand.

The C++11 auto keyword is very convenient but only use it when the type truly is obvious (considering also the potential for unexpected integer conversions and the like, such as an apparent uint8_t being promoted to an int).
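The promotion hazard mentioned above can be shown in a few lines. This is a minimal sketch, not code from the library; the function name add_mod_256 is hypothetical and exists only for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical function, for illustration only
constexpr uint8_t add_mod_256(uint8_t a, uint8_t b) {
   // Both operands are promoted to int before the addition, so auto deduces int
   auto sum = a + b;
   static_assert(std::is_same<decltype(sum), int>::value, "uint8_t + uint8_t has type int");

   // Writing out the type and the cast makes the intended truncation visible
   const uint8_t reduced = static_cast<uint8_t>(sum);
   return reduced;
}
```

Here the type of sum is not obvious to a reader who expects an 8-bit value, which is the kind of case where spelling out the type is the safer choice.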

Unless there is a specific reason otherwise (eg due to calling some C API which requires exactly a long* be provided) integer types should be either (u)intXX_t or size_t. If the variable is used for integer values of “no particular size”, as in the loop for(some_type i = 0; i != 100; ++i) then the type should be size_t. Use one of the specific size integer types only when there is an algorithmic/protocol reason to use an integer of that size. For example, if parsing a protocol that uses 16-bit integer fields to encode a length, naturally one would use uint16_t there.

If a variable is defined and not modified, declare it const. Exceptions can be made for very short-lived variables, but generally speaking being able to read the declaration and know it will not be modified is useful.
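As a rough illustration of the two cases above, the sketch below uses uint16_t for a length field whose width is fixed by a (made up) wire format, and size_t for a loop index and counter. Both function names and the record layout are assumptions for the example, not library code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record format: the first two bytes are a big-endian length
inline uint16_t read_length_field(const std::vector<uint8_t>& buf) {
   assert(buf.size() >= 2);
   // The format fixes this field at 16 bits, so uint16_t is the natural type
   return static_cast<uint16_t>((static_cast<uint16_t>(buf[0]) << 8) | buf[1]);
}

inline size_t count_zero_bytes(const std::vector<uint8_t>& buf) {
   // The index and the counter have no particular size, so both are size_t
   size_t zeros = 0;
   for(size_t i = 0; i != buf.size(); ++i) {
      if(buf[i] == 0) {
         zeros += 1;
      }
   }
   return zeros;
}
```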

Use override annotations whenever overriding a virtual function. If introducing a new type that is not intended for further derivation, mark it final.
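A small sketch of both annotations follows. The Checksum interface and the Toy_Checksum class are hypothetical and only serve to show where override and final go.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface, not part of the library
class Checksum {
   public:
      virtual ~Checksum() = default;
      virtual std::string name() const = 0;
      virtual size_t output_length() const = 0;
};

// Not intended for further derivation, so the class is marked final
class Toy_Checksum final : public Checksum {
   public:
      // With override, a signature that does not match the base class is a compile error
      std::string name() const override { return "Toy"; }

      size_t output_length() const override { return 4; }
};

// Callers work through the base class interface
inline size_t output_bits(const Checksum& checksum) {
   const size_t bytes = checksum.output_length();
   assert(bytes > 0);
   return 8 * bytes;
}
```

If name() in the derived class were accidentally declared without const, the override annotation would turn a silent new overload into a compiler diagnostic.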

Avoid explicit new or (especially) explicit delete: use RAII, make_unique, etc.

Use m_ prefix on all member variables.
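The two preceding rules (no explicit new or delete, and the m_ prefix) might look like this in practice. Buffer, Session and make_session are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only
class Buffer {
   public:
      explicit Buffer(size_t len) : m_data(len) {}

      size_t size() const { return m_data.size(); }

   private:
      std::vector<uint8_t> m_data;
};

class Session {
   public:
      explicit Session(std::unique_ptr<Buffer> buf) : m_buffer(std::move(buf)) { assert(m_buffer != nullptr); }

      size_t buffer_size() const { return m_buffer->size(); }

   private:
      // Released automatically when the Session is destroyed
      std::unique_ptr<Buffer> m_buffer;
};

inline Session make_session(size_t buffer_len) {
   // make_unique allocates and unique_ptr frees, so no new or delete appears
   return Session(std::make_unique<Buffer>(buffer_len));
}
```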

clang-format is used for all C++ formatting. The configuration is in .clang-format in the root directory. You can rerun the formatter using make fmt or by invoking the script src/scripts/dev_tools/run_clang_format.py. If the output would be truly horrible, it is allowed to disable formatting for a specific area using // clang-format off annotations.

Note

Since the output of clang-format varies from version to version, we currently require using exactly clang-format 15.

Use braces on both sides of if/else blocks, even if only using a single statement.
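For example, even when each branch is a single return statement, both branches are braced. The function clamp_length is a hypothetical example.

```cpp
#include <cstddef>

// Hypothetical function, for illustration only
constexpr size_t clamp_length(size_t requested, size_t limit) {
   // Braces on both branches, even though each holds a single statement
   if(requested > limit) {
      return limit;
   } else {
      return requested;
   }
}
```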

Avoid using namespace declarations, even inside of single functions. One allowed exception is using namespace std::placeholders in functions which use std::bind. (But, don’t use std::bind - use a lambda instead).
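To make the preference for lambdas concrete, here is a sketch in which the std::bind form appears only in a comment and the lambda is what actually gets returned. Byte_Transform, xor_with, make_xor_transform and apply_twice are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical callback type, for illustration only
using Byte_Transform = std::function<uint8_t(uint8_t)>;

inline uint8_t xor_with(uint8_t key, uint8_t input) {
   return static_cast<uint8_t>(key ^ input);
}

inline Byte_Transform make_xor_transform(uint8_t key) {
   // Discouraged form, which needs std::placeholders:
   //    return std::bind(xor_with, key, std::placeholders::_1);
   // Preferred form: a lambda capturing key by value
   return [key](uint8_t input) { return xor_with(key, input); };
}

inline uint8_t apply_twice(const Byte_Transform& transform, uint8_t input) {
   assert(transform != nullptr);
   return transform(transform(input));
}
```

The lambda needs no using namespace declaration at all, and the captured state is visible at the point of definition.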

Use :: to explicitly refer to the global namespace (eg, when calling an OS or external library function like ::select or ::sqlite3_open).
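A minimal sketch of the qualification, using the C library function strlen as a stand-in for an OS or external library call. The namespace Example and the wrapper c_string_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h>

namespace Example {

// Hypothetical wrapper, for illustration only
inline size_t c_string_length(const char* str) {
   assert(str != nullptr);
   // The leading :: shows at a glance that this is the C library function,
   // not something declared inside namespace Example
   return ::strlen(str);
}

}  // namespace Example
```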

Use of External Dependencies

Compiler Dependencies

The library should always be as functional as possible when compiled with just Standard C++20. However, feel free to use the full language.

Use of compiler extensions is fine whenever appropriate; this is typically restricted to a single file or an internal header. Compiler extensions used currently include native uint128_t, SIMD intrinsics, inline asm syntax and so on, so there are some existing examples of appropriate use.
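One way to keep an extension contained is to hide it behind a small function with a portable fallback. The sketch below is not taken from the library: the name mul_high_64 is hypothetical, and it assumes the GCC/Clang predefined macro __SIZEOF_INT128__ as the test for a native 128-bit integer type.

```cpp
#include <cstdint>

// Hypothetical helper: returns the high 64 bits of the 128-bit product a * b
constexpr uint64_t mul_high_64(uint64_t a, uint64_t b) {
#if defined(__SIZEOF_INT128__)
   // Extension path: GCC and Clang define this macro when __int128 is available
   return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
#else
   // Portable path: schoolbook multiplication on 32-bit halves
   const uint64_t a_lo = a & 0xFFFFFFFF;
   const uint64_t a_hi = a >> 32;
   const uint64_t b_lo = b & 0xFFFFFFFF;
   const uint64_t b_hi = b >> 32;

   const uint64_t lo_lo = a_lo * b_lo;
   const uint64_t hi_lo = a_hi * b_lo;
   const uint64_t lo_hi = a_lo * b_hi;
   const uint64_t hi_hi = a_hi * b_hi;

   // This sum cannot overflow 64 bits
   const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFF) + lo_hi;
   return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```

Callers see only a standard C++ signature, so the rest of the code stays functional on a compiler that lacks the extension.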

Generally intrinsics or inline asm is preferred over bare assembly to avoid calling convention issues among different platforms; the improvement in maintainability is seen as worth any potential performance tradeoff. One risk with intrinsics is that the compiler might rewrite your clever const-time SIMD into something with a conditional jump, but code intended to be const-time should in any case be annotated (using CT::poison) so it can be checked at runtime with tools.

Operating System Dependencies

If you’re adding a small OS dependency in some larger piece of code, try to contain the actual non-portable operations to utils/os_utils.* and then call them from there.

As a policy, operating systems which are not supported by their original vendor are not supported by Botan either. Patches that complicate the code in order to support obsolete operating systems will likely be rejected. In writing OS specific code, feel free to assume roughly POSIX 2008, or for Windows, Windows 8/Server 2012 (which are as of this writing the oldest versions still supported by Microsoft).

Some operating systems, such as OpenBSD, only support the latest release. For such cases, it’s acceptable to add code that requires APIs added in the most recent release of that OS as soon as the release is available.

Library Dependencies

Any external library dependency, even an optional one, is met with, as one PR submitter put it, “great skepticism”.

At every API boundary there is potential for confusion that does not exist when the call stack is all contained within the boundary. So the additional API really needs to pull its weight. For example a simple text parser or such which can be trivially implemented is not really up for consideration. As a rough idea of the bar, treat the cost of an external dependency as equivalent to at least 1000 additional lines of code in the library. That is, if the library really does need this functionality, and it can be done in the library for less than that, then it makes sense to just write the code. Yup.

Currently the (optional) external dependencies of the library are several compression libraries (zlib, bzip2, lzma), the sqlite3 database, TrouSerS (TPM integration), plus various operating system utilities like basic filesystem operations. These provide major pieces of functionality which seem worth the trouble of maintaining an integration with.

At this point the most plausible examples of an appropriate new external dependency are all deeper integrations with system level cryptographic interfaces (CommonCrypto, CryptoAPI, /dev/crypto, iOS keychain, TPM 2.0, etc).