What Developers Need to Know About Cryptography

Published on: 2018-06-29

The specifics of cryptography seem to confuse a lot of software developers. If you are to fully understand how the security framework I've built functions, though, you will need a fairly good understanding of cryptography, at least at a high level. It might be best to start with symmetric encryption, which is what many people think of when they think of "cryptography".

Symmetric Encryption

Symmetric encryption (sometimes called private key encryption for reasons that will become clearer later) refers to the process of turning information into gobbledygook (ciphertext), which is called encryption, then turning the ciphertext back into readable information, called decryption. There are two components needed for most, if not all, symmetric encryption algorithms:

Key

The key is a long string of bits (128 or more in modern algorithms) that serves as the secret the algorithm uses to encrypt and decrypt your information, much like the key to a door can be used to both lock and unlock a deadbolt. Symmetric encryption is called "symmetric" because a single key is used to both encrypt and decrypt information. Like the key to your door, you should only give the key to people you want to have access to your information.

Initialization Vector (IV)

It would be extremely impractical to create a new key for each data point you would like to secure, but it would be insecure to use the same key everywhere you need to encrypt something. The IV was created to provide some randomness to the encrypted value, making it safer to reuse keys. Despite some articles you may read online, the IV really should be unique to each individual encrypted value, but it can be stored with the data itself, so no need for the additional security that storing a key requires.

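To encrypt and decrypt data using symmetric encryption, you need to know both the key and the IV. The IV isn't a secret, though; its job is to make sure that encrypting the same value twice with the same key produces different ciphertext, so an attacker can't spot repeated values or learn anything by comparing them.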
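To make the roles of the key and the IV concrete, here is a minimal sketch in Python. Python's standard library has no AES, so this uses a toy stream cipher I made up for illustration (an HMAC-SHA256 keystream XORed with the data). It is not a vetted algorithm and should not be used to protect real data, and the function names are my own. The point is only that one key plus a fresh IV per value gives different ciphertext each time, and that the IV can be stored in the clear next to the ciphertext.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and IV (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt with a fresh random IV; return (iv, ciphertext)."""
    iv = secrets.token_bytes(16)  # unique per encrypted value, not secret
    stream = keystream(key, iv, len(plaintext))
    return iv, bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Decrypt by regenerating the same keystream from the key and stored IV."""
    stream = keystream(key, iv, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


key = secrets.token_bytes(32)  # the secret; keep this protected
message = b"the same value, encrypted twice"

iv1, ct1 = toy_encrypt(key, message)
iv2, ct2 = toy_encrypt(key, message)

print(ct1.hex())
print(ct2.hex())  # different from ct1 because the IV differs
print(toy_decrypt(key, iv1, ct1))
```

In real code you would reach for a well-reviewed library and an authenticated mode rather than anything hand-rolled like this.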

Hashing

Like symmetric encryption, a hashing algorithm is used to turn information into an unreadable value (usually called a hash or digest rather than ciphertext). Unlike symmetric encryption, though, a value that was created by a hashing algorithm cannot be turned back into readable data. Hashing (at least the kind we'll cover in this post) also does not have a key or an IV, so hashing "abc" five different times results in five identical hashes. This may seem useless at first, but there are three ways in which hashes are useful:

If I need to search for a matching value, hashing is more efficient than encryption. If I try to search for an encrypted value, I need to decrypt all rows first in order to find any matches. If I hash the data instead, I only need to hash the text I'm searching for and search for the hash.
If I have data that I need to confirm matches, but I don't need to know what that data is, then matching against a hash is an option. One example of this is passwords: the system (generally) doesn't need to know what the original password was, it just needs to confirm whether the user-supplied password matches the stored one.
If I want to make sure my data doesn't get changed, I can hash the data then store the hash. To check if any changes were made, I can make a new hash and compare it with the old. If the new and the old hashes are the same, I can be almost entirely certain that the data is the same.
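The uses above can be sketched with Python's standard `hashlib` module. I'm assuming SHA-256 here purely as an example algorithm, and `sha256_hex` is a helper name of my own. The sketch shows that hashing the same input always gives the same result, and that comparing a stored hash against a fresh one detects a change.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hashing "abc" five times gives five identical results.
digests = {sha256_hex(b"abc") for _ in range(5)}
print(len(digests))  # 1

# Change detection: store the hash now, recompute and compare later.
stored_hash = sha256_hex(b"original document")
unchanged = sha256_hex(b"original document") == stored_hash
tampered = sha256_hex(b"original document!") == stored_hash
print(unchanged, tampered)  # True False
```

The same pattern covers searching: hash the search term once and look for rows whose stored hash equals it.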

One note: because hashing the same text always produces the same result, a hacker could build a lookup list of the hashes of all likely values, then use it to work out what the original data was. (Or find a website that has already done the work.) One way to get around this problem is to add random data, called a salt, to the value being hashed. For example, instead of hashing "abc" above, you might use a salt of "101010" and hash "101010abc". As long as you always use "101010" as your salt for that data point, your hash will always match.
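Here is a minimal sketch of salting, again assuming SHA-256 and helper names of my own (`hash_with_salt`, `matches`). The salt is generated once per data point and stored alongside the hash, so the same salt can be used again when checking a candidate value. For real password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` is generally a better fit than a single fast hash, but the idea of the salt is the same.

```python
import hashlib
import hmac
import secrets


def hash_with_salt(value: bytes, salt: bytes) -> str:
    """Hash the salt followed by the value, as in the "101010abc" example."""
    return hashlib.sha256(salt + value).hexdigest()


def matches(candidate: bytes, salt: bytes, stored_hash: str) -> bool:
    """Re-hash the candidate with the stored salt and compare."""
    return hmac.compare_digest(hash_with_salt(candidate, salt), stored_hash)


salt = secrets.token_bytes(16)          # random, stored next to the hash
stored_hash = hash_with_salt(b"abc", salt)

print(matches(b"abc", salt, stored_hash))   # True
print(matches(b"abd", salt, stored_hash))   # False

# A precomputed table of unsalted hashes no longer helps an attacker:
print(stored_hash == hashlib.sha256(b"abc").hexdigest())  # False
```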

Asymmetric Encryption

Asymmetric encryption is like symmetric encryption because data can be both encrypted and decrypted, but unlike symmetric encryption, asymmetric encryption uses two keys, not one. These keys always come in pairs, and one is called the public key and the other the private key. (Since asymmetric is the only encryption type that uses a public key, it is often referred to as public key encryption. If you recall, symmetric encryption is sometimes referred to as private key encryption.)

What's odd about asymmetric encryption is that the only thing that can decrypt something encrypted with a public key is the corresponding private key, and the only thing that can decrypt something encrypted with the private key is the corresponding public key. These two approaches have very different purposes:

Encrypting text with a private key does nothing to help keep the content of a message private; after all, it can be decrypted with a public key, and the public key really can be public. Rather, this approach is used to verify that the sender of the encrypted information really is who they say they are—the holder of the private key.
Encrypting text with a public key does help keep the information private, because it can only be decrypted with the associated private key. If you want to make sure a message can only be read by the holder of a private key, encrypt the message with the corresponding public key.
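To see both directions at work, here is a toy sketch using textbook RSA with tiny primes. The numbers (61, 53, 17) and the `apply_key` helper are my own choices for illustration. Real RSA uses keys thousands of bits long plus padding schemes, so treat this only as a picture of how the two keys undo each other. It needs Python 3.8 or later for the modular inverse.

```python
# Textbook RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi

public_key = (e, n)          # safe to hand out
private_key = (d, n)         # kept secret by the owner


def apply_key(value: int, key: tuple[int, int]) -> int:
    """Encrypt or decrypt a number (smaller than n) with either key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65

# Privacy: encrypt with the public key, decrypt with the private key.
secret = apply_key(message, public_key)
print(apply_key(secret, private_key))   # 65

# Identity: encrypt with the private key, decrypt with the public key.
proof = apply_key(message, private_key)
print(apply_key(proof, public_key))     # 65
```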

In reality, though, there are practical limitations to asymmetric encryption that cause its use to be limited, from execution speed to key management. Here are two common uses of asymmetric encryption:

If I want to ensure data hasn't been tampered with, I'd use a hash. But how do I ensure that the hash hasn't also been tampered with? One common approach is to hash the data, then encrypt the hash with a private key. This is one way to create a digital signature. Anyone with the public key can then decrypt the signature (which reveals the hash of the original message) and compare it to the hash of the current message. If the two hashes match, we can be almost certain that the message has not been altered.
If we want to have a private conversation, we'll probably choose to encrypt our communications using a symmetric algorithm. But how do we agree on a key? As an example, let's assume that Alice and Bob want to have a conversation. Alice starts the conversation by sending Bob her public key. Bob sends Alice the symmetric algorithm he wants to use along with a key for both to use (since symmetric algorithms use a single key to encrypt and decrypt), but he encrypts this information with Alice's public key. This ensures that Alice, the holder of the private key, will be the only one able to decrypt Bob's message. Once Alice decrypts Bob's message containing his symmetric key, Alice can safely send messages to Bob using that key.
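Both uses can be sketched with the same kind of toy textbook RSA. Everything here is an assumption made for illustration: the key is built from two well-known Mersenne primes so that it is big enough to hold a 128-bit value, the hash is SHA-256 truncated to 128 bits so it fits under the modulus, and the function names are mine. A key like this is far too small and too predictable for real use, and real systems add padding and use vetted libraries. It needs Python 3.8 or later.

```python
import hashlib
import secrets

# Alice's toy key pair (illustration only).
P = (1 << 61) - 1
Q = (1 << 89) - 1
N = P * Q                            # modulus, roughly 150 bits
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent


def digest_as_int(message: bytes) -> int:
    """Hash the message and keep 128 bits so the value fits under N."""
    return int.from_bytes(hashlib.sha256(message).digest()[:16], "big")


def sign(message: bytes) -> int:
    """Alice: hash the message, then encrypt the hash with her private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone: decrypt the signature with the public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(message)


# Use 1: digital signature.
signature = sign(b"pay Bob 10 dollars")
print(verify(b"pay Bob 10 dollars", signature))    # True
print(verify(b"pay Bob 1000 dollars", signature))  # False

# Use 2: agreeing on a symmetric key.
session_key = secrets.token_bytes(16)              # Bob picks a symmetric key
wrapped = pow(int.from_bytes(session_key, "big"), E, N)   # encrypted for Alice
unwrapped = pow(wrapped, D, N).to_bytes(16, "big")        # only Alice can do this
print(unwrapped == session_key)                    # True
```

From that point on, both sides would switch to a symmetric algorithm using the shared key, since that is much faster than doing everything asymmetrically.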

Summary

Here are some general guidelines as to when one approach is appropriate over another:

If you need to hide data, and you need to get at that data later, encrypt it.
If you need to hide data, but you never need to know what it originally was (such as a password), hash it.
If you need to ensure that data hasn't been altered, use a hash or a digital signature.
If you need to talk to someone else securely, send a symmetric key encrypted with the recipient's public key, then communicate using a symmetric algorithm and that shared key.

I hope that helps! Stay tuned for a description of how this was implemented in our security framework.