Another AD example: Ben Toews, in our Vault replacement secret storage system Pet Sematary, uses the AD on SQLite ciphertexts to bind them to a particular row (and/or a particular key path).
I wrote a local file encryption tool, around the same time Filippo was doing `age`, and used the AD on Chapoly to authenticate the chunk offset into the file. (The only thing interesting my tool did was that it could pull keys from AWS KMS).
So one use for AD is to authenticate headers; another is contextual binding.
If it helps (because 'stavros asked across the thread why bother having AD at all rather than just including it in the ciphertext), authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted. A message only meant to be decrypted on a particular host (or whatever), for instance, could include the host in its AD, but never record that in the actual bits of the message.
some_furry 1 hours ago [-]
This is mostly unrelated to what you wrote, Thomas, but I wanted to add something that HN users might benefit from hearing:
It's important to use a carefully designed AEAD mode rather than assembling it yourself out of parts. If you try to combine a block cipher mode and message authenticator together, you might screw it up in a really funny way: https://soatok.blog/2021/07/30/canonicalization-attacks-agai...
Sanketh's talk at Real World Crypto 2024 about Next-Generation AEADs is also worth a watch for anyone that, for whatever weird reason, feels at all motivated to invent a new wheel here: https://www.youtube.com/watch?v=7GBzKytVjH4
peterldowns 3 hours ago [-]
If you're interested in doing AEAD with the current best-practice algorithms in golang, you might get inspiration from my work-in-progress symcrypt package. I'm not a cryptographer and you shouldn't trust me when I say it works correctly — but it's basically just a small, correct, wrapper around the chacha20poly1305 code in the golang standard library. It has the slight advantage of using different types for the plaintext and the associated data (here called Owner, because I use it to store API keys owned by specific
If you squint at the example usage in the tests, it's basically the API that the blogpost describes.
As an aside, I'm always curious to understand why the encryption people say "never roll your own crypto" but then also ship confusing APIs without clear usage examples. For instance, check out the golang chacha20poly1305 docs:
I don't understand what you're finding unclear about the Chapoly docs there. AEAD encryption is a first-class abstraction in the standard Go crypto library; in the same sense that crypto/sha256's New returns a hash.Hash, chacha20poly1305's New returns a cipher.AEAD. The cipher.AEAD docs themselves include clear usage examples.
Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than crypto/cipher/Block's "Encrypt" and "Decrypt", because they're doing something different. The "Owner" thing in your package is kind of odd too.
You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data. I don't much care except the whole point of this post is why that matters.
gerdesj 2 hours ago [-]
When it comes to this stuff you generally have to follow advice from someone who understands what is going on.
For me, personally, I'm going to side with tptacek - he has a track record that I have seen over at least a decade if not two.
I don't know the other bloke but this is a bit of a worry: "I'm not a cryptographer".
peterldowns 2 hours ago [-]
Agreed; I posted here with the goal of receiving advice.
peterldowns 2 hours ago [-]
What I'm finding unclear is how to use the chapoly primitives in a secure way to accomplish my goal. I want to use AEAD encryption to store API keys, per customer, using a single app secret that my app will read from my secret manager when it starts up. AEAD seems like the right way to do that. What's the right way to do that, with AEAD, in golang? The examples/docs for cipher.AEAD at https://pkg.go.dev/crypto/cipher#AEAD don't mention chapoly, but a security researcher friend of mine recommended I use it. The docs/examples for the chapoly library have two constructors, New and NewX, and only NewX has an example. In that example, no associated data is actually used.
> Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than crypto/cipher/Block's "Encrypt" and "Decrypt", because they're doing something different.
What should I use? I'd be extremely happy to do the Right Thing. I linked symcrypt and posted here because I am hoping someone can point me to it.
> You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data.
I really don't understand what you mean by "not letting users actually provide authenticated data". Here, in this test, I show how if you encrypt some secret for one user (the associated data is the Owner), you can only decrypt it if you provide the same associated data (the same Owner). https://github.com/peterldowns/symcrypt/blob/c220f7767fa6c1a...
tptacek 2 hours ago [-]
I feel like you haven't really gotten your head around what Authenticated Data is. That's OK! Look at 'stavros upthread --- lots of clueful people have trouble with this concept.
What you should do is just take the examples from cipher#AEAD, but where they do:
The rest of the code is the same, except that where they write "Never use more than 2^32 random nonces with a given key because of the risk of a repeat", you can ignore that and use a long nonce (like in the example for chacha20poly1305.NewX).
Your "Owner" looks like what cryptographers would call a "domain separation constant". Domain separation is good! It's another application of authenticated data, too. But not the only one.
The Go standard library's AEAD "Seal" and "Open" is a better interface than what you've got now.
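A minimal sketch of that advice, for the per-customer API key case described upthread. It is not the code from the comment (the snippets there did not survive) and not what symcrypt does; the helper names `newAEAD`, `sealForOwner`, and `openForOwner` are made up for illustration. It uses AES-256-GCM because that ships in the standard library; `chacha20poly1305.NewX` returns the same `cipher.AEAD` interface and could be dropped in, which also gives the longer nonce recommended above (with GCM's 96-bit random nonces the 2^32 limit still applies). The only real change from the stdlib example is the last argument to `Seal` and `Open`.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newAEAD builds AES-256-GCM from the standard library. Any other
// cipher.AEAD (for example one from chacha20poly1305.NewX) would work
// with the helpers below unchanged.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("new cipher: %w", err)
	}
	return cipher.NewGCM(block)
}

// sealForOwner encrypts secret and binds it to owner through the
// additional data. The random nonce is prepended to the result.
func sealForOwner(aead cipher.AEAD, secret, owner []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, fmt.Errorf("nonce: %w", err)
	}
	// The last argument is the additional data; the stdlib example passes nil.
	return aead.Seal(nonce, nonce, secret, owner), nil
}

// openForOwner reverses sealForOwner. It fails unless the same owner
// bytes are supplied again; the owner is never stored in the ciphertext.
func openForOwner(aead cipher.AEAD, sealed, owner []byte) ([]byte, error) {
	if len(sealed) < aead.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, owner)
}

func main() {
	key := make([]byte, 32) // in practice: the app secret from the secret manager
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	sealed, err := sealForOwner(aead, []byte("example-api-key"), []byte("customer-1"))
	if err != nil {
		panic(err)
	}
	_, errRight := openForOwner(aead, sealed, []byte("customer-1"))
	_, errWrong := openForOwner(aead, sealed, []byte("customer-2"))
	fmt.Println("right owner ok:", errRight == nil, "wrong owner rejected:", errWrong != nil)
}
```

The owner here plays the role of the context that the application already knows when it decrypts, so nothing extra has to be stored next to the ciphertext.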
peterldowns 2 hours ago [-]
Thank you!
hxtk 2 hours ago [-]
A lot of the hardest problems in practical cryptography come down less to the abstractions around literally encrypting and decrypting things and more around the secure management of key material to ensure that the application supports things like online key rotation and makes it easy to verify that keys are being generated, serialized, and stored securely, and addressing the "First Secret" problem. If you're wanting to learn to use cryptography by developing an abstraction over stdlib cryptographic APIs, I would encourage you to find solutions to those problems.
Another source of inspiration (and something I use in production) is the Tink family of cryptographic libraries by Google [1]. Their Go implementation [2] is not without its warts, but it's very difficult to run into any of those security bugs that exist around cryptography. Where the Go documentation lacks, there are some examples in the developer docs that help fill some of the gaps [3] [4].
The documentation isn't 100% complete, but I find it more discoverable than the standard library because while the standard library requires you to read both `crypto/cipher` and `crypto/aes` or `golang.org/x/crypto/chacha20poly1305` depending on what kind of cipher you want, Tink organizes it by use cases [5] and generally groups together all the things you need to do cryptographic operations under the use-case-named interfaces in the `tink` package [6], with the corresponding key generation templates located under the top-level packages of the same name [7].
Can someone explain what use the AD is, if we have to decrypt the message to authenticate the AD? If I'm decrypting the message already just to authenticate it, why wouldn't I encrypt the AD as well?
tptacek 3 hours ago [-]
Because you need it outside the context of encryption/decryption.
Honestly, the classic "message routing" example most things give for AEAD is not very useful. Context binding is a much better primer for intuition.
stavros 3 hours ago [-]
Hm, I understand the use cases, but I don't understand this: The only way to get the AD is to decrypt the ciphertext, right? Otherwise the data is unauthenticated, so I assume it's a big no-no to access it. If you need to decrypt the ciphertext to access the AD, why do you care if it was encrypted or not?
Basically, I'm not sure why `encrypt(key, nonce, (data, associated data))` (ie adding the AD to your ciphertext, with the encryption framework being unaware of it) is that different from `encrypt(key, nonce, data, associated data)` (ie the AD being a first-class citizen).
EDIT: I saw your other message, and this makes it click for me:
> authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted
So the AD can be an additional envelope-level thing at encryption/decryption time, that helps a lot, thanks!
tptacek 3 hours ago [-]
This is why message routing headers are kind of a fucky example (you can make it make sense but it begs for this confusion).
Instead, just take the chunked large-file encryption use case I gave in that comment. The chunk offset isn't recorded anywhere in the ciphertext. It's derived contextually while you decrypt the file. The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
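A rough sketch of that chunked-file idea, assuming a made-up layout: each chunk is sealed on its own with a random nonce prepended, and the chunk index (encoded as 8 big-endian bytes) is passed as additional data. This is not the tool described above and not `age`'s format; `chunkAD`, `sealChunks`, and `openChunks` are hypothetical names, and AES-GCM stands in for Chapoly only because it is in the standard library.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newAEAD returns AES-GCM; any cipher.AEAD would do.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("new cipher: %w", err)
	}
	return cipher.NewGCM(block)
}

// chunkAD encodes the chunk index as fixed-width additional data.
func chunkAD(index uint64) []byte {
	ad := make([]byte, 8)
	binary.BigEndian.PutUint64(ad, index)
	return ad
}

// sealChunks encrypts every chunk with a fresh random nonce (prepended)
// and authenticates the chunk's position through the additional data.
func sealChunks(aead cipher.AEAD, chunks [][]byte) ([][]byte, error) {
	out := make([][]byte, len(chunks))
	for i, c := range chunks {
		nonce := make([]byte, aead.NonceSize())
		if _, err := rand.Read(nonce); err != nil {
			return nil, fmt.Errorf("nonce: %w", err)
		}
		out[i] = aead.Seal(nonce, nonce, c, chunkAD(uint64(i)))
	}
	return out, nil
}

// openChunks derives each index from the chunk's position while reading.
// The index is never stored, so a chunk moved elsewhere fails to open.
func openChunks(aead cipher.AEAD, sealed [][]byte) ([][]byte, error) {
	out := make([][]byte, len(sealed))
	for i, s := range sealed {
		if len(s) < aead.NonceSize() {
			return nil, fmt.Errorf("chunk %d: too short", i)
		}
		nonce, ct := s[:aead.NonceSize()], s[aead.NonceSize():]
		pt, err := aead.Open(nil, nonce, ct, chunkAD(uint64(i)))
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		out[i] = pt
	}
	return out, nil
}

func main() {
	aead, err := newAEAD(make([]byte, 32)) // all-zero demo key, never do this for real
	if err != nil {
		panic(err)
	}
	sealed, err := sealChunks(aead, [][]byte{[]byte("one"), []byte("two"), []byte("three")})
	if err != nil {
		panic(err)
	}
	_, errInOrder := openChunks(aead, sealed)
	sealed[0], sealed[1] = sealed[1], sealed[0] // cut and paste chunks
	_, errSwapped := openChunks(aead, sealed)
	fmt.Println("in order ok:", errInOrder == nil, "swapped rejected:", errSwapped != nil)
}
```

Note that this sketch only catches reordering. Dropping chunks from the end would still go unnoticed unless the format also marks the final chunk in some way.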
vlovich123 28 minutes ago [-]
The strategy I used instead was to HKDF derive different keys for each chunk using the offset as part of the info to derive the key. No AD needed.
stavros 3 hours ago [-]
Yeah, you're right, I was thinking about it in a case where the implementation had the ciphertext being `(block data, chunk offset)` so it did make it part of the message, but it's more elegant for the associated data to be separate from the ciphertext.
firesteelrain 39 minutes ago [-]
Message headers tend to mutate, which makes the problem harder to solve. Decrypting chunks in the right order is a good example to grasp, though, because it refers to essential metadata that needs to stay in the clear so readers of the data know what to do with it. The AD is bound to the ciphertext.
tptacek 3 hours ago [-]
It's a really good question, because, in order to verify the AD, you have to have the same key you need to decrypt it.
stavros 3 hours ago [-]
Yep, that's the part that throws me. Is it fair to say that it's a more elegant way to include metadata in the ciphertext, without really messing with the plaintext itself? Ie it's basically "just" a way to distinguish the message from its metadata?
edoceo 2 hours ago [-]
Does that make it like some kind of HMAC?
hxtk 45 minutes ago [-]
Yes, in fact, one construction of the AEAD primitive is to use AES-CTR with HMAC to "bolt on" authentication after the fact (AES-CTR on its own is an unauthenticated stream cipher).
I don't understand the example. Presumably the server doesn't have the user-owned encryption key. So how can the server "detect that the user id has been tampered with" if it doesn't have the key necessary to authenticate the user id?
cakoose 2 hours ago [-]
Yup, the example doesn't make sense for the reason you pointed out.
You could water down the example a bit to make it work:
1. Assume there's some other authentication mechanism for client-server communication, e.g. TLS.
2. The client sends the user ID unencrypted (within TLS) so the server can route, but encrypts the message contents so the server can't read it.
3. The final recipient can validate the message and the user ID.
This saves the client from having to send the user ID twice, once in the ciphertext and once in the clear.
Suppose you have a server with its own encryption key, and a key-value database full of encrypted data (secured by this root keyset) associated with a user.
Even if I gain access to the database, if the keys are managed securely, I can't read another user's data (or even really my own). I have to go through the authorization logic of the application that will decrypt it on my behalf.
However, if I can create a row in the database with my ID and another user's data, I can then convince the server I am authorized to view that cell, and it will happily decrypt it on my behalf, assuming something like AES-CTR or some other stream cipher without authentication.
Authenticated encryption like AES-CTR-HMAC solves that problem, because now the application will see that I am authorized to view that cell (because it sees the user ID matches mine) and it will decrypt it for me (using that user ID as the associated data), but the decryption will fail because the associated data does not match, leaving me unable to exfiltrate the data that I convinced the server belonged to me, and probably setting off some kind of alarm because that sort of decryption should never fail unless things have been tampered with.
I'm not overly fond of the example and I find it confusing as well. I think the example may be a bit confusing because the term "authentication" is overloaded between application-level authentication and cryptographic authentication, i.e., "if the chat protocol authenticates the user ID" sounds like it is talking about the user logging into the server securely. The user is authenticated by having a secret negotiated with the server. In the next bullet, they talk about "authenticating" the associated data, referring to it in the cryptographic context, but they don't indicate why that would be a problem because in their example, the malicious actor still doesn't have the key. The article handwaves it as "the attacker might be creative."
If they had the key, but not the associated data, you'd still be in a relatively bad situation, because the associated data is not secret. It doesn't serve as a second key because it is not high enough entropy and is ideally zero entropy conditional on already having all information from the originating context.
vlovich123 32 minutes ago [-]
But why bother putting the user id in the AD instead of part of the authenticated encrypted payload?
hxtk 5 minutes ago [-]
It sounds like you are assuming that if the data were modified, decryption would fail catastrophically and you'd end up with garbage. This is precisely the point of AEAD: providing cryptographic guarantee that decryption will fail catastrophically if things are tampered with.
That guarantee is not provided with unauthenticated stream ciphers. For example, some stream ciphers work by essentially using a deterministic but unpredictable PRNG seeded with the key and IV to generate a bitstream, and then XOR the plaintext with that bitstream to generate the ciphertext.
With such a stream cipher, if Eve knows that the data format is, e.g., an 8-byte unsigned integer user ID followed by the rest of the payload, Eve can take the first 8 bytes of the ciphertext and XOR them with Bob's user ID (public information) and her own user ID. That corrupts the message so that the resulting cleartext contains her user ID instead of Bob's, and thus passes the validation that it seems like you are proposing.
Let C[] be the cipher text, K[] be the key stream, B be Bob's ID, and E be Eve's ID:
C[:8] = B ^ K[:8]
C[:8] ^ B = K[:8]
C' = C[:8] ^ B ^ E ++ C[8:]
wofo 58 minutes ago [-]
It looks like I actually got the example wrong, sorry about that!
Somehow I assumed that the server was able to authenticate the receiver id, but as you correctly point out, that would require knowing the encryption key. I'll have to think about a fix for the example.
hxtk 31 minutes ago [-]
A usual example I use (because it reflects how I tend to use AEAD in applications) is to assume the server (and only the server) has the keys for something like data-at-rest encryption. Application level logic decides whether the server is going to decrypt some data on behalf of the user, and the authenticated data prevents tampering.
If Alice saves some data to her account, but Eve manages to access the database, Eve can change the database state to convince the application to retrieve Alice's data for her (by cloning it into a row with her own user ID). However, when the application attempts to decrypt that data, it will fail because of the AEAD. This ensures that both the database and some service with access to the encryption key (or the encryption key itself) would have to be compromised in order for Eve to exfiltrate her illicit copy of Alice's data.
kazinator 45 minutes ago [-]
The TL;DR of this seems to be: the plaintext metadata accompanying ciphertext ("associated data") is mixed into the ciphertext's encryption (essentially as an initialization vector). Thereby, if the plaintext metadata is altered, the ciphertext cannot be correctly decrypted. The ciphertext is both a secret message, and a signature of the unencrypted data, so a separate HMAC is not required.
We can imagine, e.g. in the context of e-mail, if the DKIM header signature were combined with a PGP-encrypted body as one operation. I'm ducking under the table now, though.
twic 3 hours ago [-]
Internally, is AEAD just using the "usual" ciphers, digests, and PRNGs, just making sure to combine them in the right way? If so, are all AEAD "ciphers" the same, just with different sub-primitives plugged in?
tptacek 3 hours ago [-]
Not generally. An AEAD composed the way you're describing, out of (say) non-authenticated CTR mode and an HMAC MAC, would be described as "a generic composition". The more common AEADs, at least the way we think about them, aren't compositions of otherwise user-serviceable components. I'm not sure there's a name for them; they're the norm, so we describe those integrated, hermetically-sealed constructions (like GCM) as "AEAD".
andrekandre 2 hours ago [-]
i really appreciate how this article was written
just the right length and pacing to get me to the end and the point across
wofo 41 minutes ago [-]
Thanks for the kind words! I'm trying to balance pragmatism with depth. Glad it was useful to you ;)
https://github.com/peterldowns/symcrypt/blob/main/symcrypt_t...
https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305
[1]: https://developers.google.com/tink
[2]: https://github.com/tink-crypto/tink-go/
[3]: https://developers.google.com/tink/key-management-overview#g...
[4]: https://developers.google.com/tink/encrypt-data#go
[5]: https://developers.google.com/tink/choose-primitive
[6]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2/tink#AE...
[7]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2@v2.4.0/...
https://news.ycombinator.com/item?id=43827342
You can find an implementation of AES-CTR-HMAC (at a high level where AES-CTR and HMAC are both given) here: https://github.com/tink-crypto/tink-go/blob/main/aead/aesctr...
But another more interesting use case is when you don't even send the associated data: https://news.ycombinator.com/item?id=43827342