What the heck is AEAD again? (ochagavia.nl)

52 points by wofo 68 days ago | 64 comments

tptacek 68 days ago [-]

Another AD example: Ben Toews, in our Vault replacement secret storage system Pet Semetary, uses the AD on SQLite ciphertexts to bind them to a particular row (and/or a particular key path).

I wrote a local file encryption tool, around the same time Filippo was doing `age`, and used the AD on Chapoly to authenticate the chunk offset into the file. (The only thing interesting my tool did was that it could pull keys from AWS KMS).

So one use for AD is to authenticate headers; another is contextual binding.

If it helps (because 'stavros asked across the thread why bother having AD at all rather than just including it in the ciphertext), authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted. A message only meant to be decrypted on a particular host (or whatever), for instance, could include the host in its AD, but never record that in the actual bits of the message.

some_furry 68 days ago [-]

This is mostly unrelated to what you wrote, Thomas, but I wanted to add something that HN users might benefit from hearing:

It's important to use a carefully designed AEAD mode rather than assembling it yourself out of parts. If you try to combine a block cipher mode and message authenticator together, you might screw it up in a really funny way: https://soatok.blog/2021/07/30/canonicalization-attacks-agai...

Sanketh's talk at Real World Crypto 2024 about Next-Generation AEADs is also worth a watch for anyone that, for whatever weird reason, feels at all motivated to invent a new wheel here: https://www.youtube.com/watch?v=7GBzKytVjH4

peterldowns 68 days ago [-]

If you're interested in doing AEAD with the current best-practice algorithms in golang, you might get inspiration from my work-in-progress symcrypt package. I'm not a cryptographer and you shouldn't trust me when I say it works correctly, but it's basically just a small, correct wrapper around the chacha20poly1305 code in golang.org/x/crypto. It has the slight advantage of using different types for the plaintext and the associated data (here called Owner, because I use it to store API keys owned by specific

If you squint at the example usage in the tests, it's basically the API that the blogpost describes.

https://github.com/peterldowns/symcrypt/blob/main/symcrypt_t...

As an aside, I'm always curious to understand why the encryption people say "never roll your own crypto" but then also ship confusing APIs without clear usage examples. For instance, check out the golang chacha20poly1305 docs:

https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305

tptacek 68 days ago [-]

I don't understand what you're finding unclear about the Chapoly docs there. AEAD encryption is a first-class abstraction in the standard Go crypto library; in the same sense that crypto/sha256's New returns a hash.Hash, the chacha20poly1305 constructors return a cipher.AEAD. The cipher.AEAD docs themselves include clear usage examples.

Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than cipher.Block's "Encrypt" and "Decrypt", because they're doing something different. The "Owner" thing in your package is kind of odd too.

You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data. I don't much care except the whole point of this post is why that matters.

gerdesj 68 days ago [-]

When it comes to this stuff you generally have to follow advice from someone who understands what is going on.

For me, personally, I'm going to side with tptacek - he has a track record that I have seen over at least a decade if not two.

I don't know the other bloke but this is a bit of a worry: "I'm not a cryptographer".

peterldowns 68 days ago [-]

Agreed; I posted here with the goal of receiving advice.

peterldowns 68 days ago [-]

What I'm finding unclear is how to use the chapoly primitives in a secure way to accomplish my goal. I want to use AEAD encryption to store API keys, per customer, using a single app secret that my app will read from my secret manager when it starts up. AEAD seems like the right way to do that. What's the right way to do that, with AEAD, in golang? The examples/docs for cipher.AEAD at https://pkg.go.dev/crypto/cipher#AEAD don't mention chapoly, but a security researcher friend of mine recommended I use it. The docs/examples for the chapoly library have two constructors, New and NewX, and only NewX has an example. In that example, no associated data is actually used.

> Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than cipher.Block's "Encrypt" and "Decrypt", because they're doing something different.

What should I use? I'd be extremely happy to do the Right Thing. I linked symcrypt and posted here because I am hoping someone can point me to it.

> You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data.

I really don't understand what you mean by "not letting users actually provide authenticated data". Here, in this test, I show how if you encrypt some secret for one user (the associated data is the Owner), you can only decrypt it if you provide the same associated data (the same Owner). https://github.com/peterldowns/symcrypt/blob/c220f7767fa6c1a...

tptacek 68 days ago [-]

I feel like you haven't really gotten your head around what Authenticated Data is. That's OK! Look at 'stavros upthread --- lots of clueful people have trouble with this concept.

What you should do is just take the examples from cipher#AEAD, but where they do:

    block, err := aes.NewCipher(key)
    if err != nil {
      panic(err.Error())
    }

    aesgcm, err := cipher.NewGCM(block)
    if err != nil {
      panic(err.Error())
    }

Instead you just do

    chapoly, err := chacha20poly1305.NewX(key)
    if err != nil {
      panic(err.Error())
    }

The rest of the code is the same, except that where they write "Never use more than 2^32 random nonces with a given key because of the risk of a repeat", you can ignore that and use a long nonce (like in the example for chacha20poly1305.NewX).

Your "Owner" looks like what cryptographers would call a "domain separation constant". Domain separation is good! It's another application of authenticated data, too. But not the only one.

The Go standard library's AEAD "Seal" and "Open" is a better interface than what you've got now.
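For readers following along, here is a minimal sketch of what "the rest of the code" could look like for the per-customer API key use case. It is written against the `cipher.AEAD` interface, so it should work with anything that implements it. To stay stdlib-only it constructs AES-256-GCM; swapping in `chacha20poly1305.NewX(key)` from golang.org/x/crypto (which also returns a `cipher.AEAD`) is the change described above. The helper names `seal`, `open`, and `newAESGCM`, the `nonce || ciphertext` layout, and the `customer:42` AD string are illustrative assumptions, not anything from symcrypt or the Go docs.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAESGCM is a stdlib stand-in for any cipher.AEAD constructor.
// With 96-bit random nonces, keep well under 2^32 messages per key.
func newAESGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext with a fresh random nonce and returns
// nonce || ciphertext || tag. The associated data (ad) is authenticated
// but is NOT written to the output; the caller must supply it again to open.
func seal(aead cipher.AEAD, plaintext, ad []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize(), aead.NonceSize()+len(plaintext)+aead.Overhead())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, ad), nil
}

// open reverses seal. It fails if the key, nonce, ciphertext, or ad differ
// from what was used when sealing.
func open(aead cipher.AEAD, sealed, ad []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed value too short")
	}
	return aead.Open(nil, sealed[:n], sealed[n:], ad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAESGCM(key)
	if err != nil {
		panic(err)
	}

	// The owner is passed as associated data, binding the secret to it.
	sealed, err := seal(aead, []byte("sk_live_example"), []byte("customer:42"))
	if err != nil {
		panic(err)
	}

	pt, err := open(aead, sealed, []byte("customer:42"))
	fmt.Printf("right owner: %q err=%v\n", pt, err)

	_, err = open(aead, sealed, []byte("customer:43"))
	fmt.Println("wrong owner rejected:", err != nil)
}
```

The point of keeping `ad` as a plain `[]byte` parameter is that the caller decides what context to bind: an owner ID, a row key, a key path, or nothing at all.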

peterldowns 68 days ago [-]

Thank you!

hxtk 68 days ago [-]

A lot of the hardest problems in practical cryptography come down less to the abstractions around literally encrypting and decrypting things and more around the secure management of key material to ensure that the application supports things like online key rotation and makes it easy to verify that keys are being generated, serialized, and stored securely, and addressing the "First Secret" problem. If you're wanting to learn to use cryptography by developing an abstraction over stdlib cryptographic APIs, I would encourage you to find solutions to those problems.

Another source of inspiration (and something I use in production) is the Tink family of cryptographic libraries by Google [1]. Their Go implementation [2] is not without its warts, but it's very difficult to run into any of those security bugs that exist around cryptography. Where the Go documentation lacks, there are some examples in the developer docs that help fill some of the gaps [3] [4].

The documentation isn't 100% complete, but I find it more discoverable than the standard library because while the standard library requires you to read both `crypto/cipher` and `crypto/aes` or `golang.org/x/crypto/chacha20poly1305` depending on what kind of cipher you want, Tink organizes it by use cases [5] and generally groups together all the things you need to do cryptographic operations under the use-case-named interfaces in the `tink` package [6], with the corresponding key generation templates located under the top-level packages of the same name [7].

[1]: https://developers.google.com/tink

[2]: https://github.com/tink-crypto/tink-go/

[3]: https://developers.google.com/tink/key-management-overview#g...

[4]: https://developers.google.com/tink/encrypt-data#go

[5]: https://developers.google.com/tink/choose-primitive

[6]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2/tink#AE...

[7]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2@v2.4.0/...

zorgmonkey 68 days ago [-]

It is worth pointing out that Tink has bindings for a bunch of languages (C++, Objective-C, Rust, Python, Go and Java) and supports a bunch of key management systems (GCP, AWS and HashiCorp).

stavros 68 days ago [-]

Can someone explain what use the AD is, if we have to decrypt the message to authenticate the AD? If I'm decrypting the message already just to authenticate it, why wouldn't I encrypt the AD as well?

tptacek 68 days ago [-]

Because you need it outside the context of encryption/decryption.

https://news.ycombinator.com/item?id=43827342

Honestly, the classic "message routing" example most things give for AEAD is not very useful. Context binding is a much better primer for intuition.

stavros 68 days ago [-]

Hm, I understand the use cases, but I don't understand this: The only way to get the AD is to decrypt the ciphertext, right? Otherwise the data is unauthenticated, so I assume it's a big no-no to access it. If you need to decrypt the ciphertext to access the AD, why do you care if it was encrypted or not?

Basically, I'm not sure why `encrypt(key, nonce, (data, associated data))` (ie adding the AD to your ciphertext, with the encryption framework being unaware of it) is that different from `encrypt(key, nonce, data, associated data)` (ie the AD being a first-class citizen).

EDIT: I saw your other message, and this makes it click for me:

> authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted

So the AD can be an additional envelope-level thing at encryption/decryption time, that helps a lot, thanks!

tptacek 68 days ago [-]

This is why message routing headers are kind of a fucky example (you can make it make sense but it begs for this confusion).

Instead, just take the chunked large-file encryption use case I gave in that comment. The chunk offset isn't recorded anywhere in the ciphertext. It's derived contextually while you decrypt the file. The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
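A rough sketch of that chunked-file idea, under several assumptions: AES-256-GCM from the stdlib stands in for Chapoly, each chunk gets a random nonce stored in front of it, the chunk size is a toy 4 bytes, and the offset is encoded as an 8-byte big-endian integer. The function names are made up for illustration. Note that this sketch does not detect truncation (dropping trailing chunks); a real format would also authenticate which chunk is the last one.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

const chunkSize = 4 // tiny on purpose, to keep the demo readable

func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// offsetAD encodes a chunk's byte offset as fixed-width associated data.
func offsetAD(offset uint64) []byte {
	ad := make([]byte, 8)
	binary.BigEndian.PutUint64(ad, offset)
	return ad
}

// sealChunks splits data into chunks and seals each one with its offset as AD.
// Each entry is nonce || ciphertext || tag; the offset itself is never stored.
func sealChunks(aead cipher.AEAD, data []byte) ([][]byte, error) {
	var out [][]byte
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		nonce := make([]byte, aead.NonceSize())
		if _, err := rand.Read(nonce); err != nil {
			return nil, err
		}
		out = append(out, aead.Seal(nonce, nonce, data[off:end], offsetAD(uint64(off))))
	}
	return out, nil
}

// openChunks decrypts chunks in the order given, deriving each offset from
// the chunk's position rather than reading it from the ciphertext.
func openChunks(aead cipher.AEAD, chunks [][]byte) ([]byte, error) {
	var data []byte
	n := aead.NonceSize()
	for i, c := range chunks {
		if len(c) < n {
			return nil, fmt.Errorf("chunk %d too short", i)
		}
		pt, err := aead.Open(nil, c[:n], c[n:], offsetAD(uint64(i*chunkSize)))
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		data = append(data, pt...)
	}
	return data, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	chunks, err := sealChunks(aead, []byte("AAAABBBBCCCC"))
	if err != nil {
		panic(err)
	}

	pt, err := openChunks(aead, chunks)
	fmt.Printf("in order: %q err=%v\n", pt, err)

	// Cut and paste: swap the first two chunks.
	chunks[0], chunks[1] = chunks[1], chunks[0]
	_, err = openChunks(aead, chunks)
	fmt.Println("reordered file rejected:", err != nil)
}
```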

stavros 68 days ago [-]

Yeah, you're right, I was thinking about it in a case where the implementation had the ciphertext being `(block data, chunk offset)` so it did make it part of the message, but it's more elegant for the associated data to be separate from the ciphertext.

firesteelrain 68 days ago [-]

Message headers have a tendency to want to mutate, which makes the problem more complicated to solve. Decrypting chunks in the right order is a good example to grasp because it refers to essential metadata that needs to stay open so readers of the data know what to do with it. The AD is bound to the ciphertext.

tptacek 68 days ago [-]

It's a really good question, because, in order to verify the AD, you have to have the same key you need to decrypt it.

stavros 68 days ago [-]

Yep, that's the part that throws me. Is it fair to say that it's a more elegant way to include metadata in the ciphertext, without really messing with the plaintext itself? Ie it's basically "just" a way to distinguish the message from its metadata?

edoceo 68 days ago [-]

Does that make it like some kind of HMAC?

hxtk 68 days ago [-]

Yes, in fact, one construction of the AEAD primitive is to use AES-CTR with HMAC to "bolt on" authentication after the fact (AES-CTR on its own is an unauthenticated stream cipher).

You can find an implementation of AES-CTR-HMAC (at a high level where AES-CTR and HMAC are both given) here: https://github.com/tink-crypto/tink-go/blob/main/aead/aesctr...

andrewflnr 68 days ago [-]

> The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.

Ah, that's the key bit of perspective. Just talking about "context" is so abstract. That's a case where you don't even need to transmit the AD, right? Do you ever have cases where the AD is a mix of transmitted and locally/"contextually" derived data?

vlovich123 68 days ago [-]

The strategy I used instead was to HKDF derive different keys for each chunk using the offset as part of the info to derive the key. No AD needed.
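The exact construction isn't given in this comment, so the following is only a guess at its general shape. It assumes the master key is already a uniformly random 32-byte key (so HKDF-Extract is skipped), derives each chunk key with a single-block HKDF-Expand (RFC 5869) spelled out via `crypto/hmac`, and uses AES-256-GCM with empty AD. The `file-chunk:%d` info string and the function names are hypothetical. The fixed all-zero nonce is only safe if every derived key encrypts exactly one message; re-encrypting different data at the same offset under the same master key would reuse a (key, nonce) pair.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// chunkKey derives a 32-byte per-chunk key as one block of HKDF-Expand
// with SHA-256: T(1) = HMAC(prk, info || 0x01), using master as the PRK.
func chunkKey(master []byte, offset uint64) []byte {
	mac := hmac.New(sha256.New, master)
	fmt.Fprintf(mac, "file-chunk:%d", offset) // hypothetical info string
	mac.Write([]byte{0x01})
	return mac.Sum(nil)
}

func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk encrypts one chunk under the key derived for its offset.
// The AD is empty: the offset is bound through the key instead.
func sealChunk(master []byte, offset uint64, chunk []byte) ([]byte, error) {
	aead, err := newAEAD(chunkKey(master, offset))
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize()) // all zero: one message per derived key
	return aead.Seal(nil, nonce, chunk, nil), nil
}

// openChunk decrypts a chunk, re-deriving the key from the claimed offset.
func openChunk(master []byte, offset uint64, sealed []byte) ([]byte, error) {
	aead, err := newAEAD(chunkKey(master, offset))
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	return aead.Open(nil, nonce, sealed, nil)
}

func main() {
	master := make([]byte, 32) // demo only; use a random key in practice
	sealed, err := sealChunk(master, 0, []byte("chunk zero"))
	if err != nil {
		panic(err)
	}

	pt, err := openChunk(master, 0, sealed)
	fmt.Printf("right offset: %q err=%v\n", pt, err)

	_, err = openChunk(master, 4096, sealed)
	fmt.Println("wrong offset rejected:", err != nil)
}
```

Compared with passing the offset as AD, this pays for an HMAC computation and an AES key schedule per chunk, which is the cost discussed in the replies.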

tatersolid 67 days ago [-]

Two calls to SHA-256 for each block would be very slow compared with a modern AEAD.

vlovich123 67 days ago [-]

Hmm… it seemed to run at line speed on our machines. I’m also not sure where you’re getting two calls for sha256 from? Like 1 to derive the key (which is sha256 on a very small amount of data) and the second is?

tatersolid 67 days ago [-]

HMAC requires two calls to the underlying hash function. In this case one is with a block-size input and the other is smaller (key size plus output of the first call). When called per-block this approach is much slower than any modern AEAD (which typically requires simple polynomial math on each block plus a single AES/ChaCha/whatever finalization call).

It might be “fast enough for line rate” in your situation but even then you could be saving CPU cycles for other work by using a more efficient construction.
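To make the "two calls" concrete, here is HMAC-SHA256 written out by hand following RFC 2104 and checked against `crypto/hmac`. It is only meant to show the structure; the helper names are illustrative, and nobody should use this in place of the stdlib implementation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// hmacSHA256 spells out HMAC as two SHA-256 invocations:
// H((K xor opad) || H((K xor ipad) || msg)).
func hmacSHA256(key, msg []byte) []byte {
	const blockSize = sha256.BlockSize // 64 bytes
	if len(key) > blockSize {
		// Keys longer than one block are hashed first (RFC 2104).
		sum := sha256.Sum256(key)
		key = sum[:]
	}
	var ipad, opad [blockSize]byte
	copy(ipad[:], key) // shorter keys are zero-padded to the block size
	copy(opad[:], key)
	for i := range ipad {
		ipad[i] ^= 0x36
		opad[i] ^= 0x5c
	}
	inner := sha256.Sum256(append(ipad[:], msg...))      // first hash call
	outer := sha256.Sum256(append(opad[:], inner[:]...)) // second hash call
	return outer[:]
}

// matchesStdlib reports whether the hand-rolled version agrees with crypto/hmac.
func matchesStdlib(key, msg []byte) bool {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return hmac.Equal(mac.Sum(nil), hmacSHA256(key, msg))
}

func main() {
	fmt.Println("agrees with crypto/hmac:", matchesStdlib([]byte("key"), []byte("file-chunk:0")))
}
```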

tptacek 67 days ago [-]

What does it mean for an AEAD construction to "run at line speed"? Over how many different sessions and with what message sizes and on what hardware?

vlovich123 67 days ago [-]

As in it's processing at the speed that data can be fed into the CPU. This particular use case was files coming in from the network on 10Gbps hardware but that was about the speed the AES HW ran via openssl perf tests. How many sessions and message sizes are irrelevant. Hardware was AMD EPYC 7642.

tptacek 67 days ago [-]

It's irrelevant to your use case. It's not irrelevant to the broader CS question of why people don't just do what you're doing.

vlovich123 67 days ago [-]

If there’s no HW that demonstrates a speed difference, then maybe the theoretical CS concerns aren’t properly modeled? Also, the approach I outlined has a strength whereby there’s no nonce to mismanage which is a big strength.

tptacek 67 days ago [-]

You'd have to publish details of the construction you came up with for me to have anything to say about it not having or needing a nonce.

dadrian 67 days ago [-]

I dunno why you say it isn't useful. It is inherently plaintext, but still worth authenticating. If you just used an AEAD but didn't put e.g. the session identifier or connection ID or sequence number in the AD, it would be entirely unauthenticated, but the decryption of, say, the message body would still succeed.

tptacek 67 days ago [-]

I'm not saying it isn't useful, I'm saying it's not a useful example for getting people to understand the concept. Everyone runs aground on "but you need the decryption key to authenticate the plaintext anyways".

rendaw 68 days ago [-]

IIUC you don't have to decrypt the message - the outermost primitive is the authentication. The point I got is that both the encrypted data and unencrypted data are authenticated.

senderista 68 days ago [-]

I don't understand the example. Presumably the server doesn't have the user-owned encryption key. So how can the server "detect that the user id has been tampered with" if it doesn't have the key necessary to authenticate the user id?

cakoose 68 days ago [-]

Yup, the example doesn't make sense for the reason you pointed out.

You could water down the example a bit to make it work:

1. Assume there's some other authentication mechanism for client-server communication, e.g. TLS.

2. The client sends the user ID unencrypted (within TLS) so the server can route, but encrypts the message contents so the server can't read it.

3. The final recipient can validate the message and the user ID.

This saves the client from having to send the user ID twice, once in the ciphertext and once in the clear.

But another more interesting use case is when you don't even send the associated data: https://news.ycombinator.com/item?id=43827342

hxtk 68 days ago [-]

Suppose you have a server with its own encryption key, and a key-value database full of encrypted data (secured by this root keyset) associated with a user.

Even if I gain access to the database, if the keys are managed securely, I can't read another user's data (or even really my own). I have to go through the authorization logic of the application that will decrypt it on my behalf.

However, if I can create a row in the database with my ID and another user's data, I can then convince the server I am authorized to view that cell, and it will happily decrypt it on my behalf, assuming something like AES-CTR or some other stream cipher without authentication.

Authenticated encryption like AES-CTR-HMAC solves that problem, because now the application will see that I am authorized to view that cell (because it sees the user ID matches mine) and it will decrypt it for me (using that user ID as the associated data), but the decryption will fail because the associated data does not match, leaving me unable to exfiltrate the data that I convinced the server belonged to me, and probably setting off some kind of alarm because that sort of decryption should never fail unless things have been tampered with.

I'm not overly fond of the example and I find it confusing as well. I think the example may be a bit confusing because the term "authentication" is overloaded between application-level authentication and cryptographic authentication, i.e., "if the chat protocol authenticates the user ID" sounds like it is talking about the user logging into the server securely. The user is authenticated by having a secret negotiated with the server. In the next bullet, they talk about "authenticating" the associated data, referring to it in the cryptographic context, but they don't indicate why that would be a problem because in their example, the malicious actor still doesn't have the key. The article handwaves it as "the attacker might be creative."

If they had the key, but not the associated data, you'd still be in a relatively bad situation, because the associated data is not secret. It doesn't serve as a second key because it is not high enough entropy and is ideally zero entropy conditional on already having all information from the originating context.

vlovich123 68 days ago [-]

But why bother putting the user id in the AD instead of part of the authenticated encrypted payload?

hxtk 68 days ago [-]

It sounds like you are assuming that if the data were modified, decryption would fail catastrophically and you'd end up with garbage. This is precisely the point of AEAD: providing cryptographic guarantee that decryption will fail catastrophically if things are tampered with.

That guarantee is not provided with unauthenticated stream ciphers. For example, some stream ciphers work by essentially using a deterministic but unpredictable PRNG seeded with the key and IV to generate a bitstream, and then XOR the plaintext with that bitstream to generate the ciphertext.

With such a stream cipher, if Eve knows that the data format is, e.g., an 8-byte unsigned integer user ID followed by the rest of the payload, Eve can take the first 8 bytes of the ciphertext and XOR them with Bob's user ID (public information) and her own user ID to corrupt the message in such a way that the ID in the resulting cleartext would contain her user ID instead of Bob's, and thus pass the validation that it seems like you are proposing.

Let C[] be the cipher text, K[] be the key stream, B be Bob's ID, and E be Eve's ID:

C[:8] = B ^ K[:8]

C[:8] ^ B = K[:8]

C' = (C[:8] ^ B ^ E) ++ C[8:]

C' would decrypt, validate as "belonging" to Eve, and contain Bob's data.
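The same attack as a runnable sketch, using AES-CTR from the Go stdlib as the unauthenticated stream cipher. The record layout (8-byte big-endian user ID followed by the payload), the all-zero demo key and IV, and the helper names are assumptions made for illustration. Note that `forgeOwner` never touches the key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// ctrXOR encrypts or decrypts with AES-CTR (the same operation both ways).
// There is no authentication tag.
func ctrXOR(key, iv, in []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(in))
	cipher.NewCTR(block, iv).XORKeyStream(out, in)
	return out, nil
}

// record builds the plaintext: 8-byte big-endian user ID, then the payload.
func record(userID uint64, payload string) []byte {
	r := make([]byte, 8, 8+len(payload))
	binary.BigEndian.PutUint64(r, userID)
	return append(r, payload...)
}

// owner reads the user ID back out of a decrypted record.
func owner(rec []byte) uint64 {
	return binary.BigEndian.Uint64(rec[:8])
}

// forgeOwner computes C' = (C[:8] ^ B ^ E) ++ C[8:] without any key material.
func forgeOwner(ct []byte, from, to uint64) []byte {
	forged := append([]byte(nil), ct...)
	if len(forged) < 8 {
		return forged
	}
	var delta [8]byte
	binary.BigEndian.PutUint64(delta[:], from^to)
	for i := range delta {
		forged[i] ^= delta[i]
	}
	return forged
}

func main() {
	key := make([]byte, 16)          // demo only
	iv := make([]byte, aes.BlockSize) // demo only
	const bob, eve = 1001, 6666

	ct, err := ctrXOR(key, iv, record(bob, "bob's secret"))
	if err != nil {
		panic(err)
	}

	// Eve edits the ciphertext in the database, then asks the server to decrypt.
	pt, err := ctrXOR(key, iv, forgeOwner(ct, bob, eve))
	if err != nil {
		panic(err)
	}
	fmt.Printf("owner=%d payload=%q\n", owner(pt), pt[8:])
}
```

With an AEAD, the modified ciphertext would fail tag verification in `Open` instead of decrypting to a record that appears to belong to Eve.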

vlovich123 68 days ago [-]

Nowhere did I say I used an unauthenticated cipher. It was all authenticated cipher. Indeed, usually it was still AEAD (AES-GCM), but instead of using the offset as the AD I simply derived a new key from the offset, thus not using the AD part. This way I would swap out the algorithm to an authenticated cipher that wasn't AEAD (e.g. AES-CBC+HMAC) without breaking how anything worked.

hxtk 68 days ago [-]

It sounds like the source of confusion is in terminology. I would consider AES-CBC+HMAC to be an AEAD construction, and I would consider whatever "secret key" you pass into HMAC along with the ciphertext to be the "associated data". AES-GCM is an AEAD construction that gives you the MAC organically as part of the cipher, but that is not what makes it AEAD as I understand it.

If you are using an AEAD cipher mode, then you always have AD, but sometimes that AD might be the empty string. In that case, the advantage to using contextual AD as opposed to using the empty string as AD and then doing additional verification on the decrypted object is that it prevents some kinds of timing attacks, because cryptographic libraries will often implement AEAD constructions to fail in constant time, whereas your scheme will take longer if post-decryption validation of contextual data encoded in the plaintext fails compared to if decryption fails.

vlovich123 68 days ago [-]

> I would consider AES-CBC+HMAC to be an AEAD construction

OK fair.

> and I would consider whatever "secret key" you pass into HMAC along with the ciphertext to be the "associated data"

The secret key isn't associated data. You take your base HKDF key and expand new crypto for an authenticated cipher from the offset as info (+ maybe other parameters like file name). That key is then used to decrypt. If you squint I guess you could call that AD but it's functionally a very different role.

> because cryptographic libraries will often implement AEAD constructions to fail in constant time, whereas your scheme will take longer if post-decryption validation of contextual data encoded in the plaintext fails compared to if decryption fails.

I think you've misunderstood what I said. As I repeat above, the AEAD key is derived from the offset. There's no post-decryption validation of contextual data because the plaintext is empty. HKDF derivation is constant time and authenticated decryption is constant time. Once decrypted you have a valid block at that location. There's nothing extra left to validate (or perhaps the decrypted contents, but that's irrelevant for cryptographic purposes).

My broader point is that I have yet to encounter a use-case for a non-empty AD string.

tptacek 68 days ago [-]

There are several of them on this thread. For example: encrypt a 10 gigabyte file. You'll need to chunk it; each chunk will exist at an offset. Encode the chunk offset into the Associated Data. Notice that you never store this Associated Data; you simply have it in the context of encrypting and decrypting the file. But the AEAD MAC captures it, and now you can't cut and paste chunks of a ciphertext.

vlovich123 68 days ago [-]

Or alternatively, as I said, derive a new key for each chunk where the offset is part of the info used to derive the key and have an empty AD. Same effect.

hxtk 68 days ago [-]

It sounds like you have constructed a way to encrypt data such that it can only be decrypted by someone who has the same (secret) key and (non-secret) associated data that was used to encrypt it, or else decryption fails.

As you said, same effect: the scheme you have described is not an alternative to AEAD. It is an example of AEAD. You're still using the offset as associated data, you just happen to be composing your AEAD scheme out of another AEAD scheme into which you pass an empty string as associated data.

Other than differences in the limit to the number of messages you can encrypt before nonce exhaustion or the number of bits of secrecy or authentication strength provided, the external interface and use case of your system perfectly matches that of AES-GCM or other popular AEAD constructions.

vlovich123 67 days ago [-]

> Other than differences in the limit to the number of messages you can encrypt before nonce exhaustion or the number of bits of secrecy or authentication strength provided, the external interface and use case of your system perfectly matches that of AES-GCM or other popular AEAD constructions.

Ahh, but that’s not a trivial part of the design and why this is strictly better than using a single AES-GCM key with AD. And also it’s more generic across whatever type of key you choose to derive.

tptacek 67 days ago [-]

Whatever else using a KDF to mix Associated Data into your key might be, it's not "strictly better" than GCM. I'm not sure there's much wrong with it (it's the same approach XAES uses for extended nonces) other than how slow it is, but there's a reason cryptographers don't design all the AEADs to work this way.

tptacek 68 days ago [-]

I may be missing a subtlety here but it seems like you've essentially reinvented Associated Data but with an extra KDF extraction and an AES key expansion for every chunk.

wofo 68 days ago [-]

It looks like I actually got the example wrong, sorry about that!

Somehow I assumed that the server was able to authenticate the receiver id, but as you correctly point out, that would require knowing the encryption key. I'll have to think about a fix for the example.

hxtk 68 days ago [-]

A usual example I use (because it reflects how I tend to use AEAD in applications) is to assume the server (and only the server) has the keys for something like data-at-rest encryption. Application level logic decides whether the server is going to decrypt some data on behalf of the user, and the authenticated data prevents tampering.

If Alice saves some data to her account, but Eve manages to access the database, Eve can change the database state to convince the application to retrieve Alice's data for her (by cloning it into a row with her own user ID). However, when the application attempts to decrypt that data, it will fail because of the AEAD. This ensures that both the database and some service with access to the encryption key (or the encryption key itself) would have to be compromised in order for Eve to exfiltrate her illicit copy of Alice's data.

wofo 68 days ago [-]

Thanks for the example! It has helped me understand better the use case of AEAD for at-rest-encrypted-data.

I finally updated the example to a new one, though it's still message-based (it fits the rest of the article better). If I had come across your example earlier, I might have stayed away from a message-based formulation of the problem at all... Better luck next time I guess :)

twic 68 days ago [-]

Internally, is AEAD just using the "usual" ciphers, digests, and PRNGs, just making sure to combine them in the right way? If so, are all AEAD "ciphers" the same, just with different sub-primitives plugged in?

tptacek 68 days ago [-]

Not generally. An AEAD composed the way you're describing, out of (say) non-authenticated CTR mode and an HMAC MAC, would be described as "a generic composition". The more common AEADs, at least the way we think about them, aren't compositions of otherwise user-serviceable components. I'm not sure there's a name for them; they're the norm, so we describe those integrated, hermetically-sealed constructions (like GCM) as "AEAD".

coppsilgold 68 days ago [-]

An AEAD can be constructed from pieces made and studied for other purposes (eg. block ciphers and hash functions). There is also a cryptographic primitive which can be used for AEAD almost without modification: the cryptographic sponge. But even so this particular primitive is often tailored for the security requirements of AEAD to be more performant: https://ascon.isec.tugraz.at/specification.html

An AEAD can also be made de novo. Such as AEGIS[1], which performs encryption and authentication in one pass (much like the sponges, but much more performant).

[1] <https://competitions.cr.yp.to/round3/aegisv11.pdf>

syncsynchalt 68 days ago [-]

Even if you combine the operations so that it works, it may not be obvious whether you've opened yourself to side channels like timing attacks.

A naive implementation of the AEAD feature list could trivially allow you to guess the AD for a ciphertext if the AD validation is checked too early in the process.

kazinator 68 days ago [-]

The TL;DR of this seems to be: the plaintext metadata accompanying ciphertext ("associated data") is mixed into the ciphertext's encryption (essentially as an initial vector). Thereby, if the plain-text data is altered, the ciphertext cannot be correctly decrypted. The ciphertext is both a secret message, and a signature of the unencrypted data, so a separate HMAC is not required.

We can imagine, e.g. in the context of e-mail, if the DKIM header signature were combined with a PGP-encrypted body as one operation. I'm ducking under the table now, though.

tptacek 68 days ago [-]

The core idea, one that PGP does not "get" (except in newer, non-compatible implementations) is simply that of ciphertext authentication. Once you have authentication, the Associated Data construction is pretty easy to get; put differently, almost immediately after we "had" widespread authenticated encryption, we had AEAD. PGP finds these problems difficult to solve (and so simply doesn't solve them) because it confuses error-detecting integrity checks, signatures, encryption, and message authentication, which are 4 different things. But PGP also predates our modern understanding of the differences between those 4 things.

kazinator 67 days ago [-]

I looked through that RFC 5XXX that describes AEAD. I somehow feel that there is nothing earth-shattering in it that would confuse cryptographers and crypto coders, if we sent it back to 1993.

tptacek 67 days ago [-]

Oh, wow, is this ever not true. If you want to blow your head off, go read the IPSEC working group's confrontation with Phil Rogaway over this issue; Rogaway went and canvassed other real cryptographers (notably Rivest) to try to talk sense into them, and failed.

andrekandre 68 days ago [-]

i really appreciate how this article was written

just the right length and pacing to get me to the end and the point across

wofo 68 days ago [-]

Thanks for the kind words! I'm trying to balance pragmatism with depth. Glad it was useful to you ;)

andrekandre 67 days ago [-]

yes, thank you for writing it, i really learned something!

dlenski 67 days ago [-]

Look, it's a great article, but the perfect title was right there.

> What's my AEAD again, what's my AEAD again?

halosghost 68 days ago [-]

All the best,

-HG

Loading comments...

tptacek 68 days ago [-]

Another AD example: Ben Toews, in our Vault replacement secret storage system Pet Semetary, uses the AD on SQLite ciphertexts to bind them to a particular row (and/or a particular key path).

So one use for AD is to authenticate headers; another is contextual binding.

some_furry 68 days ago [-]

This is mostly unrelated to what you wrote, Thomas, but I wanted to add something that HN users might benefit from hearing:

peterldowns 68 days ago [-]

If you squint at the example usage in the tests, it's basically the API that the blogpost describes.

https://github.com/peterldowns/symcrypt/blob/main/symcrypt_t...

https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305

tptacek 68 days ago [-]

You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data. I don't much care except the whole point of this post is why that matters.

gerdesj 68 days ago [-]

When it comes to this stuff you generally have to follow advice from someone who understands what is going on.

For me, personally, I'm going to side with tptacek - he has a track record that I have seen over at least a decade if not two.

I don't know the other bloke but this is a bit of a worry: "I'm not a cryptographer".

peterldowns 68 days ago [-]

Agreed; I posted here with the goal of receiving advice.

peterldowns 68 days ago [-]

What should I use? I'd be extremely happy to do the Right Thing. I linked symcrypt and posted here because I am hoping someone can point me to it.

> You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data.

tptacek 68 days ago [-]

I feel like you haven't really gotten your head around what Authenticated Data is. That's OK! Look at 'stavros upthread --- lots of clueful people have trouble with this concept.

What you should do is just take the examples from cipher#AEAD, but where they do:

    block, err := aes.NewCipher(key)
    if err != nil {
      panic(err.Error())
    }

    aesgcm, err := cipher.NewGCM(block)
    if err != nil {
      panic(err.Error())
    }

Instead you just do

    chapoly, err := chacha20poly1305.NewX(key)
    if err != nil {
      panic(err.Error())
    }

Your "Owner" looks like what cryptographers would call a "domain separation constant". Domain separation is good! It's another application of authenticated data, too. But not the only one.

The Go standard library's AEAD "Seal" and "Open" methods are a better interface than what you've got now.
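A minimal sketch of that Seal/Open shape, using only the standard library's AES-GCM so it runs without extra modules; swapping in `chacha20poly1305.NewX(key)` should only change the constructor. The helper names `newAEAD`, `seal`, and `open`, the nonce-prefix layout, and the `owner:` strings are assumptions for illustration, not anything taken from symcrypt.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds AES-GCM from a 16-, 24-, or 32-byte key (hypothetical helper).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and authenticates ad without storing it.
// The result is nonce || ciphertext || tag.
func seal(key, plaintext, ad []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext and tag to its first argument (here, the nonce).
	return aead.Seal(nonce, nonce, plaintext, ad), nil
}

// open reverses seal; it fails unless the caller supplies the same ad.
func open(key, sealed, ad []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < aead.NonceSize() {
		return nil, errors.New("sealed message too short")
	}
	nonce, ct := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, ad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("api-key-123"), []byte("owner:alice"))
	if err != nil {
		panic(err)
	}
	pt, err := open(key, sealed, []byte("owner:alice"))
	fmt.Println(string(pt), err)

	// Same key and ciphertext, different AD: Open returns an error.
	_, err = open(key, sealed, []byte("owner:bob"))
	fmt.Println(err != nil)
}
```

Random 96-bit nonces are fine for a sketch like this, but they limit how many messages one AES-GCM key should seal, which is one reason the XChaCha20-Poly1305 constructor above is attractive.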

peterldowns 68 days ago [-]

Thank you!

hxtk 68 days ago [-]

[1]: https://developers.google.com/tink

[2]: https://github.com/tink-crypto/tink-go/

[3]: https://developers.google.com/tink/key-management-overview#g...

[4]: https://developers.google.com/tink/encrypt-data#go

[5]: https://developers.google.com/tink/choose-primitive

[6]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2/tink#AE...

[7]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2@v2.4.0/...

zorgmonkey 68 days ago [-]

It is worth pointing out that Tink has bindings for a bunch of languages (C++, Objective-C, Rust, Python, Go, and Java) and supports a bunch of key management systems (GCP, AWS, and HashiCorp).

stavros 68 days ago [-]

Can someone explain what use the AD is, if we have to decrypt the message to authenticate the AD? If I'm decrypting the message already just to authenticate it, why wouldn't I encrypt the AD as well?

tptacek 68 days ago [-]

Because you need it outside the context of encryption/decryption.

https://news.ycombinator.com/item?id=43827342

Honestly, the classic "message routing" example most things give for AEAD is not very useful. Context binding is a much better primer for intuition.

stavros 68 days ago [-]

EDIT: I saw your other message, and this makes it click for me:

> authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted

So the AD can be an additional envelope-level thing at encryption/decryption time, that helps a lot, thanks!

tptacek 68 days ago [-]

This is why message routing headers are kind of a fucky example (you can make it make sense but it begs for this confusion).

stavros 68 days ago [-]

firesteelrain 68 days ago [-]

tptacek 68 days ago [-]

It's a really good question, because, in order to verify the AD, you have to have the same key you need to decrypt it.

stavros 68 days ago [-]

edoceo 68 days ago [-]

Does that make it like some kind of HMAC?

hxtk 68 days ago [-]

Yes, in fact, one construction of the AEAD primitive is to use AES-CTR with HMAC to "bolt on" authentication after the fact (AES-CTR on its own is an unauthenticated stream cipher).

You can find an implementation of AES-CTR-HMAC (at a high level where AES-CTR and HMAC are both given) here: https://github.com/tink-crypto/tink-go/blob/main/aead/aesctr...
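For intuition only, here is a rough encrypt-then-MAC sketch of that idea with the standard library. It is not Tink's wire format: the `len(ad) || ad || iv || ciphertext` MAC input, the two independent keys, and the names `tag`, `seal`, and `open` are all assumptions. The length prefix is there so that different (AD, ciphertext) splits cannot produce the same MAC input.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// tag computes HMAC-SHA256 over len(ad) || ad || iv || ciphertext.
func tag(macKey, ad, ivAndCT []byte) []byte {
	mac := hmac.New(sha256.New, macKey)
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(ad)))
	mac.Write(n[:])
	mac.Write(ad)
	mac.Write(ivAndCT)
	return mac.Sum(nil)
}

// seal returns iv || ciphertext || tag. encKey and macKey must be independent.
func seal(encKey, macKey, plaintext, ad []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize+len(plaintext))
	iv := out[:aes.BlockSize]
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	cipher.NewCTR(block, iv).XORKeyStream(out[aes.BlockSize:], plaintext)
	return append(out, tag(macKey, ad, out)...), nil
}

// open checks the tag in constant time before touching the ciphertext.
func open(encKey, macKey, sealed, ad []byte) ([]byte, error) {
	if len(sealed) < aes.BlockSize+sha256.Size {
		return nil, errors.New("message too short")
	}
	split := len(sealed) - sha256.Size
	body, got := sealed[:split], sealed[split:]
	if !hmac.Equal(got, tag(macKey, ad, body)) {
		return nil, errors.New("authentication failed")
	}
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	pt := make([]byte, len(body)-aes.BlockSize)
	cipher.NewCTR(block, body[:aes.BlockSize]).XORKeyStream(pt, body[aes.BlockSize:])
	return pt, nil
}

func main() {
	encKey, macKey := make([]byte, 32), make([]byte, 32)
	if _, err := rand.Read(encKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(macKey); err != nil {
		panic(err)
	}
	sealed, err := seal(encKey, macKey, []byte("hello"), []byte("header"))
	if err != nil {
		panic(err)
	}
	pt, err := open(encKey, macKey, sealed, []byte("header"))
	fmt.Printf("%s %v\n", pt, err)

	// A different AD changes the expected tag, so this fails.
	_, err = open(encKey, macKey, sealed, []byte("other"))
	fmt.Println(err)
}
```

This is exactly the kind of hand assembly the thread warns about, so treat it as an illustration of where the AD enters the MAC rather than something to ship.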

andrewflnr 68 days ago [-]

> The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
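A small sketch of what the quoted sentence describes, assuming AES-GCM from the standard library, a toy 16-byte chunk size, and the chunk's byte offset encoded as 8 big-endian bytes of AD. The names `newAEAD`, `offsetAD`, `sealChunks`, and `openChunks` are made up for the example. Note that the offset is never stored; it is recomputed from the chunk's position when decrypting.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

const chunkSize = 16 // tiny on purpose, for demonstration only

type chunk struct{ nonce, ct []byte }

// newAEAD builds AES-GCM from the given key (hypothetical helper).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// offsetAD encodes a chunk's byte offset as associated data.
func offsetAD(offset uint64) []byte {
	ad := make([]byte, 8)
	binary.BigEndian.PutUint64(ad, offset)
	return ad
}

// sealChunks encrypts data chunk by chunk, binding each chunk to its offset.
func sealChunks(aead cipher.AEAD, data []byte) ([]chunk, error) {
	var out []chunk
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		nonce := make([]byte, aead.NonceSize())
		if _, err := rand.Read(nonce); err != nil {
			return nil, err
		}
		ct := aead.Seal(nil, nonce, data[off:end], offsetAD(uint64(off)))
		out = append(out, chunk{nonce, ct})
	}
	return out, nil
}

// openChunks recomputes each offset from the chunk's position in the slice.
func openChunks(aead cipher.AEAD, chunks []chunk) ([]byte, error) {
	var out []byte
	for i, c := range chunks {
		pt, err := aead.Open(nil, c.nonce, c.ct, offsetAD(uint64(i*chunkSize)))
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		out = append(out, pt...)
	}
	return out, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	chunks, err := sealChunks(aead, []byte("0123456789abcdefFEDCBA9876543210tail"))
	if err != nil {
		panic(err)
	}
	pt, err := openChunks(aead, chunks)
	fmt.Println(string(pt), err)

	// Cut and paste: swap the first two chunks and decryption fails.
	chunks[0], chunks[1] = chunks[1], chunks[0]
	_, err = openChunks(aead, chunks)
	fmt.Println(err != nil)
}
```

The sketch only catches reordering. Dropping trailing chunks would go unnoticed, so a real format would also need to authenticate which chunk is the last one.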

vlovich123 68 days ago [-]

The strategy I used instead was to HKDF derive different keys for each chunk using the offset as part of the info to derive the key. No AD needed.
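A guess at what that strategy could look like, since the actual construction isn't published in the thread. HKDF-SHA256 (RFC 5869) is written out by hand with HMAC so the sketch needs only the standard library. The `chunk-key` label, the 8-byte offset encoding, the all-zero salt, the random nonce, and the names `chunkKey`, `sealChunk`, and `openChunk` are all assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// chunkKey derives a 32-byte key for one chunk from the master key.
func chunkKey(master []byte, offset uint64) []byte {
	// Extract: PRK = HMAC(salt, IKM), with an all-zero salt.
	extract := hmac.New(sha256.New, make([]byte, sha256.Size))
	extract.Write(master)
	prk := extract.Sum(nil)

	// Expand, first block only: T(1) = HMAC(PRK, info || 0x01),
	// where info is the label followed by the chunk offset.
	expand := hmac.New(sha256.New, prk)
	expand.Write([]byte("chunk-key"))
	var off [8]byte
	binary.BigEndian.PutUint64(off[:], offset)
	expand.Write(off[:])
	expand.Write([]byte{0x01})
	return expand.Sum(nil)
}

// chunkAEAD builds AES-GCM keyed with the per-chunk key.
func chunkAEAD(master []byte, offset uint64) (cipher.AEAD, error) {
	block, err := aes.NewCipher(chunkKey(master, offset))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk returns nonce || ciphertext || tag, with empty AD.
func sealChunk(master []byte, offset uint64, plaintext []byte) ([]byte, error) {
	aead, err := chunkAEAD(master, offset)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// openChunk only succeeds when offset matches the one used to seal.
func openChunk(master []byte, offset uint64, sealed []byte) ([]byte, error) {
	aead, err := chunkAEAD(master, offset)
	if err != nil {
		return nil, err
	}
	if len(sealed) < aead.NonceSize() {
		return nil, errors.New("sealed chunk too short")
	}
	nonce, ct := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, nil)
}

func main() {
	master := make([]byte, 32)
	if _, err := rand.Read(master); err != nil {
		panic(err)
	}
	sealed, err := sealChunk(master, 4096, []byte("chunk payload"))
	if err != nil {
		panic(err)
	}
	pt, err := openChunk(master, 4096, sealed)
	fmt.Println(string(pt), err)

	// Wrong offset means a different derived key, so the tag check fails.
	_, err = openChunk(master, 8192, sealed)
	fmt.Println(err != nil)
}
```

The binding works, but each chunk costs a key derivation plus a fresh AES key schedule, where putting the offset in the AD costs neither.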

tatersolid 67 days ago [-]

Two calls to SHA-256 for each block would be very slow compared with a modern AEAD.

vlovich123 67 days ago [-]

tatersolid 67 days ago [-]

It might be “fast enough for line rate” in your situation but even then you could be saving CPU cycles for other work by using a more efficient construction.

tptacek 67 days ago [-]

What does it mean for an AEAD construction to "run at line speed"? Over how many different sessions and with what message sizes and on what hardware?

vlovich123 67 days ago [-]

tptacek 67 days ago [-]

It's irrelevant to your use case. It's not irrelevant to the broader CS question of why people don't just do what you're doing.

vlovich123 67 days ago [-]

tptacek 67 days ago [-]

You'd have to publish details of the construction you came up with for me to have anything to say about it not having or needing a nonce.

dadrian 67 days ago [-]

tptacek 67 days ago [-]

rendaw 68 days ago [-]

IIUC you don't have to decrypt the message - the outermost primitive is the authentication. The point I got is that both the encrypted data and unencrypted data are authenticated.

senderista 68 days ago [-]

cakoose 68 days ago [-]

Yup, the example doesn't make sense for the reason you pointed out.

You could water down the example a bit to make it work:

1. Assume there's some other authentication mechanism for client-server communication, e.g. TLS.

2. The client sends the user ID unencrypted (within TLS) so the server can route, but encrypts the message contents so the server can't read it.

3. The final recipient can validate the message and the user ID.

This saves the client from having to send the user ID twice, once in the ciphertext and once in the clear.

But another more interesting use case is when you don't even send the associated data: https://news.ycombinator.com/item?id=43827342

hxtk 68 days ago [-]

Suppose you have a server with its own encryption key, and a key-value database full of encrypted data (secured by this root keyset) associated with a user.

vlovich123 68 days ago [-]

But why bother putting the user id in the AD instead of part of the authenticated encrypted payload?

hxtk 68 days ago [-]

Let C[] be the cipher text, K[] be the key stream, B be Bob's ID, and E be Eve's ID:

C[:8] = B ^ K[:8]

C[:8] ^ B = K[:8]

C' = C[:8] ^ B ^ E ++ C[8:]

C' would decrypt, validate as "belonging" to Eve, and contain Bob's data.
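The same algebra as a runnable sketch, assuming the payload is encrypted with a bare stream cipher (AES-CTR here) and no authentication tag at all. The 8-byte IDs, the all-zero demo key and IV, and the names `ctrXOR` and `forge` are invented for the example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// ctrXOR encrypts or decrypts with AES-CTR (the same operation both ways).
// There is no authentication tag, which is the point of the demo.
func ctrXOR(key, iv, in []byte) []byte {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	out := make([]byte, len(in))
	cipher.NewCTR(block, iv).XORKeyStream(out, in)
	return out
}

// forge computes C' = C[:8] ^ B ^ E ++ C[8:] without knowing the key.
func forge(ct, oldID, newID []byte) []byte {
	out := append([]byte(nil), ct...)
	for i := range oldID {
		out[i] ^= oldID[i] ^ newID[i]
	}
	return out
}

func main() {
	key := make([]byte, 32)           // all-zero demo key, never do this
	iv := make([]byte, aes.BlockSize) // all-zero demo IV, never do this
	bob, eve := []byte("bob00001"), []byte("eve00002")

	// The record is Bob's 8-byte ID followed by Bob's data.
	record := append(append([]byte(nil), bob...), "bob's secret"...)
	ct := ctrXOR(key, iv, record)

	// Eve knows both IDs and rewrites the ciphertext prefix.
	forged := forge(ct, bob, eve)
	fmt.Printf("%s\n", ctrXOR(key, iv, forged)) // prints "eve00002bob's secret"
}
```

With an AEAD, the tag check would reject the forged ciphertext before any plaintext came back.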

vlovich123 68 days ago [-]

hxtk 68 days ago [-]

vlovich123 68 days ago [-]

> I would consider AES-CBC+HMAC to be an AEAD construction

OK fair.

> and I would consider whatever "secret key" you pass into HMAC along with the ciphertext to be the "associated data"

My broader point is that I have yet to encounter a use-case for a non-empty AD string.

tptacek 68 days ago [-]

vlovich123 68 days ago [-]

Or alternatively, as I said, derive a new key for each chunk where the offset is part of the info used to derive the key and have an empty AD. Same effect.

hxtk 68 days ago [-]

vlovich123 67 days ago [-]

tptacek 67 days ago [-]

tptacek 68 days ago [-]

I may be missing a subtlety here but it seems like you've essentially reinvented Associated Data but with an extra KDF extraction and an AES key expansion for every chunk.

wofo 68 days ago [-]

It looks like I actually got the example wrong, sorry about that!

hxtk 68 days ago [-]

wofo 68 days ago [-]

Thanks for the example! It has helped me better understand the use case of AEAD for data encrypted at rest.

twic 68 days ago [-]

tptacek 68 days ago [-]

coppsilgold 68 days ago [-]

An AEAD can also be made de novo, such as AEGIS[1], which performs encryption and authentication in one pass (much like the sponges, but much more performant).

[1] <https://competitions.cr.yp.to/round3/aegisv11.pdf>

syncsynchalt 68 days ago [-]

Even if you combine the operations so that it works, it may not be obvious whether you've opened yourself to side channels like timing attacks.

A naive implementation of the AEAD feature list could trivially allow you to guess the AD for a ciphertext if the AD validation is checked too early in the process.

kazinator 68 days ago [-]

We can imagine, e.g. in the context of e-mail, if the DKIM header signature were combined with a PGP-encrypted body as one operation. I'm ducking under the table now, though.

tptacek 68 days ago [-]
