DANE Monitoring

Published Aug 15, 2021 - 19 min read

Table of contents

Certbot
DANE - DNS-Based Authentication of Named Entities
- Trust Anchor
- DNSSEC
DANE Monitoring
Monitoring code
Conclusion

I recently documented how I have my SMTP (and HTTPS) servers rotate their certificates with Certbot and publish the public-key signatures with DANE (via DNS with TLSA records).

After I introduce some of the technologies involved, I'll go into detail on how I am staying on top of my setup to ensure that SMTP servers that send me mail do not bounce because of bad configuration on my part.

Certbot

Certbot is a neat tool by the Electronic Frontier Foundation that aims to increase adoption of HTTPS certificates from Certificate Authorities like Let's Encrypt. It integrates with nginx, Apache, you name it. Certbot uses the ACME protocol (RFC 8555) to acquire certificates through Let's Encrypt. I've developed my own ACME integration before, as mentioned in Review of TLS Mastery by Michael W Lucas - Part 1, so I'm very familiar with what's going on from beginning to end.

But Certbot is a tool focused on HTTPS servers. When it comes to DANE, browsers don't fully support it. I believe it will continue to be a chicken-and-egg problem for a while: without the browsers supporting it, Certbot does not have the incentive to develop this functionality.

If you search around, certbot + postfix + dovecot seems to be implemented in various ways: some through nginx and others like mine where the mail server references the PEM file on disk in postfix's main.cf.

This post will not go into my actual setup for postfix + certbot however.

Suffice to say, certbot is only acquiring the certificate and the rest of my script reloads the services at the right time for DANE.

DANE - DNS-Based Authentication of Named Entities

RFC 6394 Use Cases and Requirements for DNS-Based Authentication of Named Entities provides a compelling justification for the DANE feature set.

Today, an attacker can successfully authenticate as a given application service domain if he can obtain a "mis-issued" certificate from one of the widely used CAs -- a certificate containing the victim application service's domain name and a public key whose corresponding private key is held by the attacker.

Recall DigiNotar. "Its whole reason for existence was to tell internet users who and what they could trust, and in 2011, it failed spectacularly in that mission." - slate.com

It turns out that work on RFC 6394 started in April 2011; the DigiNotar breach began on July 10, 2011 and was detected on July 19th. Interesting that this threat was being considered just as reality asserted itself.

The goal of technologies for DNS-based Authentication of Named Entities (DANE) is to use the DNS and DNSSEC to provide additional information about the cryptographic credentials associated with a domain, so that clients can use this information to increase the level of assurance they receive from the TLS handshake process.

In other words, this is a way for a client to double check the presented certificate, which is valid under the established X.509 Public-Key Infrastructure (PKIX), against DNS records that the domain owner controls.

Providing trust anchor material in this way clearly requires DNSSEC, since corrupted or injected records could be used by an attacker to cause clients to trust an attacker's certificate.

Without DNSSEC, DANE could also be used as a denial of service vector, though an improper setup would produce a denial of service too. That's what this post is about!

The primary focus ... is the enhancement of TLS authentication procedures using the DNS. The general effect of such mechanisms is to increase the role of DNS operators in authentication processes, either in place of or in addition to traditional third-party actors such as commercial certificate authorities.

Another neat thing proposed in RFC6394 is the possibility that DANE is the trust anchor where the site operator self signs their own certificates or runs their own certificate authority. But I don't think anyone has actually deployed this successfully.

Trust Anchor

In case you haven't heard of the term trust anchor before, it is an authority that a client trusts and can verify signatures for. Typically operating systems will ship a collection of trust anchors with each release and may modify it with updates.

Mac OS KeyChain trusted certificates

Mozilla ships its own certificate bundle with Firefox instead of relying on the system for up-to-date certificates. This same bundle can be downloaded for use in CI, or by a cron job, to keep a system automatically up to date on certificates without OS support.

If you browse or inspect the certificate presented on a valid domain, you should find that the top certificate authority is in your trust anchors.

Google.com's Certificate inspection modal

DNSSEC

Like IPv6, which has been slowly adopted since its release in 1998, DNSSEC has been slowly deployed since its release in 1999. First it required the Top Level Domains (TLDs) such as .com to set up a key. This finally began after 2010. Even so, according to DNSSEC Stats, only 2.5% of .com domains have adopted DNSSEC.

But what is it? It provides part of a public-key infrastructure, specifically authentication and integrity of data, much like X.509.

The ISO has made this standard (or the latest version) unavailable to the public without payment. I consider the ISO to be a detriment to society with their regressive participation in standards forming and adoption.

Instead of a convoluted process involving humans and certificate authorities that want you to add a Norton Secured badge to the footer of your website while claiming it raises conversion rates, DNSSEC is mostly free, and it is not too complicated to plug your DNS provider's (such as Cloudflare) key into the registrar.

Yes, my employer actually went through this Norton thing, and I was the one to add it to our frontend and to get the purchased certificate onto our load balancer. No, it did not improve our conversion rates.

Really, setting it up is easy. On Cloudflare I just scrolled down in the DNS section, enabled DNSSEC...

Cloudflare's DNSSEC settings

And then copied the value onto my registrar's DNSSEC page.

Registrar DNSSEC settings

Now a client can verify through DNSSEC that their favorite resolver (e.g. 1.1.1.1) did not manipulate the values I set on my trusted DNS name server. In other words, the client can authenticate that the name server (NS) linked by my registrar truly provided the DNS records that the client receives from the resolver.

Another neat thing is the absence of DNS records can be signed. So a client can know for certain that the resolver is not obscuring records. This is an aspect of integrity that DNSSEC provides.

If the NS lies about what I put in, then that's bad. It is also out of scope of DNSSEC.
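From the command line, one rough way to see whether a validating resolver vouched for an answer is to look for the ad (Authenticated Data) flag in dig's header. The sketch below only parses text that dig has already printed; the function name has_ad_flag is made up for this example, and the flag is only as trustworthy as the path between you and the resolver.

```bash
# Hypothetical helper: reads dig output on stdin and succeeds when the
# header's flags list contains "ad", i.e. the resolver reports that it
# validated the answer with DNSSEC.
has_ad_flag() {
  grep -Eq '^;; flags:( [a-z]+)* ad( [a-z]+)*;'
}

# Typical use (needs network access and a validating resolver):
#   dig _25._tcp.mail.cendyne.dev TLSA | has_ad_flag && echo "validated"
```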

If you're interested in the finer details of it all with pretty pictures, check out Cloudflare's How DNSSEC works marketing page.

DANE Monitoring

After my last post, Viktor followed up with this prompt and I totally agreed.

Viktor suggests adding monitoring

Okay, so how do I know that my DANE setup is working now and how do I learn if it breaks because the certificate no longer matches what is in TLSA records as per RFC6698?

A proper health check here would include:

- Fetch the configuration a client would use, i.e. use DNS!
- As a client: connect to the service, compare or assert the presented certificate or public key to the configuration found in DNS.
- Report if any failures are found!

I chose to go with a cron job written in bash. For now it runs hourly, but I could run it every minute if I added something to announce a failure only once per interval (say 1 hour).
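Announcing only once per interval could be done with a timestamp file. This is a sketch of that idea and not part of my script; the function name should_announce and the stamp file path in the usage note are placeholders.

```bash
# Hypothetical throttle: succeeds (exit 0) at most once per interval.
# $1 = path of a stamp file, $2 = interval in seconds (default 3600).
should_announce() {
  local stamp=$1 interval=${2:-3600} now last
  now=$(date +%s)
  # A missing stamp file counts as "never announced".
  last=$(cat "$stamp" 2>/dev/null || echo 0)
  if (( now - last >= interval )); then
    echo "$now" > "$stamp"
    return 0
  fi
  return 1
}
```

A per-minute cron job could then wrap its alert as `should_announce /var/tmp/dane-alert.stamp 3600 && echo "DANE check failed"`, so a long outage produces one message per hour instead of sixty.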

Fetching configuration

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F

Note:
There's a space in the real result, but it is omitted for display purposes.

There are various TLSA record configurations. I am using 3 1 1, which means Domain-issued certificate - SubjectPublicKeyInfo - SHA-256. Some also use 3 0 1, which hashes the entire presented certificate instead of just the public key. See RFC 6698 sections 7.2, 7.3, and 7.4. My setup could be adjusted to use 3 0 1 instead.
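To make the three numbers less cryptic, here is a small lookup sketch. The function name describe_tlsa is invented for illustration; the labels follow the registries in RFC 6698 section 7 and the acronyms from RFC 7218.

```bash
# Hypothetical helper: print registry names for the usage, selector and
# matching type fields of a TLSA record.
describe_tlsa() {
  local usage=$1 selector=$2 matching=$3 u s m
  case $usage in
    0) u="PKIX-TA (CA constraint)" ;;
    1) u="PKIX-EE (service certificate constraint)" ;;
    2) u="DANE-TA (trust anchor assertion)" ;;
    3) u="DANE-EE (domain-issued certificate)" ;;
    *) u="unknown usage $usage" ;;
  esac
  case $selector in
    0) s="Cert (full certificate)" ;;
    1) s="SPKI (SubjectPublicKeyInfo)" ;;
    *) s="unknown selector $selector" ;;
  esac
  case $matching in
    0) m="Full (no hash)" ;;
    1) m="SHA2-256" ;;
    2) m="SHA2-512" ;;
    *) m="unknown matching type $matching" ;;
  esac
  echo "$u / $s / $m"
}

describe_tlsa 3 1 1
# DANE-EE (domain-issued certificate) / SPKI (SubjectPublicKeyInfo) / SHA2-256
describe_tlsa 3 0 1
# DANE-EE (domain-issued certificate) / Cert (full certificate) / SHA2-256
```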

Client Connect and assert configuration

It turns out that OpenSSL s_client actually can do this for us.

This capability was not mentioned in the book TLS Mastery by Michael W Lucas, nor was it part of the toolset the book introduces. I reviewed the first half in Review of TLS Mastery by Michael W Lucas - Part 1 and found it quite disappointing.

Adapted below is what I found at Verify TLSA (DANE) records using OpenSSL by Zbyszek Zolkiewski. But we have to do some work to get the results out as you will soon see.

openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
</dev/null 2>/dev/null

</dev/null tells s_client that there's no content to send over the connection, while 2>/dev/null redirects STDERR into the void. OpenSSL s_client seems to send the connection data into STDERR while STDOUT is OpenSSL's analysis of the connection.

Unfortunately openssl shipped on Mac OS does not have this capability and will show unknown option -dane_tlsa_domain.

You will need to use a brew-supplied openssl instead. But even if you use $(brew --prefix openssl)/bin/openssl ...., you may get the following because of your ISP or firewall (many block outbound connections to port 25). So just do it on the cloud somewhere.

4337712448:error:0200203C:system library:connect:Operation timed out:crypto/bio/b_sock2.c:110:
4337712448:error:2008A067:BIO routines:BIO_connect:connect error:crypto/bio/b_sock2.c:111:
connect:errno=60

The output is thus:

CONNECTED(00000003)
depth=0 CN = mail.cendyne.dev
verify return:1
---
Certificate chain
 0 s:CN = mail.cendyne.dev
   i:C = US, O = Let's Encrypt, CN = R3
 1 s:C = US, O = Let's Encrypt, CN = R3
   i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
 2 s:C = US, O = Internet Security Research Group, CN = ISRG Root X1
   i:O = Digital Signature Trust Co., CN = DST Root CA X3
---
Server certificate
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
subject=CN = mail.cendyne.dev

issuer=C = US, O = Let's Encrypt, CN = R3

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 4792 bytes and written 421 bytes
Verification: OK
Verified peername: mail.cendyne.dev
DANE TLSA 3 1 1 ...e641f816f16b5e57c31d996f matched EE certificate at depth 0
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---

The key line we are looking for is Verify return code: 0 (ok).

If we use a test case with a record that cannot match, like

openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 65 (No matching DANE TLSA records)

So we have a clear way to differentiate a successful state from an error.
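If the numeric code is more convenient than the whole line, it can be pulled out of the s_client output. This is only a sketch; verify_return_code is a name I made up here, and it assumes the "Verify return code" line format shown above.

```bash
# Hypothetical helper: read s_client output on stdin and print the number
# from the first "Verify return code" line (the line may be indented).
verify_return_code() {
  sed -n 's/^ *Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

printf 'Verify return code: 65 (No matching DANE TLSA records)\n' | verify_return_code
# 65
```

Piping the real openssl s_client call into verify_return_code then gives a value that can be compared against 0.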

Because TLSA record sets can have multiple results such as in the case of key rotation, OpenSSL s_client supports multiple -dane_tlsa_rrdata arguments.

This was the first time I've ever dynamically assembled a bash call and I really did not want to use exec.

args+=(first)
var="hello"
args+=("second$var")
echo "${args[@]}"
# outputs
# first secondhello
# but is effectively
# ["first", "secondhello"]
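Applied to the DANE case, the same array trick might look like the sketch below. The variable names records and dane_args are placeholders, and the records are the test values used in this post. Printing the array one element per line shows that each record stays a single argument even though it contains spaces.

```bash
# Two TLSA records, as they would appear during a key rotation.
records=(
  "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F"
  "3 1 1 8888888888888888888888888888888888888888888888888888888888888888"
)

# Build one -dane_tlsa_rrdata flag per record.
dane_args=()
for record in "${records[@]}"; do
  dane_args+=(-dane_tlsa_rrdata "$record")
done

# Show each array element in brackets, one per line.
printf '[%s]\n' "${dane_args[@]}"

# The real call would then be (needs network access):
#   openssl s_client -starttls smtp -connect mail.cendyne.dev:25 \
#     -dane_tlsa_domain mail.cendyne.dev "${dane_args[@]}" </dev/null
```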

Reporting failures

First, because I'm using cron, it will by default email any output to the user assigned to run the script. So some simple echos will do. In this case the script is executed by root and I have root aliased to cendyne on my mail server. I could run it with another user but this is my own code to verify my own infrastructure.

Second, I'm not always looking at my email. In fact Telegram is a great way to get my attention. For that I made a telegram bot and fashioned a quick curl call to send me a message.

TELEGRAM_CHAT_ID=1234567890
TELEGRAM_BOT_TOKEN=111111111:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
MSG="There's a \"problem\""
json=$(jq -n --arg chat "$TELEGRAM_CHAT_ID" --arg msg "$MSG" '{"chat_id":$chat,"text":$msg,"parse_mode":"MarkdownV2"}')
curl -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
  -H "Content-Type: application/json" \
  --data "$json" 2>/dev/null >/dev/null

I'm not really a fan of the jq template style; jo is more accessible, but it isn't as widely available or known.
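One caveat with MarkdownV2 is that Telegram expects a list of punctuation characters to be escaped with a backslash when they appear outside code spans. My messages happen to avoid most of them. If arbitrary text, such as an OpenSSL result line, were sent as plain message text, a helper along these lines could escape it first. This is a sketch under that assumption, and escape_markdown_v2 is a made-up name.

```bash
# Hypothetical helper: backslash-escape the characters that Telegram's
# MarkdownV2 treats as special: _ * [ ] ( ) ~ ` > # + - = | { } . ! and \
escape_markdown_v2() {
  printf '%s\n' "$1" | sed 's/[][\\_*()~`>#+=|{}.!-]/\\&/g'
}

escape_markdown_v2 "Verify return code: 65 (No matching DANE TLSA records)"
# Verify return code: 65 \(No matching DANE TLSA records\)
```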

Testing

First, as a test case I set up some bogus records on another port (not actually in use).

Test TLSA records

A DANE-capable client that looks up the records for TCP port 29 would expect the service on port 29 to present one of these public key signatures. My tests still connected to port 25 and only took their TLSA records from the port 29 name, because I didn't want to pollute my actual records. The written examples below are rewritten to say port 25.

Valid but messy

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
3 1 1 9999999999999999999999999999999999999999999999999999999999999999
3 1 1 8888888888888888888888888888888888888888888888888888888888888888

$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)

The 8...8 and 9...9 signatures stand in for invalid, old, or long since replaced records, while 5E...6F is the active and correct key. A client would still validate this, but it isn't the state I expect. Because this is lower priority, an email-only warning is sufficient for me: at the end of the script, it counts the records and reports when there are too many.

Invalid

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 9999999999999999999999999999999999999999999999999999999999999999
3 1 1 8888888888888888888888888888888888888888888888888888888888888888

$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 65 (No matching DANE TLSA records)

No keys in this list are in use, so I expect an email and a telegram message.

Valid, key rotation in progress

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
3 1 1 8888888888888888888888888888888888888888888888888888888888888888

$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)

In this case, the 5E...6F key is being introduced and 8...8 is being rotated out. This is an accepted test case where no output is expected.

See A sensible "3 1 1" + "3 1 1" key rotation approach by Viktor Dukhovni.

Valid, the usual

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F

$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)

Only one record for the Domain-issued certificate, no key rotation in progress, no output is expected.

Now about that extra space

As hinted above, the dig response was edited for display purposes. But it actually looks like

$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57 C31D996F

The OpenSSL command will accept it as is. But I didn't like that being sent to me over email and Telegram.

To fix this, I spent over an hour trying sed, awk, and perl. I feel let down by each and every one of them.

The solution I found was in Bash script, using dig & curl, for reporting DNS and a few HTTPS policy files for everything email about a domain - line 836 by Phil Pennock.

local -r tlsa_pat='\bTLSA[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+(.+)$'
local tlsa_usage tlsa_selector tlsa_matching tlsa_cadata
if [[ "$line" =~ $tlsa_pat ]]; then
  tlsa_usage="${BASH_REMATCH[1]}"
  tlsa_selector="${BASH_REMATCH[2]}"
  tlsa_matching="${BASH_REMATCH[3]}"
  tlsa_cadata="${BASH_REMATCH[4]}"
  tlsa_cadata="${tlsa_cadata// /}"
  # ...
fi

Apparently bash can do regex matching, put the matches into capture groups, and then strip the spaces out of those with parameter expansion (${var// /} is a pattern substitution rather than a regex).... And remain readable. Your opinion may differ.
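For comparison, a regex is not strictly required for the `+short` output. The sketch below leans on how read assigns the remainder of a line to its last variable. It is an alternative I did not use, the name normalize_tlsa is hypothetical, and unlike the regex it does not check that the first three fields are digits.

```bash
# Hypothetical alternative: read splits off the first three fields and
# leaves the rest of the line (the digest, possibly with spaces) in data.
normalize_tlsa() {
  local usage selector matching data
  while read -r usage selector matching data; do
    # Skip blank or truncated lines.
    [[ -n $data ]] || continue
    # Remove every space from the digest.
    echo "$usage $selector $matching ${data// /}"
  done
}

normalize_tlsa <<< "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57 C31D996F"
# 3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
```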

Simulated Failure

When I used my fake port 29 records, I received a message on Telegram like so:

Telegram Message

And in email I receive the following slightly modified for the simulation:

Verify return code: 65 (No matching DANE TLSA records)
SMTP NOT MATCH FAILED, please review!

; <<>> DiG 9.16.1-Ubuntu <<>> tlsa _25._tcp.mail.cendyne.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 777
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_25._tcp.mail.cendyne.dev.	IN	TLSA

;; ANSWER SECTION:
_25._tcp.mail.cendyne.dev. 270	IN	TLSA	3 1 1 99999999999999999999999999999999999999999999999999999999 99999999
_25._tcp.mail.cendyne.dev. 270	IN	TLSA	3 1 1 88888888888888888888888888888888888888888888888888888888 88888888

;; Query time: 3 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Aug 15 06:35:49 BST 2021
;; MSG SIZE  rcvd: 148

versus
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F

Fantastic!

Checking production for real

I receive no email, I receive no telegram message.

A quiet but accurate alarm is a good alarm.

Monitoring code

So what's my script? See below, secrets redacted.

#!/bin/bash

HOST=mail.cendyne.dev
TELEGRAM_CHAT_ID=REDACTED
TELEGRAM_BOT_TOKEN=REDACTED

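# Read a PEM certificate on stdin and print its "3 1 1" TLSA data:
# the SHA-256 digest of the public key (SubjectPublicKeyInfo) in uppercase hex.
getSig() {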
sig=$(openssl x509 -pubkey -noout | openssl base64 -d | openssl dgst -sha256 -binary | xxd -u -p -c 100000)
echo "3 1 1 $sig"
}

getHttpsSig() {
echo | openssl s_client -connect "$1:443" -servername "$1" -showcerts 2>/dev/null | getSig
}

getSmtpSig() {
echo | openssl s_client -connect "$1:25" -starttls smtp -showcerts 2>/dev/null | getSig
}

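# $1 = host, $2 = name of the array holding the -dane_tlsa_rrdata arguments.
# local -n makes args a nameref to that array (needs bash 4.3 or newer).
verifySmtpSig() {
local -n args=$2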
result=$(openssl s_client -starttls smtp -connect "$1:25" -dane_tlsa_domain "$1" "${args[@]}" </dev/null 2>/dev/null \
| grep "Verify return code" | head -n 1)
if [[ "$result" == "Verify return code: 0 (ok)" ]]; then
return 0
else
echo "$result"
return 1
fi
}

verifyHttpsSig() {
local -n args=$2
result=$(openssl s_client -connect "$1:443" -servername "$1" -dane_tlsa_domain "$1" "${args[@]}" </dev/null 2>/dev/null \
| grep "Verify return code" | head -n 1)
if [[ "$result" == "Verify return code: 0 (ok)" ]]; then
return 0
else
echo "$result"
return 1
fi
}

reformatDigTLSA() {
local -r tlsa_pat='([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+(.+)$'
while read -r line; do
if [[ "$line" =~ $tlsa_pat ]]; then
tlsa_usage="${BASH_REMATCH[1]}"
tlsa_selector="${BASH_REMATCH[2]}"
tlsa_matching="${BASH_REMATCH[3]}"
tlsa_cadata="${BASH_REMATCH[4]}"
tlsa_cadata="${tlsa_cadata// /}"
echo "$tlsa_usage $tlsa_selector $tlsa_matching $tlsa_cadata"
fi
done
}

getTLSA() {
dig tlsa "_$2._tcp.$1" +short | reformatDigTLSA
}


sendTelegramMessage() {
MSG=$(echo -e "$1")
json=$(jq -n --arg chat "$TELEGRAM_CHAT_ID" --arg msg "$MSG" '{"chat_id":$chat,"text":$msg,"parse_mode":"MarkdownV2"}')
curl -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
  -H "Content-Type: application/json" \
  --data "$json" 2>/dev/null >/dev/null
}


TLSA_25=$(getTLSA "$HOST" 25)
TLSA_443=$(getTLSA "$HOST" 443)

function verify() {
local command="$1"
local lines="$2"
local -a DANE_PARAMS

# Split on newlines only so each TLSA record stays a single argument.
OLD_IFS="$IFS"
IFS=$'\n'

for line in $lines; do
DANE_PARAMS+=(-dane_tlsa_rrdata)
DANE_PARAMS+=("$line")
done

IFS="$OLD_IFS"

verification=$($command "$HOST" DANE_PARAMS)
STATUS=$?
if [[ $STATUS != 0 ]]; then
echo "$verification"
fi
return $STATUS
}

VERIFY_ERROR=false
verify verifySmtpSig "$TLSA_25"
if [[ $? != 0 ]]; then
SMTP=$(getSmtpSig "$HOST")
echo "SMTP NOT MATCH FAILED, please review!"
dig tlsa "_25._tcp.$HOST"
echo -e "versus\n$SMTP"
sendTelegramMessage "SMTP TLSA Record does not match the current server, currently it sees:
\`\`\`\n$TLSA_25\n\`\`\`
But I expect to see \`$SMTP\`"
VERIFY_ERROR=true
fi

verify verifyHttpsSig "$TLSA_443"
if [[ $? != 0 ]]; then
HTTPS=$(getHttpsSig "$HOST")
echo "HTTPS MATCH_FAILED, please review!"
dig tlsa "_443._tcp.$HOST"
echo -e "versus\n$HTTPS"
sendTelegramMessage "HTTPS TLSA Record does not match the current server, currently it sees:
\`\`\`\n$TLSA_443\n\`\`\`
But I expect to see \`$HTTPS\`"
VERIFY_ERROR=true
fi

if [[ "$VERIFY_ERROR" == "true" ]]; then
exit 1
fi

# Count non-empty records (echo | wc -l reports 1 even for an empty result).
lines=$(grep -c . <<< "$TLSA_25")
if (( "$lines" < 1 || "$lines" > 2 )); then
echo -e "Warning, SMTP lines count $lines is not within [1,2]\n$TLSA_25"
fi

# Count non-empty records (echo | wc -l reports 1 even for an empty result).
lines=$(grep -c . <<< "$TLSA_443")
if (( "$lines" < 1 || "$lines" > 2 )); then
echo -e "Warning, HTTPS lines count $lines is not within [1,2]\n$TLSA_443"
fi

Github gist of the above
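The getSig helper decodes the PEM public key with openssl base64 -d. A variant that asks OpenSSL for the DER encoding directly should produce the same digest, and it avoids the xxd dependency. Treat it as a sketch rather than what I run; tlsa_311_from_cert is a name invented for this example.

```bash
# Hypothetical alternative to getSig: read a PEM certificate on stdin and
# print "3 1 1 <SHA-256 of the DER SubjectPublicKeyInfo, uppercase hex>".
tlsa_311_from_cert() {
  local digest
  digest=$(openssl x509 -noout -pubkey \
    | openssl pkey -pubin -outform DER \
    | openssl dgst -sha256 -r \
    | cut -d' ' -f1 \
    | tr 'a-f' 'A-F')
  echo "3 1 1 $digest"
}

# Typical use against a certificate file on disk (path is a placeholder):
#   tlsa_311_from_cert < /path/to/cert.pem
```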

Conclusion

I am glad that Viktor prompted me to add monitoring to my deployment. That way Viktor doesn't need to email me anymore! I'll know first and that's how it should be.

But also, I learned quite a bit while doing this. First, I picked up a few more useful things that OpenSSL s_client can offer. Second, I learned about processing input in bash.

It turns out that if you pipe into a while loop in bash, the loop runs in a sub-shell, so any variables set within the loop are not preserved. I tried to pipe the lines in and read them one by one to build the dynamic openssl command, but nothing came out. Instead I resorted to the Internal Field Separator (IFS) variable and a for loop.
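The difference is easy to reproduce. In this sketch the records are placeholders and the counters are only there to make the effect visible: the piped loop loses its variable, while a loop fed by a here-string runs in the current shell and keeps it.

```bash
records=$'3 1 1 AAAA\n3 1 1 BBBB'   # placeholder records

# Piped: the loop runs in a sub-shell, so the count is lost afterwards.
piped=0
printf '%s\n' "$records" | while IFS= read -r line; do
  piped=$((piped + 1))
done
echo "after pipe: $piped"
# after pipe: 0

# Redirected: the loop runs in the current shell, so the count survives.
redirected=0
while IFS= read -r line; do
  redirected=$((redirected + 1))
done <<< "$records"
echo "after here-string: $redirected"
# after here-string: 2
```

The same redirect form would have let the loop append to an array directly, without changing IFS.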

Anyway, now I'll know when things are broken! As long as my server is running.