DANE Monitoring
I recently documented how I have my SMTP (and HTTPS) servers rotate their certificates with Certbot and publish the public-key signatures with DANE (via DNS with TLSA records).
After I introduce some of the technologies involved, I'll go into detail on how I am staying on top of my setup to ensure that SMTP servers that send me mail do not bounce because of bad configuration on my part.
Certbot
Certbot is a neat tool by the Electronic Frontier Foundation to increase adoption of HTTPS certificates from Certificate Authorities like Let's Encrypt. It integrates with nginx, Apache, you name it. Certbot uses ACME (RFC 8555) to acquire certificates through Let's Encrypt. I've developed my own ACME integration before, as mentioned in Review of TLS Mastery by Michael W Lucas - Part 1, so I'm very familiar with what's going on from beginning to end.
But Certbot is a tool focused on HTTPS servers, and when it comes to DANE, browsers don't fully support it. I believe it will continue to be a chicken-and-egg problem for a while. Without the browsers supporting it, Certbot does not have the incentive to develop this functionality.
If you search around, certbot + postfix + dovecot seems to be implemented in various ways: some through nginx and others like mine, where the mail server references the PEM file on disk in postfix's main.cf.
This post will not go into my actual setup for postfix + certbot, however.
Suffice it to say, certbot only acquires the certificate and the rest of my script reloads the services at the right time for DANE.
DANE - DNS-Based Authentication of Named Entities
RFC 6394 Use Cases and Requirements for DNS-Based Authentication of Named Entities provides a compelling justification for the DANE feature set.
Today, an attacker can successfully authenticate as a given application service domain if he can obtain a "mis-issued" certificate from one of the widely used CAs -- a certificate containing the victim application service's domain name and a public key whose corresponding private key is held by the attacker.
Recall DigiNotar. "Its whole reason for existence was to tell internet users who and what they could trust-and in 2011, it failed spectacularly in that mission." - slate.com
Turns out RFC 6394 was started in April 2011, and the DigiNotar breach started on July 10, 2011 and was detected on July 19th. Interesting that this threat was being considered just as reality asserted itself.
The goal of technologies for DNS-based Authentication of Named Entities (DANE) is to use the DNS and DNSSEC to provide additional information about the cryptographic credentials associated with a domain, so that clients can use this information to increase the level of assurance they receive from the TLS handshake process.
Providing trust anchor material in this way clearly requires DNSSEC, since corrupted or injected records could be used by an attacker to cause clients to trust an attacker's certificate.
The primary focus ... is the enhancement of TLS authentication procedures using the DNS. The general effect of such mechanisms is to increase the role of DNS operators in authentication processes, either in place of or in addition to traditional third-party actors such as commercial certificate authorities.
Trust Anchor
In case you haven't heard of the term trust anchor before, it is an authority that a client trusts and can verify signatures for. Typically operating systems will ship a collection of trust anchors with each release and may modify it with updates.
If you browse or inspect the certificate presented on a valid domain, you should find that the top certificate authority is in your trust anchors.
DNSSEC
Like IPv6 being slowly adopted (it was released in 1998!), DNSSEC too is being slowly deployed since its release in 1999. First it required the Top Level Domains (or TLDs) such as .com to set up a key. This finally began after 2010. Even so, according to DNSSEC Stats, .com only has a 2.5% adoption rate of DNSSEC.
But what is it? It provides part of what a Public-Key Infrastructure like X.509 provides: specifically, authentication of data and integrity of data.
Instead of a convoluted process involving humans and certificate authorities that want you to add Norton Secured to the footer of your website while claiming higher conversion rates, DNSSEC is mostly free and not too complicated: you plug your DNS provider's (such as Cloudflare) key into your registrar.
Really, setting it up is easy. On Cloudflare I just scrolled down in the DNS section, enabled DNSSEC...
And then copied the value onto my registrar's DNSSEC page.
Now a client can verify through DNSSEC that their favorite resolver (e.g. 1.1.1.1) did not manipulate the values I set on my trusted DNS name server. In other words, the client can authenticate that the name server (NS) linked by my registrar truly provided the DNS records that the client receives from a DNS resolver.
If you're interested in the finer details of it all with pretty pictures, check out Cloudflare's How DNSSEC works marketing page.
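One way a monitoring script could notice that validation has stopped is to look for the ad (Authenticated Data) flag in the header that dig prints. Below is a minimal sketch of that check. The function name has_ad_flag is made up for illustration, and it only inspects dig's text output, so it is only as trustworthy as the resolver that set the flag.

```bash
#!/usr/bin/env bash
# has_ad_flag (hypothetical helper): reads full dig output on stdin and
# returns 0 when the header's flags list contains "ad".
has_ad_flag() {
  local line
  # The header line looks like: ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, ..."
  local -r pat='^;; flags: ([a-z ]+);'
  while IFS= read -r line; do
    if [[ "$line" =~ $pat ]]; then
      # Pad with spaces so "ad" is matched as a whole word.
      [[ " ${BASH_REMATCH[1]} " == *" ad "* ]] && return 0
      return 1
    fi
  done
  # No flags line found at all.
  return 1
}

# Assumed usage (needs network and a validating resolver):
#   dig tlsa _25._tcp.mail.cendyne.dev | has_ad_flag || echo "answer was not validated"
```

The check works on text only, so it can be tried against a saved dig response before wiring it into a cron job.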
DANE Monitoring
After my last post, Viktor followed up with this prompt and I totally agreed.
Okay, so how do I know that my DANE setup is working now, and how do I learn if it breaks because the certificate no longer matches what is in the TLSA records as per RFC 6698?
A proper health check here would include:
- Fetch the configuration a client would use, i.e. use DNS!
- As a client: connect to the service, compare or assert the presented certificate or public key to the configuration found in DNS.
- Report if any failures are found!
I chose to go with a cron job written in bash. For now it only runs hourly, but I could run it every minute if I added something to announce a failure only once per interval (say 1 hour).
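The "announce once per interval" idea could be sketched with a timestamp file, as below. This is not part of my script. The function name should_notify, the stamp file, and the optional third argument (a fixed "now" to make testing easier) are all assumptions for illustration.

```bash
#!/usr/bin/env bash
# should_notify STAMP_FILE INTERVAL_SECONDS [NOW]   (hypothetical helper)
# Returns 0 and records the current time if no alert was recorded within
# the last INTERVAL_SECONDS; returns 1 if an alert was sent too recently.
should_notify() {
  local stamp_file=$1 interval=$2 now=${3:-$(date +%s)} last=0
  # A missing stamp file means nothing has been announced yet.
  [[ -r "$stamp_file" ]] && last=$(<"$stamp_file")
  if (( now - last >= interval )); then
    printf '%s\n' "$now" > "$stamp_file"
    return 0
  fi
  return 1
}

# Assumed usage inside a check that runs every minute:
#   if should_notify /var/tmp/dane-alert.stamp 3600; then
#     echo "DANE check failed"
#   fi
```

With something like this the check itself can run every minute while the email and Telegram message go out at most once an hour.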
Fetching configuration
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
Note: There's a space in the real result, but it is omitted for display purposes.
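The three numbers in front of the hex data are the certificate usage, selector, and matching type. As a small sketch, the helper below turns them into the short names registered in RFC 7218. The function name describe_tlsa is hypothetical and only the commonly used values are covered.

```bash
#!/usr/bin/env bash
# describe_tlsa RECORD   (hypothetical helper)
# Prints the RFC 7218 acronyms for the usage, selector and matching type
# fields of a TLSA record given as "usage selector matching data".
describe_tlsa() {
  local usage selector matching data
  read -r usage selector matching data <<<"$1"
  local -r number='^[0-9]+$'
  if ! [[ $usage =~ $number && $selector =~ $number && $matching =~ $number ]]; then
    echo "not a TLSA record: $1" >&2
    return 1
  fi
  # Array index = field value; anything outside the table prints "unknown".
  local -a usages=(PKIX-TA PKIX-EE DANE-TA DANE-EE)
  local -a selectors=(Cert SPKI)
  local -a matchings=(Full SHA2-256 SHA2-512)
  printf 'usage=%s selector=%s matching=%s\n' \
    "${usages[$usage]:-unknown}" \
    "${selectors[$selector]:-unknown}" \
    "${matchings[$matching]:-unknown}"
}

describe_tlsa "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F"
```

So "3 1 1" reads as: match the end-entity certificate itself (DANE-EE), look only at its public key (SPKI), and compare a SHA-256 digest of it.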
Client Connect and assert configuration
It turns out that OpenSSL s_client can actually do this for us.
Adapted below is what I found at Verify TLSA (DANE) records using OpenSSL by Zbyszek Zolkiewski. But we have to do some work to get the results out as you will soon see.
openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
</dev/null 2>/dev/null
The output is thus:
CONNECTED(00000003)
depth=0 CN = mail.cendyne.dev
verify return:1
---
Certificate chain
0 s:CN = mail.cendyne.dev
i:C = US, O = Let's Encrypt, CN = R3
1 s:C = US, O = Let's Encrypt, CN = R3
i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
2 s:C = US, O = Internet Security Research Group, CN = ISRG Root X1
i:O = Digital Signature Trust Co., CN = DST Root CA X3
---
Server certificate
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
subject=CN = mail.cendyne.dev
issuer=C = US, O = Let's Encrypt, CN = R3
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 4792 bytes and written 421 bytes
Verification: OK
Verified peername: mail.cendyne.dev
DANE TLSA 3 1 1 ...e641f816f16b5e57c31d996f matched EE certificate at depth 0
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
The key line we are looking for is Verify return code: 0 (ok).
If we use a test case like
openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 65 (No matching DANE TLSA records)
So we have a clear way to differentiate a successful state from an error.
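If comparing the whole line feels brittle, the numeric code can be pulled out instead. The sketch below reads s_client output on stdin and prints the number from the first "Verify return code" line. The function name verify_code is an assumption, and it relies only on the line format shown above.

```bash
#!/usr/bin/env bash
# verify_code (hypothetical helper): prints the numeric code from the first
# "Verify return code: N (...)" line on stdin; returns 1 if there is none.
verify_code() {
  local line
  local -r pat='Verify return code: ([0-9]+) '
  while IFS= read -r line; do
    if [[ "$line" =~ $pat ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done
  return 1
}

# Assumed usage:
#   code=$(openssl s_client ... </dev/null 2>/dev/null | verify_code)
#   [[ "$code" == 0 ]] || echo "DANE verification failed with code $code"
```

A missing line (for example when the connection itself fails) is reported as a non-zero exit status, so it is not mistaken for success.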
Reporting failures
First, because I'm using cron, it will by default email any output to the user assigned to run the script. So some simple echos will do. In this case the script is executed by root and I have root aliased to cendyne on my mail server. I could run it with another user but this is my own code to verify my own infrastructure.
Second, I'm not always looking at my email. In fact Telegram is a great way to get my attention. For that I made a Telegram bot and fashioned a quick curl call to send me a message.
TELEGRAM_CHAT_ID=1234567890
TELEGRAM_BOT_TOKEN=111111111:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
MSG="There's a \"problem\""
# Let jq build the JSON so quotes in the message are escaped correctly
json=$(jq -n --arg chat "$TELEGRAM_CHAT_ID" --arg msg "$MSG" '{"chat_id":$chat,"text":$msg,"parse_mode":"MarkdownV2"}')
# Post the message and discard curl's output
curl -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-H "Content-Type: application/json" \
--data "$json" 2>/dev/null >/dev/null
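One thing to keep in mind with MarkdownV2 is that Telegram expects a set of reserved characters to be escaped with a backslash when they appear as plain text. My messages happen to avoid them, but a small helper like the sketch below could be used for arbitrary text. The function name escape_markdown_v2 is hypothetical, and it is meant for plain text only, not for text that intentionally contains Markdown formatting.

```bash
#!/usr/bin/env bash
# escape_markdown_v2 TEXT   (hypothetical helper)
# Prefixes every character that MarkdownV2 treats as special with a backslash.
escape_markdown_v2() {
  local text=$1 out="" ch i
  # Backslash itself plus the reserved characters listed in the Bot API docs.
  local -r reserved='\_*[]()~`>#+-=|{}.!'
  for (( i = 0; i < ${#text}; i++ )); do
    ch=${text:i:1}
    if [[ $reserved == *"$ch"* ]]; then
      out+="\\$ch"
    else
      out+=$ch
    fi
  done
  printf '%s\n' "$out"
}

escape_markdown_v2 "Verify return code: 65 (No matching DANE TLSA records)"
```

The escaped string can then be passed to jq as the message text in the same way as MSG above.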
Testing
First, as a test case I set up some bogus records on another port (not actually in use).
On port 29, a DANE-capable client would expect to see one of the published public key signatures when it connects to the service over TCP. I didn't want to pollute my actual records, so my tests fetched the TLSA records published for port 29 (_29._tcp) while connecting to the real service on port 25. The written examples below are rewritten to say port 25.
Valid but messy
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
3 1 1 9999999999999999999999999999999999999999999999999999999999999999
3 1 1 8888888888888888888888888888888888888888888888888888888888888888
$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)
The 8...8 and 9...9 signatures are my invalid / old / long since replaced test cases, while the 5E...6F public key is the active and correct key. A client would still validate this, but it isn't what I expect to have published. Because this is lower priority, an email-only report is sufficient for me: at the end of the script, it counts the lines and reports if there are too many.
Invalid
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 9999999999999999999999999999999999999999999999999999999999999999
3 1 1 8888888888888888888888888888888888888888888888888888888888888888
$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 9999999999999999999999999999999999999999999999999999999999999999" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 65 (No matching DANE TLSA records)
No keys in this list are in use, so I expect an email and a telegram message.
Valid, key rotation in progress
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
3 1 1 8888888888888888888888888888888888888888888888888888888888888888
$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
-dane_tlsa_rrdata "3 1 1 8888888888888888888888888888888888888888888888888888888888888888" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)
In this case, the 5E...6F key is being introduced and 8...8 is being rotated out. This is an accepted test case where no output is expected.
See A sensible "3 1 1" + "3 1 1" key rotation approach by Viktor Dukhovni.
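The test cases in this section boil down to how many TLSA records are published. A minimal sketch of that classification is below. The function name classify_tlsa_count and the labels it prints are made up for illustration. It expects one record per line, as dig +short returns them.

```bash
#!/usr/bin/env bash
# classify_tlsa_count RECORDS   (hypothetical helper)
# RECORDS holds one TLSA record per line; prints a label for the record count.
classify_tlsa_count() {
  local records=$1 count
  # grep -c . counts non-empty lines, so an empty answer counts as 0.
  count=$(grep -c . <<<"$records")
  case $count in
    0) echo "missing" ;;   # nothing published at all
    1) echo "steady" ;;    # the usual case
    2) echo "rotation" ;;  # current key plus the next (or previous) key
    *) echo "messy" ;;     # stale records left behind
  esac
}

classify_tlsa_count $'3 1 1 5E79...\n3 1 1 8888...'
```

Only "missing" and "messy" would warrant a report; "steady" and "rotation" are the two states where no output is expected.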
Valid, the usual
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
$ openssl s_client -starttls smtp \
-connect mail.cendyne.dev:25 \
-dane_tlsa_domain mail.cendyne.dev \
-dane_tlsa_rrdata "3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F" \
</dev/null 2>/dev/null | grep "Verify return code" | head -n 1
Verify return code: 0 (ok)
Only one record for the Domain-issued certificate, no key rotation in progress, no output is expected.
Now about that extra space
As hinted above, the dig response was edited for display purposes. But it actually looks like
$ dig _25._tcp.mail.cendyne.dev TLSA +short
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57 C31D996F
The OpenSSL command will accept it as is. But I didn't like that being sent to me over email and Telegram.
The solution I found was in Bash script, using dig & curl, for reporting DNS and a few HTTPS policy files for everything email about a domain - line 836 by Phil Pennock.
local -r tlsa_pat='\bTLSA[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+(.+)$'
local tlsa_usage tlsa_selector tlsa_matching tlsa_cadata
if [[ "$line" =~ $tlsa_pat ]]; then
tlsa_usage="${BASH_REMATCH[1]}"
tlsa_selector="${BASH_REMATCH[2]}"
tlsa_matching="${BASH_REMATCH[3]}"
tlsa_cadata="${BASH_REMATCH[4]}"
tlsa_cadata="${tlsa_cadata// /}"
# ...
fi
Apparently bash can do regex matching, put the matches into capture groups, and then run a pattern substitution on those too with some of that parameter expansion syntax (${tlsa_cadata// /} removes every space)... And remain readable. Your opinion may differ.
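To see the two features in isolation, here is a small standalone sketch applied to a single line of full dig output. The function name normalize_tlsa_line is hypothetical and the pattern is a simplified variant of the one above (without the \b word boundary).

```bash
#!/usr/bin/env bash
# normalize_tlsa_line LINE   (hypothetical helper)
# Extracts "usage selector matching data" from a dig answer line and removes
# the spaces that dig inserts into the hex data.
normalize_tlsa_line() {
  local line=$1 data
  local -r pat='TLSA[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+(.+)$'
  if [[ "$line" =~ $pat ]]; then
    # BASH_REMATCH[0] is the whole match, [1]..[4] are the capture groups.
    data=${BASH_REMATCH[4]}
    # ${var// /} replaces every space with nothing (a glob pattern, not a regex).
    data=${data// /}
    printf '%s %s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "$data"
  else
    return 1
  fi
}

normalize_tlsa_line '_25._tcp.mail.cendyne.dev. 270 IN TLSA 3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57 C31D996F'
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ is what makes bash treat it as a regular expression rather than a literal string.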
Simulated Failure
When I used my fake port 29 records, I received a message on Telegram like so:
And in email I received the following, slightly modified for the simulation:
Verify return code: 65 (No matching DANE TLSA records)
SMTP NOT MATCH FAILED, please review!
; <<>> DiG 9.16.1-Ubuntu <<>> tlsa _25._tcp.mail.cendyne.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 777
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_25._tcp.mail.cendyne.dev. IN TLSA
;; ANSWER SECTION:
_25._tcp.mail.cendyne.dev. 270 IN TLSA 3 1 1 99999999999999999999999999999999999999999999999999999999 99999999
_25._tcp.mail.cendyne.dev. 270 IN TLSA 3 1 1 88888888888888888888888888888888888888888888888888888888 88888888
;; Query time: 3 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Aug 15 06:35:49 BST 2021
;; MSG SIZE rcvd: 148
versus
3 1 1 5E791D0C29D6919D74C365047F4076DED8BBB158E641F816F16B5E57C31D996F
Fantastic!
Checking production for real
I receive no email, I receive no telegram message.
Monitoring code
So what's my script? See below, secrets redacted.
#!/bin/bash
HOST=mail.cendyne.dev
TELEGRAM_CHAT_ID=REDACTED
TELEGRAM_BOT_TOKEN=REDACTED
getSig() {
# Read a PEM certificate on stdin and hash its DER-encoded public key with SHA-256
sig=$(openssl x509 -pubkey -noout | openssl pkey -pubin -outform DER | openssl dgst -sha256 -binary | xxd -u -p -c 100000)
echo "3 1 1 $sig"
}
getHttpsSig() {
echo | openssl s_client -connect "$1:443" -servername "$1" -showcerts 2>/dev/null | getSig
}
getSmtpSig() {
echo | openssl s_client -connect "$1:25" -starttls smtp -showcerts 2>/dev/null | getSig
}
verifySmtpSig() {
local -n args=$2
result=$(openssl s_client -starttls smtp -connect "$1:25" -dane_tlsa_domain "$1" "${args[@]}" </dev/null 2>/dev/null \
| grep "Verify return code" | head -n 1)
if [[ "$result" == "Verify return code: 0 (ok)" ]]; then
return 0
else
echo "$result"
return 1
fi
}
verifyHttpsSig() {
local -n args=$2
result=$(openssl s_client -connect "$1:443" -servername "$1" -dane_tlsa_domain "$1" "${args[@]}" </dev/null 2>/dev/null \
| grep "Verify return code" | head -n 1)
if [[ "$result" == "Verify return code: 0 (ok)" ]]; then
return 0
else
echo "$result"
return 1
fi
}
reformatDigTLSA() {
local -r tlsa_pat='([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]+)[[:space:]]+(.+)$'
while read -r line; do
if [[ "$line" =~ $tlsa_pat ]]; then
tlsa_usage="${BASH_REMATCH[1]}"
tlsa_selector="${BASH_REMATCH[2]}"
tlsa_matching="${BASH_REMATCH[3]}"
tlsa_cadata="${BASH_REMATCH[4]}"
tlsa_cadata="${tlsa_cadata// /}"
echo "$tlsa_usage $tlsa_selector $tlsa_matching $tlsa_cadata"
fi
done
}
getTLSA() {
dig tlsa "_$2._tcp.$1" +short | reformatDigTLSA
}
sendTelegramMessage() {
MSG=$(echo -e "$1")
json=$(jq -n --arg chat "$TELEGRAM_CHAT_ID" --arg msg "$MSG" '{"chat_id":$chat,"text":$msg,"parse_mode":"MarkdownV2"}')
curl -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-H "Content-Type: application/json" \
--data "$json" 2>/dev/null >/dev/null
}
TLSA_25=$(getTLSA "$HOST" 25)
TLSA_443=$(getTLSA "$HOST" 443)
function verify() {
local command="$1"
local lines="$2"
local -a DANE_PARAMS
OLD_IFS="$IFS"
IFS=$'\n'
for line in $lines; do
DANE_PARAMS+=(-dane_tlsa_rrdata)
DANE_PARAMS+=("$line")
done
IFS="$OLD_IFS"
verification=$($command "$HOST" DANE_PARAMS)
STATUS=$?
if [[ $STATUS != 0 ]]; then
echo "$verification"
fi
return $STATUS
}
VERIFY_ERROR=false
verify verifySmtpSig "$TLSA_25"
if [[ $? != 0 ]]; then
SMTP=$(getSmtpSig "$HOST")
echo "SMTP NOT MATCH FAILED, please review!"
dig tlsa "_25._tcp.$HOST"
echo -e "versus\n$SMTP"
sendTelegramMessage "SMTP TLSA Record does not match the current server, currently it sees:
\`\`\`\n$TLSA_25\n\`\`\`
But I expect to see \`$SMTP\`"
VERIFY_ERROR=true
fi
verify verifyHttpsSig "$TLSA_443"
if [[ $? != 0 ]]; then
HTTPS=$(getHttpsSig "$HOST")
echo "HTTPS MATCH_FAILED, please review!"
dig tlsa "_443._tcp.$HOST"
echo -e "versus\n$HTTPS"
sendTelegramMessage "HTTPS TLSA Record does not match the current server, currently it sees:
\`\`\`\n$TLSA_443\n\`\`\`
But I expect to see \`$HTTPS\`"
VERIFY_ERROR=true
fi
if [[ "$VERIFY_ERROR" == "true" ]]; then
exit 1
fi
# Count non-empty lines so that an empty record set counts as 0, not 1
lines=$(grep -c . <<<"$TLSA_25")
if (( lines < 1 || lines > 2 )); then
echo -e "Warning, SMTP lines count $lines is not within [1,2]\n$TLSA_25"
fi
lines=$(grep -c . <<<"$TLSA_443")
if (( lines < 1 || lines > 2 )); then
echo -e "Warning, HTTPS lines count $lines is not within [1,2]\n$TLSA_443"
fi
GitHub gist of the above
Conclusion
I am glad that Viktor prompted me to add monitoring to my deployment. That way Viktor doesn't need to email me anymore! I'll know first and that's how it should be.
But also, I learned quite a bit while doing this: first, a few more useful things that OpenSSL s_client can offer; second, how to process input in bash.
Anyway, now I'll know when things are broken! As long as my server is running.