At Lilik we are currently killing almost all of our unicorns with Ansible. We have gone from “it works? Then leave it alone!” to “you can read the playbook and figure it out”.

This is a huge improvement for our infrastructure: so far we have migrated LDAP, Dokuwiki, Mattermost, Gogs and the web servers to Ansible. What’s left is the mail server, the mailing lists and the webmail (working with LDAP is hell).

Perhaps the best enhancement is the LXC provisioning. Right now we use the lxc_container module from Ansible to create our VMs, and it’s slowly reducing the number of unique snowflakes while satisfying our OCD.
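
As a sketch, a container task in one of our playbooks looks roughly like this (the container name and template are made-up examples, not our actual playbook):

```yaml
# hypothetical task using Ansible's lxc_container module;
# name and template are examples
- name: create a container for a new service
  lxc_container:
    name: gogs
    template: debian
    state: started
```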

This is all good, and I would talk everyone into using a tool to provision and configure VMs, but it’s not the greatest benefit we received. What I would like to point out is that with Ansible you can enforce policy, like using a certification authority to issue a server certificate to each server and a client certificate to each user.

SSH certificates

If you are using SSH to log in to a remote server there are two available methods: user/password authentication and public key authentication. The latter is the more secure method, and if your remote server is exposed to the internet you should use it.

But what if you share this remote server with other users?

Everyone logs in as root?

A system user for everyone would be ok, but we need an admin: who gets access to root?

A mix of the previous?

For us the answer is user certificates, a stricter solution to the authentication problem.

A preamble: a key pair is a couple (DK, EK) of decryption and encryption keys.

With public key authentication the server uses the user’s public key EK to create a challenge; if the client answers correctly then the server grants access. This requires a public key for every user on every server, and even if we are few compared to most organizations out there this is not feasible. Using user public keys is a distribution problem; I think it has been solved before, but there are still questions about what to sync, when to sync, when a key expires and how it is revoked.

With user certificates the process has an additional step: every user creates a private/public key pair (DK, EK) and requests a user certificate CU from a certification authority (aka CA).

This certificate is a transformation of the user’s public key EK: basically it gets signed by the CA with the CA private key.

ssh-keygen -s ca_key /path/to/user/key
# sign the edoput@fry public key with the CA key in ~/.ssh/ca
ssh-keygen -s ~/.ssh/ca ~/.ssh/edoput@fry.pub

So now we have two public/private key couples: (DK, EK) for the user and (DJ, EJ) for the certification authority.

On every server we store the CA public key EJ, and every user receives a certificate CU that gets placed into the SSH keys folder.
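
On the server side trusting the CA is a one-line change: point sshd at the CA public key. The path below is an example; we push both the key and this option with Ansible.

```
# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/user_ca.pub
```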

This is what my key folder looks like:

# in .ssh/
edoput@fry
edoput@fry.pub
edoput@fry-cert.pub
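
(ssh-keygen names the certificate <keyfile>-cert.pub by default, which is also the name the ssh client picks up automatically.) You can always inspect what was embedded in a certificate:

```shell
# print the certificate fields: signing CA, key id, serial,
# validity interval and the principals you may log in as
ssh-keygen -L -f ~/.ssh/edoput@fry-cert.pub
```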

When a client wants to authenticate to a server it presents the public key EK and the certificate CU; the server checks the CA signature with the key EJ and, if it’s valid, grants access.

This procedure lets us log in to every server provisioned with the CA public key without using a password.

Moreover, every certificate can embed the list of users you are allowed to log in as (the principals) and an expiry date.

ssh-keygen -s ca_key -V validity_interval -z serial_number -I key_id -n principals /path/to/user/key
# sign the edoput@fry public key with the CA key in ~/.ssh/ca,
# set the validity period to 52 weeks from signature, set the
# certificate serial number to 1, identify the key as edoput
# and allow logging in as edoput and root
ssh-keygen -s ~/.ssh/ca -V "+52w" -z 1 -I edoput -n edoput,root ~/.ssh/edoput@fry.pub

But there’s more: we can revoke a user certificate if needed, reducing the blast radius of an insider going rogue.
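
Revocation works through a key revocation list (KRL): revoke the certificate with ssh-keygen, then point sshd at the list via the RevokedKeys option. The paths below are examples.

```shell
# create a KRL revoking this certificate (use -u to update an existing list)
ssh-keygen -k -f /etc/ssh/revoked_keys ~/.ssh/edoput@fry-cert.pub

# then on every server, in /etc/ssh/sshd_config:
#   RevokedKeys /etc/ssh/revoked_keys
```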

Providing a key identifier is also very effective at telling apart multiple users sharing the same login id.

Take a look at this line from /var/log/auth.log:

Jun 16 17:24:55 localhost sshd[22317]: pam_unix(sshd:session): session opened for user root by edoput

My identifier (edoput) is clearly visible in the login session even though I logged in as root.

But it gets better! Trusting the CA on your client really helps when dealing with Ansible.

Every time Ansible connects to a server it checks the hash of the server key; moving your container to a different IP, changing the hostname or recreating it triggers the DEFCON 1 alarm with the resulting message:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!

But if you trust the CA then you can configure every server to present its certificate and all is done. The process is exactly the same as in public key authentication, but this time it’s the server providing the certificate.
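
Host certificates are issued with the same ssh-keygen call plus the -h flag (the hostname and paths below are examples, not our actual ones):

```shell
# sign the server's host key; -h marks this as a host certificate
# and -n lists the hostnames it is valid for
ssh-keygen -s ~/.ssh/ca -h -I fry -n fry.example.org -V +52w /etc/ssh/ssh_host_ed25519_key.pub

# sshd then serves it with this line in /etc/ssh/sshd_config:
#   HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
# and clients trust the CA with a single known_hosts entry:
#   @cert-authority * <contents of the CA public key>
```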

I would like to conclude with the image of us drinking something on a beach without worries because our servers are now secure, but please remember that a CA is not something to misuse or abuse, and it’s not simple to operate; that’s why we had a long discussion, and we are still prone to the BIG FUCKUP™.

Common knowledge: the best way is to have a “root” CA that delegates to intermediate CAs that sign everything: SSH certificates, SSL certificates, VPN certificates and such. This extra step gives you a revocable CA, an extra layer of security in case of the BIG FUCKUP™.

But even with these intermediate CAs you still have to distribute certificates to every server (how can Ansible distribute anything to a machine that does not have a certificate yet?), create the user certificates, and make the process of signing server certificates automatic but with confirmation. There are lots of problems left to solve, but for now I will gladly enjoy my new freedom to log in to every server without having to enter a password.