Today was a very frustrating day.
So, yesterday, I did a rollup of software on my main work machine. I performed an
apt-get upgrade as I have done a thousand times before. Logged off, and went to bed.
This morning, when I tried to log on, after I’d entered my details on the login screen, I was greeted by a blank screen for about 2 minutes, before being kicked back to the login screen.
This kind of thing had happened before, and in the past it was just a matter of installing the vendor NVIDIA drivers for my card. At some point the distro-provided NVIDIA drivers had stopped working, so using the vendor ones was the way to go (this has since changed).
No joy. So, I began diving in and pulling at the various ends in an attempt to unravel this knotted ball of string.
Watching the logs, I noticed that just before the login process got thrown back to lightdm, I got a bunch of…
Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
…appearing in my syslog.
So, something was suddenly up with systemd, but I was none the wiser.
My setup at home is that I have a server which has my home directory and users exported by NFS/NIS to various machines, so there was nothing actually on the work machine. Sod it, I thought, nuke the site from orbit. So, I reinstalled, just in case I had borked something up over the years.
The fresh install made me create a new user, fine. I installed all the graphics drivers, and was able to log in just fine. Great! So, installed the various bits of software, set up NIS/NFS, could log in on console… great! Logged in through gdm3… aaaand. Nothing. Same error. Switched to lightdm. Same thing.
But… the local user worked. Must be something in my user’s home dir, after all that. So, unmounted my home directory, and tried to log in as a fresh user… still no joy. But the local user could log in…
As a hunch, I copied the user line from my server’s /etc/passwd into my local machine’s /etc/passwd… and bingo, I was able to log in.
So, what appears to have happened is that a recent change (within the last week or so) has broken NIS user support for systemd/dbus. When the window manager tried to start the services it needed, it couldn’t, since the user it was running as couldn’t be found. LightDM/PAM still functioned with NIS, so my thinking is that something in the environment is looking directly at /etc/passwd, perhaps to validate UIDs. I’m not an expert.
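If you want to check whether you're in the same boat, here's a rough sketch of the comparison I ended up doing by hand. It's in Python purely for convenience, and the function names are made up for illustration. It assumes the standard `pwd` module, which resolves users through the normal name service switch (so NIS users should show up there), and compares that against a plain read of the local passwd file. It only diagnoses; it doesn't touch /etc/passwd.

```python
import pwd
from pathlib import Path


def in_local_passwd(username, passwd_text):
    """Return True if username has its own entry in the given passwd file contents."""
    for line in passwd_text.splitlines():
        # Skip blank lines, comments, and NIS compat markers such as "+::::::".
        if not line or line.startswith(("#", "+", "-")):
            continue
        if line.split(":", 1)[0] == username:
            return True
    return False


def nss_entry(username):
    """Look the user up via NSS (files, NIS, ...) and return a passwd-style line, or None."""
    try:
        p = pwd.getpwnam(username)
    except KeyError:
        return None
    fields = [p.pw_name, p.pw_passwd, str(p.pw_uid), str(p.pw_gid),
              p.pw_gecos, p.pw_dir, p.pw_shell]
    return ":".join(fields)


def report(username, passwd_path="/etc/passwd"):
    """Describe where the user can be resolved from (hypothetical helper)."""
    entry = nss_entry(username)
    local = in_local_passwd(username, Path(passwd_path).read_text())
    if entry is None:
        return f"{username}: not resolvable at all"
    if local:
        return f"{username}: present in {passwd_path}"
    # Resolvable through NSS but missing locally: the situation described above.
    return f"{username}: only known via NSS (e.g. NIS); entry would be: {entry}"
```

Calling `report("youruser")` for an NIS account that has no local entry should give you the third message, along with the line that I ended up pasting into the local /etc/passwd as a workaround.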
So, if any of you are in a similar situation, hopefully this blog post will stop you from losing an entire day of work!
My askubuntu ticket is over here, and I’ll keep it updated should I find a better solution than this rather crufty hack.