tl:dr – Kernel headers for 5.3.0 have changed so modules for older Nvidia cards no longer build. Downgrade to the last known good kernel (e.g. 5.0.0-37-generic) and you should be good.

So, I woke up this morning (blues riff), extra early in order to bash out some client work before heading to the gym. I turned on my computer and was greeted by the sight of my login screen, low res, and only on one monitor.

Nothing with computers, it seems, is going to be easy.

I remembered that I had done an apt-get upgrade the previous night, and this has in the past knocked the Nvidia drivers out of whack, so I reinstalled the drivers for my card apt-get remove nvidia-driver-390; apt-get install nvidia-driver-390 and restarted.

No joy.

Fine, I’ll install the official drivers from the Nvidia binary. Never failed before.

Bang. Wouldn’t build.

Ok. Time to dig a little deeper.

I went back to the distro drivers, and this time removed them completely; apt-get remove nvidia-driver-390 --purge; apt-get auto remove; apt-get install nvidia-driver-390.

Still no joy.

However, picking apart the build logs, and we have our first clue. A bunch of build errors to do with the Nvidia modules. Seems that the drivers were not able to build against the current kernel.

Looking at my /boot/ I can see that a new kernel (5.3.0) was installed as part of the upgrade. It looks like the kernel headers have been changed about, and this has broken older drivers.

So, there are two possible solutions – use a newer version of the Nvidia drivers (which isn’t possible for me since I have a pretty old GForce card installed), or roll back to the previous kernel.

First, remove the 5.3.0 kernel: apt-get remove linux-image-5.3.0-26; apt-get remove linux-headers-5.3.0-26

You’ll get a warning if you’re currently running this kernel, don’t worry, we’ll sort that out now.

Make sure you’ve got the working kernel installed apt-get install linux-image-5.0.0-37 linux-headers-5.0.0-37

Now, make sure grub boots this module. Edit /etc/default/grub and change GRUB_DEFAULT to GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.0.0-37-generic" (or whatever the particulars of your last working kernel was – ls -larth /boot to show the history).

Update your boot, update-grub

Reboot, and then log in to the console, and reinstall your Nvidia drivers: apt-get remove nvidia-driver-390 --purge; apt-get auto remove; apt-get install nvidia-driver-390

All being well, the Nvidia drivers should build this time. Reboot one last time, and you’re good!

Hope this helps.

Today was a very frustrating day.

So, yesterday, I did a rollup of software on my main work machine. I performed an apt-get upgrade as I have done a thousand times before. Logged off, and went to bed.

This morning, when tried to log on, after I’d entered my details on the login screen, I was greeted by a blank screen for about 2 minutes, before being kicked back to the login screen.

Hmm..

This kind of thing had happened before, and in the past it was just a matter of installing the vendor NVidia drivers for my card. Sometime back in the day the distro provided nvidia drivers had stopped working, so using the vendor ones was the way to go (this has since changed).

No joy. So, I began diving in and pulling at the various ends in an attempt to unravel this knotted ball of string.

Watching the logs, I noticed that just before the login process got thrown back to lightdm, I got a bunch of…

…appearing in my syslog.

So, something was suddenly up with systemd, but I was nonthewiser.

My setup at home is that I have a server which has my home directory and users exported by NFS/NIS to various machines, so there was nothing actually on the work machine. Sod it I thought, nuke the site from orbit. So, I reinstalled, just in case I had bawked something up over the years.

The fresh install made me create a new user, fine. I installed all the graphics drivers, and was able to log in just fine. Great! So, installed the various bits of software, set up NIS/NFS, could log in on console… great! Logged in through gdm3… aaaand. Nothing. Same error. Switched to lightdm. Same thing.

But… the local user worked. Must be something in my user’s home dir, after all that. So, unmounted my home directory, and tried to log in as a fresh user… still no joy. But the local user could log in…

Hmmm….

Lightbulb!

As a hunch, I copied the user line from my server’s /etc/passwd into my local machine’s /etc/passwd… and bingo, I was able to log in.

So, what looks like has happened is that a recent change (within the last week or so) has broken NIS user support for systemd/dbus. So, when the window manager was trying to start the services it needed to run, it wasn’t able to, since the user it was attempting to use couldn’t be found. Lightdm/Pam still functioned with NIS, so my thinking is that there’s something about the environment that’s looking directly at /etc/passwd for something, or to validate uids.. I’m not an expert.

So, if any of you are in a similar situation, hopefully this blog post will stop you from losing an entire day of work!

My askubuntu ticket is over here, and I’ll keep updated should I find a better solution than this rather crufty hack.

I got a little bit of time recently, so I’ve updated the Known vagrant build.

The latest vagrant package now uses the latest Ubuntu LTS – Xenial – as it’s build. It also makes some changes to the environment, including, we now use mariadb instead of the now rather defunct (and in some cases broken) Oracle mysql server. The build also now runs PHP 7.0.

If you’re like me, you might need to update your version of vagrant and VirtualBox in order to successfully boot the newer Ubuntu images, so be sure you do that.

» Visit the project on Github...