Managing Device Drivers

ubuysa

The BSOD Doctor
Driver management is an aspect of Windows support that seems to cause a fair few problems. That may be because the function and operation of drivers are not well understood, or because driver support seems to be some sort of 'dark art'.

Driver problems are the most common cause of BSODs, yet a driver-caused BSOD can be tricky to diagnose. It's often not possible, for example, to point with any confidence at the driver that actually caused the BSOD. In addition, in the vast majority of driver-caused BSODs, you need to be fairly adept at using the Windows debugging tools, and have a good understanding of the internals of I/O operation in Windows and its related control-block structures, to accurately identify the failing driver - and that of course assumes you have a kernel dump file to analyse in the first place.

That's not to say that all driver problems result in a BSOD; driver management would be much simpler if they did! Driver problems can also cause system crashes, hangs, black screens, and of course a myriad of niggly issues with the device(s) they manage.

Because of the above it's probably worthwhile spending some time talking about what drivers are, about what they do, and about why driver failures often result in a BSOD. We should also spend some time talking about what you can do to reduce the risk of a driver-caused BSOD (or indeed any driver-caused failure) and what simple and logical steps you can take to identify the failing driver when you do get problems.

Drivers are an integral part of the Windows I/O Subsystem, so perhaps the first thing we should do is define what we mean by I/O. I/O stands for input/output and everything that goes on in your PC outside of the CPU and RAM is I/O. When we talk about 'input' we mean input into the CPU/RAM, and when we talk about 'output' we mean output from the CPU/RAM. Without I/O capability your PC would be a useless box. The keyboard and mouse are I/O devices, the monitor is an I/O device, even your disk drives are I/O devices.

Every I/O device needs a driver to manage it. Sometimes these 'device drivers' are part of the Windows I/O subsystem (the basic mouse and keyboard drivers for example), sometimes they are drivers written and provided by Microsoft (the CD/DVD drivers for example), and quite often they are provided by the vendor that created the device (motherboard drivers for example).


How I/O Works

Drivers handle most of the processing involved in doing all I/O operations, so it's well worth looking at a simple overview of how an I/O operation is handled by Windows to see where drivers fit in. As an example we'll take a simple read operation from a file on a disk, initiated by an ordinary user application (this explanation has been greatly simplified)...

Our user application's view of the file it's using is as a sequential list of records that exist 'somewhere', and the application now wants record number 237 (for example). It thus allocates a buffer in virtual storage to hold the record and issues a read request for record 237 (in the specified file) and this is passed to the I/O Manager in the Windows kernel.

The I/O Manager does some basic error checking to make sure the I/O request is valid and complete, and it then passes the I/O request to the appropriate driver for the device on which the specified file resides (a disk drive in this case). At this point the originating application's thread is placed in a wait state, waiting on a specific event - the completion of this I/O operation.

The device driver for the disk (running in kernel mode) does some more checking of the request using its intimate knowledge of the device (like ensuring that the allocated buffer is big enough to hold the record, for example) and then translates the application's record 237 into an actual data block in a specific track and sector on a specific disk. If the required disk is free (i.e. not in the middle of another I/O operation) the driver communicates with the disk using the appropriate disk hardware ports, registers, and commands to instruct it to read the required data block from the specified track and sector. At this point the device driver exits and a new thread is dispatched on this CPU.

A hard disk would now move the read/write heads to the required track and wait for the wanted sector to rotate under them (these are the seek and latency times of hard disks, and they are why hard disks are so slow). An SSD would just electronically switch to the appropriate block, a process that's very fast...

As the wanted data block passes under the read/write heads (or is electronically selected on an SSD) the disk controller copies the data off the disk surface (or from the SSD cells) and into the application's buffer (using Direct Memory Access - DMA). When that's done the disk controller raises an interrupt. Interrupts are hardware signals that cause a CPU to stop executing the current thread (its status is saved) and switch the CPU to begin executing the interrupt routine in the device driver for the device that raised the interrupt. In our example this will be the disk device driver (the same driver as earlier).

The interrupt routine in the disk device driver checks with the disk controller that the data has been located and copied and then 'posts' the wait event, in other words it signals that the event the application thread is waiting on has completed. The application's thread is now marked as ready and will go on a CPU ready queue to be dispatched. The device driver now exits and the I/O is complete.

When the application's thread is next dispatched on a CPU the contents of record 237 are now magically present in the buffer and the application can begin to process it.
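
The walkthrough above can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not real Windows code: FakeDisk, FakeDiskDriver and io_manager_read are made-up names, a background thread stands in for the disk hardware, and a threading.Event stands in for the event that the application thread waits on.

```python
import threading


class FakeDisk:
    """Hypothetical stand-in for the disk and its controller."""

    def __init__(self, records):
        self.records = records  # record number -> bytes

    def start_read(self, record_no, buffer, raise_interrupt):
        # The 'hardware' works on its own thread, independently of the caller.
        def device_work():
            buffer.extend(self.records[record_no])  # the 'DMA' copy into the buffer
            raise_interrupt()                       # signal that the data is ready

        threading.Thread(target=device_work).start()


class FakeDiskDriver:
    """Hypothetical stand-in for the disk device driver."""

    def __init__(self, disk):
        self.disk = disk

    def start_io(self, record_no, buffer, io_complete):
        # Driver entry: check the request, start the device, then return at once.
        if record_no not in self.disk.records:
            raise ValueError(f"no such record: {record_no}")
        self.disk.start_read(
            record_no, buffer, lambda: self.interrupt_routine(io_complete)
        )

    def interrupt_routine(self, io_complete):
        # 'Post' the wait event so the waiting application thread becomes ready.
        io_complete.set()


def io_manager_read(driver, record_no, buffer):
    """Toy I/O Manager: pass the request to the driver, then wait on the event."""
    io_complete = threading.Event()
    driver.start_io(record_no, buffer, io_complete)
    if not io_complete.wait(timeout=5):  # the application thread's wait state
        raise TimeoutError("I/O did not complete")
    return buffer


if __name__ == "__main__":
    disk = FakeDisk({237: b"contents of record 237"})
    app_buffer = bytearray()  # the buffer the application allocated
    io_manager_read(FakeDiskDriver(disk), 237, app_buffer)
    print(bytes(app_buffer))
```

Notice that the driver is entered twice: once to start the I/O and once (as the interrupt routine) to finish it, with the 'device' doing its work in between.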

There are a couple of key things to note from the above...

1. Drivers run in kernel mode. (Some simple drivers, like printer and scanner drivers, can run in user mode).

2. The driver (and the device) do all the heavy lifting in an I/O operation.

The first of these observations, drivers running in kernel mode, is a huge deal because in kernel mode you can execute privileged CPU instructions, access data and code in any address space, and potentially modify the kernel itself. A misbehaving, or a malicious, driver could cause untold damage to your system or hide hard-to-find malware (keyloggers and the like). In addition, the ability of Windows to recover from misbehaving kernel code is limited. Kernel code is supposed to behave itself and obey all the rules, so often the only option Windows has with misbehaving kernel code is to BSOD the system.

The second of these observations, drivers (and the devices) doing all the heavy lifting, means that it is absolutely vital that the driver code installed is designed specifically for the exact device it is managing. Using the wrong driver that doesn't fully understand how to manage the device is going to end in tears (or more likely a BSOD). In addition, many drivers are not written by Microsoft, they are written by the hardware vendors themselves (usually in C and C++) so the quality of the coding can't always be guaranteed. We saw in point 1. above that it's also essential that the driver contains only the code necessary to manage the device and no other 'suspect' code, that's hard to ensure. And remember drivers are kernel code.

It's also worth noting from the above that this I/O was synchronous because the originating application was placed in a wait whilst the I/O was done. Most I/O operations in Windows are synchronous, but Windows does support asynchronous I/O. This is where the application is not placed in a wait state and can continue executing, starting additional I/O operations without waiting for the first to complete. This means the originating application has to check to see whether its I/Os have completed and handle any synchronisation necessary between them. Applications using asynchronous I/O are much more difficult to write of course and they are way more difficult to debug too!
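
To give a feel for the asynchronous style, here is a minimal sketch using Python's asyncio. It is an analogy rather than the Windows asynchronous I/O API: read_record is a made-up function, and asyncio.sleep stands in for the time the device takes.

```python
import asyncio


async def read_record(record_no, delay):
    # asyncio.sleep stands in for the device doing its work.
    await asyncio.sleep(delay)
    return f"record {record_no}"


async def read_many(record_nos):
    # Start every read straight away, without waiting for the previous one.
    tasks = [asyncio.create_task(read_record(n, 0.05)) for n in record_nos]

    # The application is not in a wait state, so it can do other work here.
    other_work = sum(range(1000))

    # It is now the application's job to check that its I/Os have completed.
    results = await asyncio.gather(*tasks)
    return other_work, results


def run_demo():
    return asyncio.run(read_many([237, 238, 239]))


if __name__ == "__main__":
    print(run_demo())
```

Even in this tiny example the application has to keep track of which reads it started and collect the results itself, and that bookkeeping is where the extra difficulty comes from.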


Other Driver Functions

Many drivers can also be used to manage the device: change buffer sizes, turn features on and off, etc. In these cases the management (rather than the I/O) function of a driver is called directly, either from a user application or by a specific management application (and sometimes by Windows applications).

Sometimes the driver itself modifies its behaviour at I/O time, based either on the application that's called it or on special parameters passed by the calling application. This is done by invoking 'filters' that are part of the driver code itself, either before or after the main driver code is called. Graphics drivers commonly make use of filters to enhance the performance (or user experience) in specific games.
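
The idea of filters running before and after the main driver code can be sketched like this. It's purely illustrative: IoRequest, the filter functions and 'game.exe' are names I've invented, and real filters work on kernel I/O request structures rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class IoRequest:
    operation: str
    caller: str
    notes: list = field(default_factory=list)


def game_profile_filter(request):
    # Hypothetical pre-filter: change behaviour based on the calling application.
    if request.caller == "game.exe":
        request.notes.append("pre-filter: applied game profile")
    return request


def main_driver(request):
    # The main driver code that actually handles the request.
    request.notes.append(f"driver: handled {request.operation}")
    return request


def logging_filter(request):
    # Hypothetical post-filter: runs after the main driver code.
    request.notes.append("post-filter: logged completion")
    return request


def dispatch(request, pre_filters, driver, post_filters):
    """Pass the request through the pre-filters, the driver, then the post-filters."""
    for pre in pre_filters:
        request = pre(request)
    request = driver(request)
    for post in post_filters:
        request = post(request)
    return request


if __name__ == "__main__":
    done = dispatch(
        IoRequest("draw", "game.exe"),
        [game_profile_filter], main_driver, [logging_filter],
    )
    print(done.notes)
```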
 
How Drivers Are Installed

When you download a driver you generally execute some sort of setup.exe file that installs the driver for you, but the driver installation itself consists of four main types of file, and for this example let's call the device 'dongle'...

dongle.sys - this is the actual driver code, and for some drivers there are many .sys files included. This is because many drivers are layered and control passes from one layer to the other during an I/O operation. Open Device Manager, right-click on a device (disk, CDROM, monitor, etc), select Properties, click the Driver tab and then click the Driver Details button. Typically what you'll see is one or more .sys files; these are the drivers themselves. If you navigate to C:\Windows\System32\Drivers you'll see all the driver .sys files stored here; this is where they are loaded from at boot time. Below is the driver for my DVD/CDROM device...

[Image: cdrom driver.jpg]

dongle.inf - the .inf files contain 'instructions' that indicate how the driver should be installed; what registry settings should be created, which device options to activate, etc. If you navigate to C:\Windows\System32\DriverStore\FileRepository you'll see folders for each of your drivers. They use system names but you can figure out what device many of them are for. Open any of these folders and you'll see the .sys files (there could be more than one, as mentioned) and the .inf file. The .inf files are text files and you can open them with Notepad, so open the .inf file you see. The contents may not make much sense at first but you can see how they are designed to configure the installation of the driver. Below is the start of the .inf file for basicdisplay.inf...

[Image: inf file.jpg]

dongle.cat - the catalog file contains cryptographic hashes of all the files in the driver package and is itself digitally signed. A digital signature on a driver certifies that the driver is original and has not been tampered with; since drivers run in kernel mode this is an important safeguard. It's so important that in 64-bit Windows 10 (1607 onward) on a UEFI Secure Boot system (and that's almost all of us), new kernel-mode drivers will generally only be loaded if they have been digitally signed by Microsoft through the Windows Hardware Dev Center (WHDC). There are exceptions for systems upgraded from an earlier Windows release and for drivers signed with older certificates. That all but guarantees that the drivers you are running are safe (and properly tested).

dongle.dll - some user mode drivers include .dll files that are dynamically loaded as needed, just like any other .dll file.
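
Because .inf files are laid out much like .ini files, you can pull the basic details out of one with a short script. The sketch below makes several assumptions: the 'dongle' INF text is made up for illustration, read_inf_version is my own helper, and Python's configparser only copes with the simpler parts of the INF format (real files can be far more complex, and some are saved as UTF-16 so would need opening with that encoding).

```python
import configparser

# Made-up example text, not a real driver package.
SAMPLE_INF = """\
; dongle.inf - hypothetical example
[Version]
Signature   = "$Windows NT$"
Class       = USB
Provider    = %ProviderName%
DriverVer   = 01/15/2020,1.2.3.4
CatalogFile = dongle.cat

[Strings]
ProviderName = "Example Hardware Ltd"
"""


def read_inf_version(inf_text):
    """Return the [Version] entries, resolving %name% tokens via [Strings]."""
    parser = configparser.ConfigParser(
        interpolation=None,   # '%...%' is INF syntax, not configparser syntax
        strict=False,         # tolerate repeated sections and keys
        allow_no_value=True,  # many INF lines are bare entries with no '='
    )
    parser.read_string(inf_text)

    strings = {}
    if parser.has_section("Strings"):
        strings = {k: (v or "").strip('"') for k, v in parser["Strings"].items()}

    version = {}
    for key, value in parser["Version"].items():
        value = (value or "").strip('"')
        if len(value) > 2 and value.startswith("%") and value.endswith("%"):
            # configparser lower-cases key names, so look the token up in lower case.
            value = strings.get(value[1:-1].lower(), value)
        version[key] = value
    return version


if __name__ == "__main__":
    for name, value in read_inf_version(SAMPLE_INF).items():
        print(f"{name}: {value}")
```

Note how the CatalogFile entry names the .cat file that goes with the package, which ties the three file types together.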


How To Prevent Driver-Caused BSODs

There are two things that you can (and should) do to reduce (or even eliminate) driver-caused BSODs (or indeed any driver-caused failure); ensure that the drivers you install come from trusted sources (driver-signing helps a lot with this), and leave working drivers alone.

The first of these, obtain your drivers from trusted sources, has always been of vital importance because drivers run in kernel mode. This is one reason why the use of third-party driver 'search and install' tools is strongly discouraged. You simply don't know where the driver code has come from nor whether it's been tampered with. Installing an unverified driver from an unknown source is like putting a loaded gun to your head. The digital signing of drivers by Microsoft helps ensure that your drivers are original and safe.

I don't know how Microsoft assemble the drivers that they make available via Windows Update, but I do know that Windows Update in Windows 10 generally finds the best driver for almost all platforms. It seems highly likely to me that every driver digitally signed by the WHDC will also be made available via Windows Update, which means that Windows Update should be able to install every driver that Windows 10 will load.

When looking for the best driver to install, after a reinstallation of Windows for example, my recommended search order would be...

1. Windows Update. Especially if you're running Windows 10.

2. The vendor who supplied (or built) your PC/laptop. Some drivers (mostly for laptops) can only be obtained from the original equipment manufacturer (OEM), the Clevo Control Center (Hotkey) for example.

3. The vendor who built the specific item of hardware (the motherboard vendor, for example).

I would not consider installing any driver from anywhere else, even on a pre-Windows 10 system.

The second, leaving working drivers alone, goes against the grain for many users. Users have become used to always installing all Windows updates, because they fix security vulnerabilities or because they fix code bugs. This is very important and in Windows 10 is pretty much enforced.

The same rule does not apply to drivers, however. Because they are 'trusted' kernel-mode code (and in Windows 10, digitally signed) there are rarely security vulnerabilities that need to be patched. Similarly, as long as the driver is doing the job it was designed for, there are no bugs to be fixed. Driver code that you have had installed for years is pretty much guaranteed to be bug free - in that time you must have exercised every possible option the driver supports and it's working just fine. Good. Leave it alone then. Bug-free code is valuable.

Hardware vendors (and Microsoft) will update drivers to cater for newly introduced hardware features or components, or to cater for new software features that are provided for some platforms. If you don't have those hardware or software features then you don't need the updated driver. Installing the updated driver will not get you any more functionality nor faster processing of your I/Os, because you don't have the hardware or software the driver update is for, but it will introduce less well tested code into your kernel. We've seen that old driver code that you've executed billions of times is almost certainly bug-free; by installing the latest driver (that you don't actually need) you've introduced new driver code that you've never executed before and which may contain bugs.

Updating drivers when you don't need to makes your system potentially less reliable.

There are only two valid reasons for updating drivers...

1. You are having a problem with a device. If you do subsequently install the software feature that the latest driver update is designed for then you may well run into issues with the device. In that case updating to the latest driver is a sensible first step; it may well cure your problem. Even if it doesn't cure your problem you know that you now have the latest driver code installed and so something else must be the cause of your problem.

If you are already at the latest driver version and you're having problems, then do consider downgrading your driver to a version or two earlier. You might lose a little functionality that was introduced by the latest driver, but you also may eliminate the bug that seems to be in the latest driver. You will do no damage by downgrading your drivers when troubleshooting, so don't be afraid to try.

2. The updated driver introduces features that you need. I can't stress the 'that you need' qualifier enough. As mentioned, it is better for reliability reasons to leave drivers alone if possible, so be sure that you really do need the new feature before updating the driver. Graphics drivers quite often introduce new functionality aimed at improving the performance or the look and feel of certain games; if you play those games then you need those updates, otherwise leave the driver alone.

Here's another reason why you should avoid third-party driver search and install tools; drivers don't need to be regularly updated and are actually more stable the less you update them. Driver search and install tools thus serve no useful purpose (and especially in Windows 10 where Windows Update can install most drivers).

To see the drivers installed on your system, and whether they are started or not, enter 'msinfo32' in the Run command box. In the System Information window that appears, expand the Software Environment section and click on System Drivers. Below is an example from my system...

[Image: sysinfo.jpg]
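
If you'd rather have the same sort of list in a form you can search or filter, the Windows 'driverquery' command is an alternative to msinfo32. The sketch below is a minimal example under some assumptions: the helper names are mine, the sample rows are made up, and the column names ('Module Name', 'State') are what I'd expect on an English-language system, so check the header line of your own output before relying on them.

```python
import csv
import io
import platform
import subprocess


def parse_driverquery_csv(text):
    """Turn CSV text (header row first) into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def list_drivers():
    """Run 'driverquery /v /fo csv' and parse its output (Windows only)."""
    if platform.system() != "Windows":
        raise RuntimeError("driverquery is only available on Windows")
    result = subprocess.run(
        ["driverquery", "/v", "/fo", "csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_driverquery_csv(result.stdout)


def running_drivers(rows):
    # Assumes English column names; adjust them if your output differs.
    return [row["Module Name"] for row in rows if row.get("State") == "Running"]


if __name__ == "__main__":
    # Made-up sample rows so the parsing can be tried on any system.
    sample = (
        '"Module Name","Display Name","State"\n'
        '"cdrom","CD-ROM Driver","Running"\n'
        '"dongle","Dongle Driver","Stopped"\n'
    )
    print(running_drivers(parse_driverquery_csv(sample)))
```

On a Windows system you would call list_drivers() instead of using the sample text.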


Windows Update And Drivers

As mentioned, Windows 10 seems to perform extremely well at selecting the best drivers for most hardware on most systems and is the preferred install route for drivers. Unfortunately on Windows 10 systems the installation of all updates is pretty much enforced. In general this is a good thing; Windows updates are installed to prevent you from having a security vulnerability exploited, or to prevent you being affected by a bug in the Windows code. With drivers however we've seen that as long as they are working we really don't want to update them.

You can prevent Windows Update from automatically updating drivers, and my personal recommendation is that you should. To do this enter 'sysdm.cpl' in the Run command box; this will display the System Properties dialog. Click the Hardware tab and then the 'Device Installation Settings' button. Select the 'No' radio button and Windows Update will no longer download driver updates. Note that this also means that Windows will not be able to automatically find the best drivers for your devices, so you'll have to find the right drivers manually. Below are the System Properties dialog and the Device Installation Settings dialog side by side...


[Image: updates.jpg]
 
What To Do If A Driver Fails

Most BSODs are driver related, so in the absence of any other information assume that every BSOD is a driver failure. Most system crashes and hangs are also driver related, and in the absence of any information to the contrary you won't go far wrong if you assume that any major system failure is driver related.

If your system does BSOD and you're experienced at analysing Windows dump files then the dump might point you at the failing module. Sadly, and all too often, the failing module is ntoskrnl.exe - that's not a driver, it's the Windows kernel itself. Quite often when a driver fails the failure isn't identified until the Windows kernel code gets control and realises that the driver has tried to do something illegal (such as referencing paged-out memory when the CPU is running at an elevated interrupt request level - this is the common IRQL_NOT_LESS_OR_EQUAL 0xA BSOD). Since the kernel isn't really able to recover from kernel mode errors it BSODs with ntoskrnl.exe as the failing module, and that's not a lot of help.

Probably the best way to tell that it's a) a driver failure, and b) a third-party driver failure, is to boot Windows into Safe Mode where it loads a minimal set of Microsoft drivers. That's a little bit more fiddly in Windows 10 than in earlier versions of Windows but there are lots of websites that will show you the various ways to boot into Safe Mode. If the system doesn't BSOD in Safe Mode (or you don't get the failure in Safe Mode) then it's almost certain to be a third-party driver problem.

At this point I always try and ask myself 'what changed?'. If something was working and it's not working now, then something has changed. It may not always be something that appears to be directly related to the BSOD, you might simply have plugged your webcam into a different port for example, or changed the buffer size on a network adapter. And you might have made the change several minutes (or even hours) before the BSOD (or other failure). Don't assume that because a change doesn't appear to be related that it can't be the cause - suspect everything that you've changed, especially recent Windows Updates.

If your system works in Safe Mode then unplug all external hardware, except the mouse, keyboard, and one monitor, and try the system in that state. If it works then reconnect devices one at a time until you find the one that makes it fail. Only then see whether there is an updated driver for that device.

Ultimately, if you are not able to identify a specific failing device, then it makes sense to update all your drivers to the latest version as a first step. The potential reliability hit you'll take is outweighed by the possibility of fixing the failure/BSOD.

There is an excellent example of how to troubleshoot a driver-caused BSOD at https://www.pcspecialist.co.uk/forums/showthread.php?60261-BSOD-s-following-windows-update.


Driver Verifier

Ever since Windows 2000 there has been a hidden driver tool contained within Windows called Driver Verifier. It's remained hidden for a simple and very good reason; Driver Verifier is NOT a tool designed for end users. Driver Verifier is a tool designed for the use of Microsoft developers, driver developers, and system administrators (sysadmins). To understand why it's not an end user tool we need to explain what the Driver Verifier tool is for and how it works.

Driver Verifier is a stress testing tool for drivers; it's designed to make drivers fail and thus BSOD. Its purpose is to allow driver developers (and sysadmins) to stress drivers in every way possible to make them fail under test rather than in normal operation. Driver Verifier does not produce any output to tell you which drivers failed which tests; its only function is to make the system BSOD if the tests fail.

To find out why a driver caused a BSOD you need to be able to analyse the dump that's produced, and that's well beyond the capabilities of most end users. Driver Verifier thus doesn't help end users much, all it can really do is show you that your drivers don't BSOD and are thus good. However, every driver that the Microsoft WHDC digitally signs must already have passed all appropriate Driver Verifier tests so all drivers in Windows 10 are already known to be good!

There is a second reason why end users should avoid running Driver Verifier; it introduces some serious performance degradation as it stresses the drivers. The more drivers you ask it to test simultaneously the greater the performance impact. Driver Verifier is designed to be run in test environments where the sole object of the system is to develop and stress test drivers and where the performance degradation is acceptable.

The third reason why end users should avoid Driver Verifier is that you need to understand what each driver test is designed to do. If you don't know what the tests are for then how can you know which ones to select? In addition, you get no indication that it's running, and the settings are preserved across reboots (even power off cold boots) so the only way to turn it off is manually.

The fourth, and most serious, reason why end users should avoid Driver Verifier is that if Driver Verifier causes a driver to BSOD immediately it's loaded (and that's very possible) then every time you reboot you get a BSOD! The only way out of this cycle is to wait for boot recovery to allow you to boot into Safe Mode (and that's not that straightforward either) and turn Driver Verifier off in there. Of course, if you're incredibly unlucky and Driver Verifier should BSOD a driver that is loaded in Safe Mode you may be looking at a reinstallation of Windows.

For all these reasons it's not sensible or appropriate, I think, to talk about how to run Driver Verifier here. Should you want to see it you start a command prompt and enter the command 'verifier', and the Driver Verifier dialog will start. The most important radio button on that first page of the dialog is 'Delete Existing Settings' - this is how you turn Driver Verifier off.

Driver Verifier can also be set up and run directly from the command line; at a command prompt enter the command 'verifier /?' and you'll see the (long) list of commands that allow you to set up and stop Driver Verifier. The most important of these is the 'verifier /reset' command - this is the command line way to turn Driver Verifier off.

Note that regardless of whether you use the GUI dialog or the command line to set up Driver Verifier, you must reboot in order to start the Driver Verifier tests running. Similarly, whichever method you use to turn Driver Verifier off, you must reboot to stop it running.
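
If you ever need to check or clear Driver Verifier from a script (in a recovery routine, say), a small wrapper along these lines would do it. Treat it as a sketch under assumptions: the function names are mine, it only covers the two safe operations (showing the settings with 'verifier /querysettings' and turning everything off with 'verifier /reset'), and it would need to run from an elevated (administrator) session on Windows.

```python
import platform
import subprocess

# Only the two 'safe' operations are offered here, deliberately.
VERIFIER_COMMANDS = {
    "show": ["verifier", "/querysettings"],  # display the saved settings
    "off": ["verifier", "/reset"],           # clear all Driver Verifier settings
}


def verifier_command(action):
    """Return the command line for an action, refusing anything else."""
    if action not in VERIFIER_COMMANDS:
        raise ValueError(f"unsupported action: {action!r}")
    return list(VERIFIER_COMMANDS[action])


def run_verifier(action):
    """Run the command on Windows and return whatever it printed."""
    if platform.system() != "Windows":
        raise RuntimeError("Driver Verifier only exists on Windows")
    result = subprocess.run(
        verifier_command(action), capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(verifier_command("off")))
```

Remember that turning it off this way still needs a reboot before the tests actually stop.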


Driver Verifier Tips

1. Don't run it. I'm serious, it buys the end user almost nothing, the performance impact can be huge, and you may get unexpected BSODs.

2. Select only the Standard tests if you must run it, the Advanced tests are designed for specific drivers. Unless you know exactly what each test is doing don't select the 'Create Custom Settings' option.

3. Select the 'Automatically select unsigned drivers' option, all signed drivers have already been exhaustively tested with Driver Verifier.

4. Only select the 'Automatically select drivers built for older versions of Windows' option if you know you have any, but be prepared for BSODs if you do.

5. Never select 'Automatically select all drivers installed on this computer', the performance impact will be massive.

6. Only choose 'Select driver names from a list' if you are intimately familiar with the drivers you have and their names.

7. Practice booting into Safe Mode and be sure you know how to turn Driver Verifier off both from the GUI dialog and from a command prompt.

8. Don't run it. Umm, I think I've already said that...
 