Hangs - User Perspective
Users like responsive applications. When they click a menu, they want the application to react instantly, even if it is currently printing their work. When they save a lengthy document in their favorite word processor, they want to continue typing while the disk is still spinning. Users get impatient rather quickly when the application does not react in a timely fashion to their input.
A programmer might recognize many legitimate reasons for an application not to instantly respond to user input. The application might be busy recalculating some data, or simply waiting for its disk I/O to complete. However, from user research, we know that users get annoyed and frustrated after just a couple of seconds of unresponsiveness. After 5 seconds, they will try to terminate a hung application. Next to crashes, application hangs are the most common source of user disruption when working with GUI applications.
There are many different root causes for application hangs, and not all of them manifest themselves in an unresponsive UI. However, an unresponsive UI is one of the most common hang experiences, and this scenario currently receives the most operating system support for both detection and recovery. Windows automatically detects hung applications, collects debug information, and optionally terminates or restarts them. Otherwise, the user might have to restart the machine in order to recover from a hung application.
Hangs - Operating System Perspective
When an application (or more accurately, a thread) creates a window on the desktop, it enters into an implicit contract with the Desktop Window Manager (DWM) to process window messages in a timely fashion. The DWM posts messages (keyboard/mouse input and messages from other windows, as well as itself) into the thread-specific message queue. The thread retrieves and dispatches those messages via its message queue. If the thread does not service the queue by calling GetMessage, messages are not processed, and the window hangs: it can neither redraw nor can it accept input from the user. The operating system detects this state by attaching a timer to pending messages in the message queue. If a message has not been retrieved within 5 seconds, the DWM declares the window to be hung. You can query this particular window state via the IsHungAppWindow API.
Detection is only the first step. At this point, the user still cannot even terminate the application - clicking the X (Close) button would result in a WM_CLOSE message, which would be stuck in the message queue just like any other message. The Desktop Window Manager assists by seamlessly hiding and then replacing the hung window with a 'ghost' copy displaying a bitmap of the original window's previous client area (and adding "Not Responding" to the title bar). As long as the original window's thread does not retrieve messages, the DWM manages both windows simultaneously, but allows the user to interact only with the ghost copy. Using this ghost window, the user can only move, minimize, and - most importantly - close the unresponsive application, but not change its internal state.
The Desktop Window Manager does one last thing; it integrates with Windows Error Reporting, allowing the user to not only close and optionally restart the application, but also send valuable debugging data back to Microsoft. You can get this hang data for your own applications by signing up at the Winqual website.
See also: WER.
Hangs - EurekaLog Perspective
EurekaLog's hang detection works similarly to the operating system's. If you enable hang detection, EurekaLog creates a new thread when your application starts. This "hang detection" thread repeatedly asks the UI thread to process a WM_NULL message. WM_NULL is a message that does nothing, which makes it suitable for polling a window. If an application window is hung, it cannot process the WM_NULL message, and EurekaLog detects the hang.
Note: the operating system does not send WM_NULL messages to your threads. It does not need to, because it already has all the necessary information (information about the last sent message and delay times). EurekaLog has no access to this information, so it must send a WM_NULL message to detect hangs.
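The polling idea described above can be illustrated without any Windows API. The following is a minimal, platform-neutral sketch in Python, not EurekaLog's actual implementation: `MessageLoop` is a hypothetical stand-in for a UI thread's message queue, and `is_responsive` plays the role of the hang detection thread by posting a do-nothing message and waiting for it to be processed.

```python
import queue
import threading


class MessageLoop:
    """Toy stand-in for a UI thread's message queue and message pump."""

    def __init__(self):
        self._queue = queue.Queue()

    def post(self, handler):
        """Queue a callable to be run by the loop thread (None stops the loop)."""
        self._queue.put(handler)

    def run(self):
        """Retrieve and dispatch messages until None is posted."""
        while True:
            handler = self._queue.get()
            if handler is None:
                break
            handler()


def is_responsive(loop, timeout):
    """Post a message that does nothing but acknowledge itself.

    Returns True if the loop processed it within `timeout` seconds,
    False if the loop appears to be hung.
    """
    processed = threading.Event()
    loop.post(processed.set)
    return processed.wait(timeout)
```

A detection thread would call `is_responsive` periodically, using the configured hang threshold as the timeout. A handler that blocks the loop keeps the probe message stuck in the queue behind it, which is exactly the condition being detected.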
This technique works only in GUI applications (as does the technique used by the operating system) and only for the main thread (because the GUI in VCL, CLX and FMX applications is restricted to the main thread).
However, if your application has its own way to detect hangs, you can call the RaiseFreezeException function to trigger hang detection. For example, suppose you spawn a background thread to offload heavy work and keep the GUI responsive. If you do not get a reply from that background thread within a reasonable amount of time, you can consider it hung and call RaiseFreezeException to invoke the freeze detection dialog.
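The shape of such a custom check can be sketched as follows. This is a platform-neutral Python illustration under stated assumptions, not Delphi code: `run_with_deadline` is a hypothetical helper, and `on_freeze` is a hypothetical hook that marks the place where a Delphi application would call RaiseFreezeException.

```python
import threading


def run_with_deadline(work, timeout, on_freeze):
    """Run work() on a background thread and wait for its reply.

    If no reply arrives within `timeout` seconds, the worker is treated
    as hung and on_freeze() is called. on_freeze is a hypothetical hook:
    it marks where a Delphi application would call RaiseFreezeException.
    Assumes work() does not raise; a real version would forward errors.
    """
    result = {}
    done = threading.Event()

    def worker():
        result["value"] = work()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout):
        on_freeze()
        return None
    return result["value"]
```

The timeout is an application-level decision: it should be longer than the slowest legitimate run of the operation, otherwise slow but healthy work is reported as a hang.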
If your application is running on Windows Vista or later (Windows Vista, Windows 7, Windows 8, Windows 8.1, Windows 10, etc.), EurekaLog will use the Wait Chain Traversal (WCT) API to detect deadlocks between threads. Livelocks are not detected.
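The idea behind following a wait chain can be shown with a small conceptual sketch. It does not call the WCT API and is not how EurekaLog is implemented; the function name and the two dictionaries are assumptions made for illustration. Starting from one thread, the code follows "waits for lock, lock is owned by thread" links until the chain either ends or loops back on itself.

```python
def find_deadlock(waiting_for, owned_by, start_thread):
    """Follow the wait chain from start_thread.

    waiting_for maps a thread to the lock it is blocked on
    (threads that are running have no entry).
    owned_by maps a lock to the thread that currently holds it.
    Returns the list of threads forming a cycle, or None.
    """
    chain = []
    thread = start_thread
    while thread is not None:
        if thread in chain:
            # The chain looped back to a thread already seen: deadlock.
            return chain[chain.index(thread):]
        chain.append(thread)
        lock = waiting_for.get(thread)
        thread = owned_by.get(lock) if lock is not None else None
    return None  # the chain ended at a thread that is not blocked
```

For example, with `waiting_for = {"A": "L2", "B": "L1"}` and `owned_by = {"L1": "A", "L2": "B"}`, the chain from thread A loops back to A and the function reports the cycle A, B. Threads in a livelock are running rather than blocked, so they have no wait chain to follow.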
Once EurekaLog detects a hang or deadlock in the application, it raises a specially constructed exception. This immediately triggers standard exception processing, which invokes EurekaLog, displays an error dialog, sends a report, etc.
Hangs - Developer Perspective
The operating system and EurekaLog define an application hang as a UI thread that has not processed messages for at least 5 seconds (for the OS) or 60 seconds (the default for EurekaLog). Obvious bugs cause some hangs, for example, a thread waiting for an event that is never signaled, or two threads each holding a lock and trying to acquire the other's. You can fix those bugs without too much effort. However, many hangs are not so clear. Yes, the UI thread is not retrieving messages - but it is equally busy doing other 'important' work and will eventually come back to processing messages.
However, the user perceives this as a bug. The design should match the user's expectations. If the application's design leads to an unresponsive application, the design will have to change. Finally, and this is important, unresponsiveness cannot be fixed like a code bug; it requires upfront work during the design phase. Trying to retrofit an application's existing code base to make the UI more responsive is often too expensive. The following design guidelines might help:
Unfortunately, there is no simple, standard way to design and write a responsive application. Windows and Delphi do not provide a simple asynchronous framework that would allow for easy scheduling of blocking or long-running operations. The following sections introduce some of the best practices in preventing hangs and highlight some of the common pitfalls. There are also third-party frameworks and solutions that can help you develop smooth applications; see AsyncCalls, TasksEx and OTL.
Best Practices
Keep the UI Thread Simple
The UI thread's primary responsibility is to retrieve and dispatch messages. Any other kind of work introduces the risk of hanging the windows owned by this thread.
Do:
Do not:
Implement Asynchronous Patterns
Removing long-running or blocking operations from the UI thread requires implementing an asynchronous framework that allows offloading those operations to worker threads.
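A minimal sketch of such a pattern is shown here in Python as a platform-neutral illustration; it is not a Delphi framework, and all names (`ui_queue`, `run_async`, `pump_one`) are hypothetical. The long-running operation runs on a worker thread, and the worker only queues a completion callback. The callback itself runs later on the thread that pumps the queue, which stands in for the UI thread.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

ui_queue = queue.Queue()                      # stand-in for the UI message queue
workers = ThreadPoolExecutor(max_workers=2)   # pool for long-running operations


def run_async(operation, on_done):
    """Run operation() on a worker thread.

    When it finishes, queue a callback so that on_done(result)
    executes on the thread that pumps ui_queue, not on the worker.
    """
    future = workers.submit(operation)
    future.add_done_callback(
        lambda f: ui_queue.put(lambda: on_done(f.result()))
    )


def pump_one(timeout=None):
    """Retrieve and dispatch one queued callback, as a message loop would."""
    callback = ui_queue.get(timeout=timeout)
    callback()
```

The design choice to note is that the worker never touches UI state directly. It hands the result back through the queue, so the UI thread stays free to process other messages while the operation runs.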
Do:
Use Locks Wisely
Your application or DLL needs locks to synchronize access to its internal data structures. Using multiple locks increases parallelism and makes your application more responsive. However, using multiple locks also increases the chance of acquiring those locks in different orders and causing your threads to deadlock. If two threads each hold a lock and then try to acquire the other thread's lock, their operations will form a circular wait that blocks any forward progress for these threads. You can avoid this deadlock only by ensuring that all threads in the application always acquire all locks in the same order. However, it isn't always easy to acquire locks in the 'right' order. Software components can be composed, but lock acquisitions cannot. If your code calls some other component, that component's locks now become part of your implicit lock order - even if you have no visibility into those locks.
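One way to enforce a single acquisition order is to route every multi-lock acquisition through one helper. The following is a minimal sketch in Python under that assumption; `acquire_all` is a hypothetical helper, and ordering by `id()` is just one arbitrary but consistent order within a process.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_all(*locks):
    """Acquire the given locks in one global order, release in reverse.

    The order used here is id(lock), which is arbitrary but the same
    for every thread, so no two threads can hold these locks in
    conflicting orders.
    """
    ordered = sorted(locks, key=id)
    taken = []
    try:
        for lock in ordered:
            lock.acquire()
            taken.append(lock)
        yield
    finally:
        for lock in reversed(taken):
            lock.release()
```

With this helper, `with acquire_all(a, b):` in one thread and `with acquire_all(b, a):` in another take the locks in the same order. It only covers locks that pass through the helper; locks taken inside other components remain outside its control.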
Things get even harder because locking operations include far more than the usual functions for Critical Sections, Mutexes, and other traditional locks. Any blocking call that crosses thread boundaries has synchronization properties that can result in a deadlock. The calling thread performs an operation with 'acquire' semantics and cannot unblock until the target thread 'releases' that call. Quite a few User32 functions (for example SendMessage), as well as many blocking COM calls fall into this category.
Worse yet, the operating system has its own internal process-specific lock that sometimes is held while your code executes. This lock is acquired when DLLs are loaded into the process, and is therefore called the 'loader lock.' The DllMain function always executes under the loader lock; if you acquire any locks in DllMain (and you should not), you need to make the loader lock part of your lock order. Calling certain Win32 APIs might also acquire the loader lock on your behalf - functions like LoadLibraryEx, GetModuleHandle, and especially CoCreateInstance.
Do:
Do not:
Be Careful with Exceptions
Exceptions allow the separation of normal program flow and error handling. Because of this separation, it can be difficult to know the precise state of the program prior to the exception and the exception handler might miss crucial steps in restoring a valid state. This is especially true for lock acquisitions that need to be released in the handler to prevent future deadlocks.
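The usual defense is to pair every lock acquisition with a release that runs on every exit path. A minimal Python sketch of that shape follows; `update_shared` is a hypothetical helper. In Delphi, the same structure is a try..finally block around the protected section.

```python
import threading


def update_shared(lock, action):
    """Run action() while holding lock.

    The finally clause releases the lock whether action() returns
    normally or raises, so a failed update cannot leave the lock held.
    """
    lock.acquire()
    try:
        return action()
    finally:
        lock.release()
```

If `action` raises, the exception still propagates to the caller, but the lock is already released by then, so other threads waiting on it are not blocked forever.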
Do:
Do not:
This article is based on Preventing Hangs in Windows Applications