SDC’s big theory

Copy-pasting the high-level description of the new SDC scheduling class from the corresponding PSARC discussion, for future reference.

/*
 * The System Duty Cycle (SDC) scheduling class
 * --------------------------------------------
 *
 * Background
 *
 * Kernel threads in Solaris have traditionally not been large consumers
 * of CPU time.  They typically wake up, perform a small amount of
 * work, then go back to sleep waiting for either a timeout or another
 * signal.  On the assumption that the small amount of work that they do
 * is important for the behavior of the whole system, these threads are
 * treated kindly by the dispatcher and the SYS scheduling class: they run
 * without preemption from anything other than real-time and interrupt
 * threads; when preempted, they are put at the front of the queue, so they
 * generally do not migrate between CPUs; and they are allowed to stay
 * running until they voluntarily give up the CPU.
 *
 * As Solaris has evolved, new workloads have emerged which require the
 * kernel to perform significant amounts of CPU-intensive work.  One
 * example of such a workload is ZFS's transaction group sync processing.
 * Each sync operation generates a large batch of I/Os, and each I/O
 * may need to be compressed and/or checksummed before it is written to
 * storage.  The taskq threads which perform the compression and checksums
 * will run nonstop as long as they have work to do; a large sync operation
 * on a compression-heavy dataset can keep them busy for seconds on end.
 * This causes human-time-scale dispatch latency bubbles for any other
 * threads which have the misfortune to share a CPU with the taskq threads.
 *
 * The SDC scheduling class is a solution to this problem.
 *
 *
 * Overview
 *
 * SDC is centered around the concept of a thread's duty cycle (DC):
 *
 *			      ONPROC time
 *	Duty Cycle =	----------------------
 *			ONPROC + Runnable time
 *
 * This is the ratio of the time that the thread spent running on a CPU
 * divided by the time it spent running or trying to run.  It is unaffected
 * by any time the thread spent sleeping, stopped, etc.
 *
 * A thread joining the SDC class specifies a "target" DC that it wants
 * to run at.  To implement this policy, the routine sysdc_update() scans
 * the list of active SDC threads every few ticks and uses each thread's
 * microstate data to compute the actual duty cycle that that thread
 * has experienced recently.  If the thread is under its target DC, its
 * priority is increased to the maximum available (sysdc_maxpri, which is
 * 99 by default).  If the thread is over its target DC, its priority is
 * reduced to the minimum available (sysdc_minpri, 0 by default).  This
 * is a fairly primitive approach, in that it doesn't use any of the
 * intermediate priorities, but it's not completely inappropriate.  Even
 * though threads in the SDC class might take a while to do their job, they
 * are by some definition important if they're running inside the kernel,
 * so it is reasonable that they should get to run at priority 99.
 *
 * If a thread is running when sysdc_update() calculates its actual duty
 * cycle, and there are other threads of equal or greater priority on its
 * CPU's dispatch queue, sysdc_update() preempts that thread.  The thread
 * acknowledges the preemption by calling sysdc_preempt(), which calls
 * setbackdq(), which gives other threads with the same priority a chance
 * to run.  This creates a de facto time quantum for threads in the SDC
 * scheduling class.
 *
 * An SDC thread which is assigned priority 0 can continue to run if
 * nothing else needs to use the CPU that it's running on.  Similarly, an
 * SDC thread at priority 99 might not get to run as much as it wants to
 * if there are other priority-99 or higher threads on its CPU.  These
 * situations would cause the thread to get ahead of or behind its target
 * DC; the longer the situations lasted, the further ahead or behind the
 * thread would get.  Rather than condemning a thread to a lifetime of
 * paying for its youthful indiscretions, SDC keeps "base" values for
 * ONPROC and Runnable times in each thread's sysdc data, and updates these
 * values periodically.  The duty cycle is then computed using the elapsed
 * amount of ONPROC and Runnable times since those base times.
 *
 * Since sysdc_update() scans SDC threads fairly frequently, it tries to
 * keep the list of "active" threads small by pruning out threads which
 * have been asleep for a brief time.  They are not pruned immediately upon
 * going to sleep, since some threads may bounce back and forth between
 * sleeping and being runnable.
 *
 *
 * Interfaces
 *
 * void sysdc_thread_enter(t, dc, flags)
 *
 *	Moves a kernel thread from the SYS scheduling class to the
 *	SDC class. t must have an associated LWP (created by calling
 *	lwp_kernel_create()).  The thread will have a target DC of dc.
 *	Flags should be either 0 or SYSDC_THREAD_BATCH.  If
 *	SYSDC_THREAD_BATCH is specified, the thread will run with a
 *	slightly lower priority (see "Batch threads", below).
 *
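To make the overview above a bit more concrete for myself, here is a minimal user-space sketch of the duty-cycle calculation and the two-level priority decision. It is not the actual sysdc code: the `sketch_*` names, the struct layout, and the use of whole percentages are my own assumptions for illustration. The comment only says what happens when a thread is under or over its target, so treating "exactly at target" the same as "over" is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults quoted in the comment above (sysdc_minpri / sysdc_maxpri). */
#define SKETCH_MINPRI	0
#define SKETCH_MAXPRI	99

/* Hypothetical per-thread data; the real sysdc structures differ. */
typedef struct sketch_sdc {
	uint64_t base_onproc;	/* ONPROC time when the base was last set */
	uint64_t base_runnable;	/* Runnable time when the base was last set */
	unsigned target_dc;	/* target duty cycle in percent, 1..100 */
} sketch_sdc_t;

/*
 * Duty cycle in percent since the base values were taken:
 * ONPROC / (ONPROC + Runnable).  Sleep time never enters the formula.
 * Assumes the elapsed ONPROC time is small enough that multiplying it
 * by 100 does not overflow 64 bits.
 */
static unsigned
sketch_duty_cycle(const sketch_sdc_t *s, uint64_t onproc, uint64_t runnable)
{
	uint64_t o = onproc - s->base_onproc;
	uint64_t r = runnable - s->base_runnable;

	if (o + r == 0)
		return (0);	/* no ONPROC or Runnable time elapsed yet */
	return ((unsigned)((o * 100) / (o + r)));
}

/* Under target: maximum priority.  Otherwise: minimum priority. */
static int
sketch_pick_pri(const sketch_sdc_t *s, uint64_t onproc, uint64_t runnable)
{
	assert(s->target_dc >= 1 && s->target_dc <= 100);

	if (sketch_duty_cycle(s, onproc, runnable) < s->target_dc)
		return (SKETCH_MAXPRI);
	return (SKETCH_MINPRI);
}

/* Move the base forward so that old history stops counting. */
static void
sketch_reset_base(sketch_sdc_t *s, uint64_t onproc, uint64_t runnable)
{
	s->base_onproc = onproc;
	s->base_runnable = runnable;
}
```

For example, a thread with a target of 80 that has accumulated 90 units of ONPROC time and 10 units of Runnable time is at a duty cycle of 90 and drops to priority 0. If the base is then reset at that point and the thread gets only 10 more ONPROC units against 40 more Runnable units, its duty cycle over the new window is 20 and it goes back to priority 99, regardless of how far ahead it was before the reset.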
Posted on December 27, 2009 at 9:58 am by sergeyt
In: Solaris
