1 .HTML "Process Sleep and Wakeup on a Shared-memory Multiprocessor
3 Process Sleep and Wakeup on a Shared-memory Multiprocessor
10 rob,presotto,ken,gerard@plan9.bell-labs.com
13 Appeared in a slightly different form in
15 Proceedings of the Spring 1991 EurOpen Conference,
17 Tromsø, Norway, 1991, pp. 161-166.
19 The problem of enabling a `sleeping' process on a shared-memory multiprocessor
20 is a difficult one, especially if the process is to be awakened by an interrupt-time
21 event. We present here the code
22 for sleep and wakeup primitives that we use in our multiprocessor system.
23 The code has been exercised by years of active use and by a verification
27 Our problem is to synchronise processes on a symmetric shared-memory multiprocessor.
28 Processes suspend execution, or
30 while awaiting an enabling event such as an I/O interrupt.
31 When the event occurs, the process is issued a
33 to resume its execution.
34 During these events, other processes may be running and other interrupts
35 occurring on other processors.
37 More specifically, we wish to implement subroutines called
39 callable by a process to relinquish control of its current processor,
42 callable by another process or an interrupt to resume the execution
43 of a suspended process.
44 The calling conventions of these subroutines will remain unspecified
47 We assume the processors have an atomic test-and-set or equivalent
48 operation but no other synchronisation method. Also, we assume interrupts
49 can occur on any processor at any time, except on a processor that has
50 locally inhibited them.
52 The problem is the generalisation to a multiprocessor of a familiar
53 and well-understood uniprocessor problem. It may be reduced to a
54 uniprocessor problem by using a global test-and-set to serialise the
56 which is equivalent to synchronising through a monitor.
57 For performance and cleanliness, however,
58 we prefer to allow the interrupt handling and process control to be multiprocessed.
60 Our attempts to solve the sleep/wakeup problem in Plan 9
63 We implemented solutions several times over several months and each
64 time convinced ourselves \(em wrongly \(em they were correct.
65 Multiprocessor algorithms can be
66 difficult to prove correct by inspection and formal reasoning about them
67 is impractical. We finally developed an algorithm we trust by
68 verifying our code using an
69 empirical testing tool.
70 We present that code here, along with some comments about the process by
71 which it was designed.
75 Since processes in Plan 9 and the UNIX
76 system have similar structure and properties, one might ask if
82 could not easily be adapted from their standard uniprocessor implementation
83 to our multiprocessor needs.
84 The short answer is, no.
89 take as argument a single global address
90 that serves as a unique
91 identifier to connect the wakeup with the appropriate process or processes.
92 This has several inherent disadvantages.
93 From the point of view of
97 it is difficult to associate a data structure with an arbitrary address;
98 the routines are unable to maintain a state variable recording the
99 status of the event and processes.
100 (The reverse is of course easy \(em we could
101 require the address to point to a special data structure \(em
102 but we are investigating
107 not the code that calls them.)
108 Also, multiple processes sleep `on' a given address, so
110 must enable them all, and let process scheduling determine which process
111 actually benefits from the event.
113 a queueing mechanism would be preferable
114 but, again, it is difficult to associate a queue with a general address.
115 Moreover, the lack of state means that
119 cannot know what the corresponding process (or interrupt) is doing;
123 must be executed atomically.
124 On a uniprocessor it suffices to disable interrupts during their
126 On a multiprocessor, however,
128 can inhibit interrupts only on the current processor,
129 so while a process is executing
131 the desired interrupt can come and go on another processor.
132 If the wakeup is to be issued by another process, the problem is even harder.
133 Some inter-process mutual exclusion mechanism must be used,
134 which, yet again, is difficult to do without a way to communicate state.
136 In summary, to be useful on a multiprocessor,
141 must either be made to run atomically on a single
142 processor (such as by using a monitor)
143 or they need a richer model for their communication.
147 Consider the case of an interrupt waking up a sleeping process.
148 (The other case, a process awakening a second process, is easier because
149 atomicity can be achieved using an interlock.)
150 The sleeping process is waiting for some event to occur, which may be
151 modeled by a condition coming true.
152 The condition could be just that the event has happened, or something
153 more subtle such as a queue draining below some low-water mark.
154 We represent the condition by a function of one
157 the code supporting the device generating the interrupts
158 provides such a function to be used by
162 to synchronise. The function returns
164 if the event has not occurred, and
166 some time after the event has occurred.
171 routines must, of course, work correctly if the
172 event occurs while the process is executing
175 We assume that a particular call to
177 corresponds to a particular call to
180 at most one process is asleep waiting for a particular event.
181 This can be guaranteed in the code that calls
185 by appropriate interlocks.
186 We also assume for the moment that there will be only one interrupt
187 and that it may occur at any time, even before
192 we desire that multiple instances of
196 may be running simultaneously on our multiprocessor.
197 For example, a process calling
199 to await a character from an input channel need not
200 wait for another process to finish executing
202 to await a disk block.
203 At a finer level, we would like a process reading from one input channel
204 to be able to execute
206 in parallel with a process reading from another input channel.
207 A standard approach to synchronisation is to interlock the channel `driver'
208 so that only one process may be executing in the channel code at once.
209 This method is clearly inadequate for our purposes; we need
210 fine-grained synchronisation, and in particular to apply
211 interlocks at the level of individual channels rather than at the level
212 of the channel driver.
214 Our approach is to use an object called a
216 which is a data structure through which
221 (The similarly named construct in Ada is a control structure;
222 ours is an unrelated data structure.)
224 is allocated for each active source of events:
225 one for each I/O channel,
226 one for each end of a pipe, and so on.
227 The rendezvous serves as an interlockable structure in which to record
228 the state of the sleeping process, so that
232 can communicate if the event happens before or while
238 is therefore a function
240 void sleep(Rendezvous *r, int (*condition)(void*), void *arg)
242 called by the sleeping process.
249 and is part of the data structure for the (say) device.
257 to decide whether the event has occurred.
259 has a simpler specification:
261 void wakeup(Rendezvous *r).
264 must be called after the condition has become true.
270 data type is defined as
279 are test-and-set spin locks.
282 returns when the current process holds that lock;
286 Here is our implementation of
288 Its details are discussed below.
290 is a pointer to the current process on the current processor.
291 (Its value differs on each processor.)
294 sleep(Rendezvous *r, int (*condition)(void*), void *arg)
298 s = inhibit(); /* interrupts */
302 * if condition happened, never mind
304 if((*condition)(arg)){
306 allow(); /* interrupts */
311 * now we are committed to
312 * change state and call scheduler
315 error("double sleep %d %d", r->p->pid, thisp->pid);
316 thisp->state = Wakeme;
319 allow(s); /* interrupts */
320 sched(); /* relinquish CPU */
328 wakeup(Rendezvous *r)
333 s = inhibit(); /* interrupts; return old state */
338 if(p->state != Wakeme)
339 panic("wakeup: not Wakeme");
350 both begin by disabling interrupts
351 and then locking the rendezvous structure.
354 may be called in an interrupt routine, the lock must be set only
355 with interrupts disabled on the current processor,
356 so that if the interrupt comes during
358 it will occur only on a different processor;
359 if it occurred on the processor executing
364 At the end of each routine, the lock is released and processor priority
365 returned to its previous value.
367 needs to inhibit interrupts in case
368 it is being called by a process;
369 this is a no-op if called by an interrupt.)
372 checks to see if the condition has become true, and returns if so.
373 Otherwise the process posts its name in the rendezvous structure where
375 may find it, marks its state as waiting to be awakened
376 (this is for error checking only) and goes to sleep by calling
378 The manipulation of the rendezvous structure is all done under the lock,
381 only examines it under lock, so atomicity and mutual exclusion
385 has a simpler job. When it is called, the condition has implicitly become true,
386 so it locks the rendezvous, sees if a process is waiting, and readies it to run.
390 The synchronisation technique used here
391 is similar to known methods, even as far back as Saltzer's thesis
393 The code looks trivially correct in retrospect: all access to data structures is done
394 under lock, and there is no place that things may get out of order.
395 Nonetheless, it took us several iterations to arrive at the above
396 implementation, because the things that
398 go wrong are often hard to see. We had four earlier implementations
399 that were examined at great length and only found faulty when a new,
400 different style of device or activity was added to the system.
403 Here, for example, is an incorrect implementation of wakeup,
404 closely related to one of our versions.
407 wakeup(Rendezvous *r)
417 if(p->state != Wakeme)
418 panic("wakeup: not Wakeme");
426 The mistake is that the reading of
428 may occur just as the other process calls
430 so when the interrupt examines the structure it sees no one to wake up,
431 and the sleeping process misses its wakeup.
432 We wrote the code this way because we reasoned that the fetch
436 was inherently atomic and need not be interlocked.
437 The bug was found by examination when a new, very fast device
438 was added to the system and sleeps and interrupts were closely overlapped.
439 However, it was in the system for a couple of months without causing an error.
441 How many errors lurk in our supposedly correct implementation above?
442 We would like a way to guarantee correctness; formal proofs are beyond
443 our abilities when the subtleties of interrupts and multiprocessors are
445 With that in mind, the first three authors approached the last to see
446 if his automated tool for checking protocols
449 used to verify our new
454 The code was translated into the language for that system
455 (with, unfortunately, no way of proving that the translation is itself correct)
456 and validated by exhaustive simulation.
458 The validator found a bug.
459 Under our assumption that there is only one interrupt, the bug cannot
460 occur, but in the more general case of multiple interrupts synchronising
461 through the same condition function and rendezvous,
462 the process and interrupt can enter a peculiar state.
463 A process may return from
465 with the condition function false
466 if there is a delay between
467 the condition coming true and
470 with the delay occurring
471 just as the receiving process calls
473 The condition is now true, so that process returns immediately,
474 does whatever is appropriate, and then (say) decides to call
476 again. This time the condition is false, so it goes to sleep.
477 The wakeup process then finds a sleeping process,
478 and wakes it up, but the condition is now false.
480 There is an easy (and verified) solution: at the end of
485 if the condition is false, execute
487 again. This re-execution cannot repeat; the second synchronisation is guaranteed
488 to function under the external conditions we are supposing.
490 Even though the original code is completely
491 protected by interlocks and had been examined carefully by all of us
492 and believed correct, it still had problems.
493 It seems to us that some exhaustive automated analysis is
494 required of multiprocessor algorithms to guarantee their safety.
495 Our experience has confirmed that it is almost impossible to
496 guarantee by inspection or simple testing the correctness
497 of a multiprocessor algorithm. Testing can demonstrate the presence
498 of bugs but not their absence
501 We close by claiming that the code above with
502 the suggested modification passes all tests we have for correctness
503 under the assumptions used in the validation.
504 We would not, however, go so far as to claim that it is universally correct.
508 [Bac86] Maurice J. Bach,
509 .I "The Design of the UNIX Operating System,
514 [Dij72] Edsger W. Dijkstra,
515 ``The Humble Programmer \- 1972 Turing Award Lecture'',
520 [Hol91] Gerard J. Holzmann,
521 .I "Design and Validation of Computer Protocols,
531 ``Plan 9 from Bell Labs'',
532 .I "Proceedings of the Summer 1990 UKUUG Conference,
537 [Sal66] Jerome H. Saltzer,
538 .I "Traffic Control in a Multiplexed Computer System