10 interface allows raw commands to be sent. Traditionally,
11 only SCSI CDBs could be sent in this manner. For devices
12 that respond to ATA/ATAPI commands, a small set of SCSI CDBs
13 have been translated into an ATA equivalent. This approach
14 works very well. However, there are ATA commands such as
15 SMART which do not have direct translations. I describe how
16 ATA/ATAPI commands were supported without disturbing
17 existing functionality.
24 drivers for Plan 9, it has been necessary to copy a laundry
25 list of special commands that were needed with previous
26 drivers. The set of commands supported by each device
27 driver varies, and they are typically executed by writing a
28 magic string into the driver's
30 file. This requires duplicating code in each driver, and
31 covers few commands. Coverage depends on the driver. It is
32 not possible for the control interface to return output,
33 making some commands impossible to implement. While a
34 workaround has been to change the contents of the control
35 file, this solution is extremely unwieldy even for simple
37 .CW "IDENTIFY DEVICE" .
43 devices respond to a small subset of SCSI commands
44 through the raw interface and the normal read/write interface uses
45 SCSI command blocks. SCSI devices, of course, respond natively
46 while ATA devices emulate these commands with the help
50 can get surprisingly far with ATA devices, and ATAPI
53 work quite well. Although a new implementation might not
54 use this approach, replacing the interface did not appear
55 cost effective and would lead to maximum incompatibilities,
56 while this interface is experimental. This means that the raw interface will need
57 a method of signaling an ATA command rather than a SCSI CDB.
59 An unattractive wart of the ATA command set is that there are seven
60 protocols and two command sizes. While each command has a
61 specific size (either 28-bit LBA or 48-bit LLBA) and is
62 associated with a particular protocol (PIO, DMA, PACKET,
63 etc.), this information is available only by table lookup.
64 While this information may not always be necessary for simple
65 SATA-based controllers, for the IDE controllers, it is required.
66 PIO commands are required and use a different set of registers
67 than DMA commands. Queued DMA commands and ATAPI
68 commands are submitted differently still. Finally,
69 the data direction is implied by the command. Having these
70 three extra pieces of information in addition to the command
73 A final bit of extra-command information that may be useful
76 timeouts work with many drivers, it would be an added
77 convenience to be able to specify a timeout along with the
78 command. This seems a good idea in principle, since some
79 ATA commands should return within milli- or microseconds,
80 while others may take hours to complete. On the other hand, the
81 existing SCSI interface does not support it and changing its
82 kernel-to-user space format would be quite invasive. Timeouts
83 were left for a later date.
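The table lookup described above, which maps a command to its protocol, size and data direction, might look like the following minimal sketch in portable C. The names Cmdinfo, cmdtab and cmdlookup and the P* and D* constants are hypothetical and are not taken from the Plan 9 sources. Only the opcodes and their attributes follow the ATA command set, and only a handful of commands are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical names; a real driver would use its own encoding */
enum { Pnd, Ppio, Pdma, Pfpdma };	/* protocol: non-data, PIO, DMA, queued DMA */
enum { Dnone, Din, Dout };		/* data direction, seen from the host */

typedef struct Cmdinfo Cmdinfo;
struct Cmdinfo {
	uint8_t	cmd;	/* ATA command opcode */
	uint8_t	proto;	/* one of the P* values */
	uint8_t	llba;	/* 1 for a 48-bit command, 0 for a 28-bit command */
	uint8_t	dir;	/* one of the D* values */
	const char *name;
};

/* a few well-known commands, sorted by opcode */
static const Cmdinfo cmdtab[] = {
	{0x20, Ppio,   0, Din,   "READ SECTOR(S)"},
	{0x25, Pdma,   1, Din,   "READ DMA EXT"},
	{0x35, Pdma,   1, Dout,  "WRITE DMA EXT"},
	{0x60, Pfpdma, 1, Din,   "READ FPDMA QUEUED"},
	{0xa1, Ppio,   0, Din,   "IDENTIFY PACKET DEVICE"},
	{0xe7, Pnd,    0, Dnone, "FLUSH CACHE"},
	{0xec, Ppio,   0, Din,   "IDENTIFY DEVICE"},
	{0xef, Pnd,    0, Dnone, "SET FEATURES"},
};

enum { Ncmdtab = sizeof cmdtab / sizeof cmdtab[0] };

/* return the table entry for cmd, or NULL if the opcode is not listed */
const Cmdinfo*
cmdlookup(uint8_t cmd)
{
	int i;

	for(i = 0; i < Ncmdtab; i++){
		/* the table is expected to stay sorted, with no duplicates */
		assert(i == 0 || cmdtab[i-1].cmd < cmdtab[i].cmd);
		if(cmdtab[i].cmd == cmd)
			return &cmdtab[i];
	}
	return NULL;
}
```

A table keyed on the opcode alone is not sufficient for every command. SMART, for example, uses one opcode for subcommands that transfer data and subcommands that do not, so a complete table would also need to consult the feature register.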
85 Protocol and Data Format
87 The existing protocol for SCSI commands suits ATA as well.
88 We simply write the command block to the raw device. Then
89 we either write or read the data. Finally, the status block
90 is read. What remains is choosing a data format for ATA
93 The T10 Committee has defined a SCSI-to-ATA translation
94 scheme called SAT[4]. This provides a standard set of
95 translations between common SCSI commands and ATA commands.
96 It specifies the ATA protocol and some other sideband
97 information. It is particularly useful for common commands
101 .CW "READ CAPACITY\ (12)" .
102 Unfortunately, our purpose is to address the uncommon commands.
103 For those, special commands
104 .CW "ATA PASSTHROUGH\ (12)"
107 exist. Unfortunately, several commands we are interested in,
108 such as those that set transfer modes, are not allowed by the
109 standard. This is not a major obstacle. We could simply
110 ignore the standard. But this goes against the general
111 reasons for using an established standard: interoperability.
112 Finally, it should be mentioned that SAT format adds yet
113 another intermediate format of variable size which would
114 require translation to a usable format for all the existing
115 Plan 9 drivers. If we're not hewing to a standard, we should
116 build or choose for convenience.
118 ATA-8 and ACS-2 also specify an abstract register layout.
119 The size of the command block varies based on the “size”
120 (either 28 or 48 bits) of the command and only context
121 differentiates a command from a response. The SATA
122 specification defines host-to-drive communications. The
123 formats of transactions are called Frame Information
124 Structures (FISes). Typically drivers fill out the command
125 FISes directly and have direct access to the Device-to-Host
126 Register (D2H) FISes that return the resulting ATA register
127 settings. The command FISes are also called Host-to-Device
128 (H2D) Register FISes. Using this structure has several advantages. It
129 is directly usable by many of the existing SATA drivers.
130 All SATA commands are the same size and are tagged as
131 commands. Normal responses are also all of the same size
132 and are tagged as responses. Unfortunately, the ATA
133 protocol is not specified. Nevertheless, SATA FISes seem to
134 handle most of our needs and are quite convenient; they can
135 be used directly by two of the three current SATA drivers.
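To make the fixed-size command format concrete, here is a minimal sketch in portable C that fills in an H2D Register FIS for a 48-bit command. The byte offsets follow the Serial ATA Register FIS layout. The function name mkh2d48 and the constant names are hypothetical and are not the interface provided by the library described in this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	Fissize	= 20,	/* a Register FIS is five 32-bit words */
	H2dtype	= 0x27,	/* FIS type: Register, host to device */
	Cbit	= 0x80,	/* byte 1: set when the FIS carries a command */
	Lbamode	= 0x40,	/* device register: LBA addressing */
};

/*
 * Fill the 20-byte buffer fis with an H2D Register FIS
 * for the 48-bit command cmd.  Hypothetical helper.
 */
void
mkh2d48(uint8_t *fis, uint8_t cmd, uint16_t feat, uint64_t lba, uint16_t count)
{
	assert(fis != NULL);
	memset(fis, 0, Fissize);	/* icc, control and reserved bytes stay 0 */
	fis[0] = H2dtype;
	fis[1] = Cbit;
	fis[2] = cmd;
	fis[3] = feat & 0xff;		/* features 7:0 */
	fis[4] = lba & 0xff;		/* lba 7:0 */
	fis[5] = (lba >> 8) & 0xff;	/* lba 15:8 */
	fis[6] = (lba >> 16) & 0xff;	/* lba 23:16 */
	fis[7] = Lbamode;		/* device */
	fis[8] = (lba >> 24) & 0xff;	/* lba 31:24 */
	fis[9] = (lba >> 32) & 0xff;	/* lba 39:32 */
	fis[10] = (lba >> 40) & 0xff;	/* lba 47:40 */
	fis[11] = feat >> 8;		/* features 15:8 */
	fis[12] = count & 0xff;		/* count 7:0 */
	fis[13] = count >> 8;		/* count 15:8 */
}
```

For example, mkh2d48(fis, 0x25, 0, lba, 8) would describe a READ DMA EXT of eight sectors. Note that nothing in the FIS says the command uses the DMA protocol, which is the gap mentioned above.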
139 Raw ATA commands are formatted as an ATA escape byte, an
142 and the FIS. Typically this would be an
143 H2D FIS, but this is not a requirement. The escape byte
145 which is not and, according to the current specification,
146 will never be a valid SCSI command, was chosen. The
149 and other FIS construction details are specified in
150 .CW "/sys/include/fis.h" .
153 encodes the ATA protocol, the command “size” and data
154 direction. The “atazz” command format is pictured in \*(Fn.
175 B: hdr("cmd") at A+(0, -h)
180 C: hdr("lba16") at B+(0, -h)
185 D: hdr("lba40") at C+(0, -h)
190 E: hdr("rsvd") at D+(0, -h)
196 arrow from F.e to G.w
199 ] at G.sw +(-w*2, -h)
204 arrow from H.e to HH.w
216 L: hdr("stat") at K+(0, -h)
221 M: hdr("lba16") at L+(0, -h)
226 O: hdr("lba40") at M+(0, -h)
231 P: hdr("rsvd") at O+(0, -h)
233 ] at Q.sw +(-w*3.5, -3*h)
234 arrow from Q.w to I.e
240 Raw ATA replies are formatted as a one-byte
242 status code followed by the reply FIS.
243 The usual read/write register substitutions are
244 applied; ioport replaces flags, status replaces cmd, error
247 Important commands such as
248 .CW "SMART RETURN STATUS"
249 return no data. In this case, the protocol is run as usual.
250 The client performs a 0-byte read to fulfill the data transfer
251 step. The result is reported in the D2H FIS that is returned as the status.
252 The vendor ATA command
254 is used to return the device signature FIS as there is no
255 universal in-band way to do this without side effects.
256 When talking only to ATA drives, it is possible to first
258 .CW "IDENTIFY PACKET DEVICE"
260 .CW "IDENTIFY DEVICE"
261 command, inferring the device type from the successful
262 command. However, it would not be possible to enumerate the
263 devices behind a port multiplier using this technique.
265 Kernel changes and Libfis
267 Very few changes were made to devsd to accommodate ATA
274 fields. To avoid disturbing existing SCSI functionality and
275 to allow drivers which support SCSI and ATA commands in
276 parallel, an additional
278 callback was added to
280 with the same signature as
283 callback. About twenty lines of code were
286 to recognize raw ATA commands and call the
291 To assist in generating the FISes to communicate with devices,
293 was written. It contains functions to identify and
294 enumerate the important features of a drive, to format
295 H2D FISes and, finally, functions for
299 -devices to build D2H FISes to
300 capture the device signature.
302 All ATA device drivers for the 386 architecture have been
303 modified to accept raw ATA commands. Due to consolidation
304 of FIS handling, the AHCI driver lost
305 175 lines of code, additional non-atazz-related functionality
306 notwithstanding. The IDE driver remained exactly the same
307 size. Quite a bit more code could be removed if the driver
308 were reorganized. The mv50xx driver gained 153 lines of
309 code. Development versions of the Marvell Orion driver
310 lost over 500 lines while
312 is only about the same line count.
314 Since FIS formats were used to convey
315 commands from user space,
317 has been equally useful for user space applications. This is
320 interface can be thought of as an idealized HBA. Conversely,
321 the hardware driver does not need to know anything about
322 the command it is issuing beyond the ATA protocol.
326 As an example and debugging tool, the
332 they can be thought of as a driver for a virtual interface provided
335 combined with a disk console.
336 ATA commands are spelled out verbosely as in ACS-2. Arbitrary ATA
337 commands may be submitted, but the controller or driver may
338 not support all of them. Here is a sample transcript:
341 /dev/sda0 976773168; 512 50000f001b206489
343 /dev/sdD0 1023120; 512 0
344 /dev/sdE0 976773168; 512 50014ee2014f5b5a
345 /dev/sdF7 976773168; 512 5000cca214c3a6d3
347 az> smart enable operations
348 az> smart return status
352 34405000004fc2a00000000000000000
357 command is a special command that uses
359 to enumerate the controllers in the system.
360 For each controller, the
366 is issued. If this command is successful, the
367 number of sectors, sector size and WWN are gathered
370 device reports 0 sectors and 0 sector size because it is
371 a DVD-RW with no media. The
373 command is another special command that issues the
374 same commands a SATA driver would issue to gather
375 the information about the drive. The final two commands
377 and return the SMART status. The SMART status is
378 returned in a D2H FIS. This result is parsed and the outcome
379 is printed as either “normal” or “threshold exceeded”
380 (the drive predicts imminent failure).
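The parsing step can be sketched as follows in portable C. SMART RETURN STATUS reports its result in the LBA fields of the reply: the middle and high LBA bytes are 4f and c2 when no threshold has been exceeded, and f4 and 2c when one has. The function names smartstatus and dehex and the Smart* constants are hypothetical. The sketch also assumes that the hexadecimal line in the transcript is the D2H FIS returned for the SMART RETURN STATUS command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	D2htype	= 0x34,	/* FIS type: Register, device to host */
	Smartok	= 0,	/* "normal" */
	Smartbad = 1,	/* "threshold exceeded" */
	Smartunk = -1,	/* not a recognizable SMART status reply */
};

static int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if(c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* convert hex text to at most n bytes; return the byte count, or -1 on bad input */
int
dehex(const char *s, uint8_t *buf, int n)
{
	int i, hi, lo;

	for(i = 0; i < n && s[2*i] != '\0'; i++){
		hi = hexval(s[2*i]);
		if(hi < 0)
			return -1;
		lo = hexval(s[2*i+1]);	/* also rejects an odd-length string */
		if(lo < 0)
			return -1;
		buf[i] = hi<<4 | lo;
	}
	return i;
}

/* classify the D2H Register FIS returned by SMART RETURN STATUS */
int
smartstatus(const uint8_t *fis, int n)
{
	assert(fis != NULL);
	if(n < 7 || fis[0] != D2htype)
		return Smartunk;
	if(fis[5] == 0x4f && fis[6] == 0xc2)	/* lba 15:8 and lba 23:16 */
		return Smartok;
	if(fis[5] == 0xf4 && fis[6] == 0x2c)
		return Smartbad;
	return Smartunk;
}
```

Applied to the FIS in the transcript, whose bytes 5 and 6 are 4f and c2, smartstatus returns Smartok.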
382 As a further real-world example, a drive from my file server
383 failed after a power outage. The simple diagnostic
384 .CW "SMART RETURN STATUS"
385 returned an uninformative “threshold exceeded.”
386 We can run some more in-depth tests. In this case we
387 will need to make up for the fact that
389 does not know every option to every command. We
394 az> smart lba0 1 execute off-line immediate # short data collection
396 col status: 00 never started
397 exe status: 89 failed: shipping damage, 90% left
404 Here we see that the drive claims that it was damaged in
405 shipping and the damage occurred in the first 10% of the
406 drive. Since we know the drive had been working before
407 the power outage, and the original symptom was excessive
408 UREs (Unrecoverable Read Errors) followed by write
409 failures, and finally a threshold exceeded condition, it is
410 reasonable to assume that the head may have crashed.
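The execution status byte in the transcript can be decoded by hand. In the ATA SMART data, the high four bits of the self-test execution status give the result code and the low four bits give the portion of the test remaining, in units of 10%. The following minimal sketch in portable C shows the decoding. The names Exestat and decodeexe are hypothetical, and the description strings are paraphrases rather than the text printed by the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Exestat Exestat;
struct Exestat {
	int	code;		/* result code, 0 to 15 */
	int	pctleft;	/* percent of the self-test remaining */
	const char *desc;	/* paraphrased meaning of code */
};

/* paraphrased meanings of the self-test execution status codes */
static const char *exedesc[16] = {
	"completed without error, or never run",
	"aborted by host",
	"interrupted by host reset",
	"fatal or unknown error",
	"failed: unknown element",
	"failed: electrical element",
	"failed: servo or seek element",
	"failed: read element",
	"failed: handling damage suspected",
	"reserved", "reserved", "reserved",
	"reserved", "reserved", "reserved",
	"self-test in progress",
};

/* split an execution status byte into its two 4-bit fields */
Exestat
decodeexe(uint8_t b)
{
	Exestat e;

	e.code = b >> 4;
	e.pctleft = (b & 0xf) * 10;
	e.desc = exedesc[e.code];
	assert(e.desc != NULL);
	return e;
}
```

For the value 89 shown above, the code is 8 and the remainder is 90%, which agrees with the reported “shipping damage, 90% left”.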
412 Stand Alone Applications
414 There are several obvious stand-alone applications for
415 this functionality: a drive firmware upgrade utility,
416 a drive scrubber that bypasses the drive cache and a
419 Since SCSI also supports a basic SMART-like
420 interface through the
421 .CW "SEND DIAGNOSTIC"
423 .CW "RECEIVE DIAGNOSTIC RESULTS"
426 gives a chance to test both raw ATA and SCSI
427 commands in the same application.
430 uses the usual techniques for gathering a list of
431 devices or uses the devices given. Then it issues a raw ATA request for
432 the device signature. If that fails, it is assumed
433 that the drive is SCSI, and a raw SCSI request is issued.
436 is able to reliably determine if SMART is supported
439 If successful, each device is probed every 5 minutes
440 and failures are logged. A one shot mode is also
443 chula# disk/smart -atv
447 sda3: threshold exceeded
456 and the remainder are ATA. Note that other drives
457 on the same controller are ATA.
460 was previously listed, we can check to see why no
461 results were reported by
464 chula# for(i in a3 C0)
465 echo identify device |
466 atazz /dev/sd$i >[2]/dev/null |
468 flags lba llba smart power nop sct
473 simply does not support the SMART feature set.
477 While the raw ATA interface has been used extensively
478 from user space and has allowed the removal of quirky
479 functionality, device setup has not yet been addressed.
480 For example, both the Orion and AHCI drivers have
481 an initialization routine similar to the following
485 setfissig(d, getsig(d));
489 if(settxmode(d, d->udma) != 0)
494 However, in preparing this document, it was discovered
495 that one sets the power mode before setting the
496 transfer mode and the other does the opposite. It is
497 not clear that this particular difference is a problem,
498 but over time, such differences will be the source of bugs.
499 Neither the IDE nor the Marvell 50xx drivers sets the
500 power mode at all. Worse,
501 none is capable of properly addressing drives with
502 features such as PUIS (Power Up In Standby) enabled.
503 To address this problem, all four of the ATA drivers would
506 Rather than maintaining a number of mutually out-of-date
507 drivers, it would be advantageous to build an ATA analog
510 using the raw ATA interface to submit ATA commands.
511 There are some difficulties that make such a change a bit
512 more than trivial. Since the current model for hot-pluggable
513 devices is not compatible with the top-down
514 approach currently taken by
516 this would need to be addressed. It does not seem that
517 this would be difficult. Interface resets after failed commands
518 should also be addressed.
522 The current source including all the pc drivers and applications
533 .CW "quanstro/atazz" ,
536 .CW "quanstro/smart" .
538 The following manual pages are included:
550 Abbreviated References
556 .CW "http://plan9.bell-labs.com/magic/man2html/3/sd" .
561 .CW "http://plan9.bell-labs.com/magic/man2html/8/scuzz" .
564 .I "ATA/ATAPI Command Set\ \-\ 2" ,
565 revision 1, January 21, 2009,
566 formerly published online at
567 .CW "http://www.t13.org" .
570 .I "SCSI/ATA Translation\ \-\ 2 (SAT\-2)" ,
571 revision 7, February 18, 2007,
572 formerly published online at
573 .CW "http://www.t10.org" .