venti \- archival storage server
.I Venti
is a SHA1-addressed archival storage server.
for a full introduction to the system.
This page documents the structure and operation of the server.
A venti server requires multiple disks or disk partitions,
each of which must be properly formatted before the server
can be used.
The venti server maintains three disk structures, typically
stored on raw disk partitions:
the
.IR "data log" ,
which holds, in sequential order,
the contents of every block written to the server;
the
.IR index ,
which helps locate a block in the data log given its score;
and the optional
.IR "bloom filter" ,
a concise summary of which scores are present in the index.
The data log is the primary storage.
To improve robustness, it should be stored on
a device that provides RAID functionality.
The index and the bloom filter are optimizations
employed to access the data log efficiently and can be rebuilt
from the data log if they are lost or damaged.
The data log is logically split into sections called
.IR arenas ,
typically sized for easy offline backup.
A data log may comprise many disks, each storing
one or more arenas.
Such disks are called
.IR "arena partitions" .
Arena partitions are filled in the order given in the configuration.
The index is logically split into block-sized pieces called
.IR buckets ,
each of which is responsible for a particular range of scores.
An index may be split across many disks, each storing many buckets.
Such disks are called
.IR "index sections" .
The index must be sized so that no bucket is full.
When a bucket fills, the server must be shut down and
the index made larger.
Since scores appear random, each bucket will contain
approximately the same number of entries.
Index entries are 40 bytes long. Assuming that a typical block
being written to the server is 8192 bytes and compresses to 4096
bytes, the active index is expected to be about 1% of
the data log size (40 bytes per 4096-byte block is just under 1%).
Storing smaller blocks increases the relative index footprint;
storing larger blocks decreases it.
To allow variation in both block size and the random distribution
of scores to buckets, the suggested index size is 5% of
the data log size.
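The percentages above follow from simple arithmetic, sketched here in C.
The function names are hypothetical and not part of venti;
the constants are only the example figures assumed in the text
(40-byte entries and blocks that compress to 4096 bytes).

```c
#include <stdint.h>

/* Example figures from the text: 40-byte index entries and a
 * typical 8192-byte block that compresses to 4096 bytes. */
enum {
	IndexEntrySize = 40,
	CompressedBlockSize = 4096
};

/* Expected index bytes for logbytes of compressed data:
 * one index entry per stored block, rounding the block count up. */
uint64_t
expected_index_bytes(uint64_t logbytes)
{
	uint64_t nblocks = (logbytes + CompressedBlockSize - 1) / CompressedBlockSize;

	return nblocks * IndexEntrySize;
}

/* Suggested index size: 5% (one twentieth) of the data log. */
uint64_t
suggested_index_bytes(uint64_t logbytes)
{
	return logbytes / 20;
}
```

For a 1TB data log (1024GB), expected_index_bytes gives 10GB, about 1%,
while suggested_index_bytes gives about 51GB.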
The (optional) bloom filter is a large bitmap that is stored on disk but
also kept completely in memory while the venti server runs.
It helps the venti server efficiently detect scores that are
.I not
already stored in the index.
The bloom filter starts out zeroed.
Each score recorded in the bloom filter is hashed to choose
.I nhash
bits to set in the bloom filter.
A score is definitely not stored in the index if any of its
.I nhash
bits are not set.
The bloom filter thus has two parameters:
.I nhash
and the total bitmap size
(maximum 512MB, 2\s-2\u32\d\s+2 bits).
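The add and test operations can be sketched in C as follows.
This is an illustration under stated assumptions, not venti's code:
the Bloom structure and the function names are hypothetical, and the sketch
derives the nhash bit positions by double hashing two 32-bit words of the score,
which is reasonable only because SHA1 scores are already uniformly distributed.
Venti's own choice of bit positions may differ.

```c
#include <stdint.h>

enum { ScoreSize = 20 };	/* SHA1 score length in bytes */

typedef struct Bloom Bloom;
struct Bloom {
	uint8_t *bits;   /* bitmap, zeroed initially */
	uint64_t nbits;  /* bitmap size in bits */
	int nhash;       /* bits set per score */
};

/* Read a big-endian 32-bit word. */
static uint32_t
get32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Bit position of the i'th hash (assumed scheme: double hashing
 * from the first two words of the score, with an odd step). */
static uint64_t
bitpos(const Bloom *b, const uint8_t *score, int i)
{
	uint64_t h1 = get32(score);
	uint64_t h2 = get32(score + 4) | 1;

	return (h1 + (uint64_t)i * h2) % b->nbits;
}

/* Record a score by setting its nhash bits. */
void
bloom_add(Bloom *b, const uint8_t *score)
{
	for (int i = 0; i < b->nhash; i++) {
		uint64_t p = bitpos(b, score, i);
		b->bits[p >> 3] |= (uint8_t)(1u << (p & 7));
	}
}

/* 0: score is definitely absent; 1: score may be present. */
int
bloom_maybe_present(const Bloom *b, const uint8_t *score)
{
	for (int i = 0; i < b->nhash; i++) {
		uint64_t p = bitpos(b, score, i);
		if (!(b->bits[p >> 3] & (1u << (p & 7))))
			return 0;
	}
	return 1;
}
```

A negative answer is always trustworthy; a positive answer still requires an index lookup.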
The bloom filter should be sized so that
.I nhash
\(mu
.I nblock
\(<= 0.7 \(mu
.IR b ,
where
.I nblock
is the expected number of blocks stored on the server
and
.I b
is the bitmap size in bits.
The false positive rate of the bloom filter when sized
this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2.
Values of
.I nhash
less than 10 are not very useful;
values greater than 24 are probably a waste of memory.
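The sizing rule can be turned into two small helpers.
This C sketch assumes the rule
nhash \(mu nblock \(<= 0.7 \(mu b
and the 2\s-2\u32\d\s+2-bit maximum.
The constant 0.7 is close to ln 2; at that ratio about half the bits end up set,
so each hash lets a missing score through with probability near 1/2,
giving the 2\s-2\u\-\fInhash\fR\d\s+2 estimate.
The function names are hypothetical, and this is not necessarily how the formatting tool picks its values.

```c
#include <stdint.h>

/* Largest bitmap allowed: 512MB, i.e. 2^32 bits. */
#define BLOOM_MAX_BITS (UINT64_C(1) << 32)

/*
 * Smallest bitmap size b, in bits, with nhash * nblock <= 0.7 * b,
 * i.e. b = ceil(10 * nhash * nblock / 7).
 * Returns 0 if that exceeds the maximum bitmap size.
 */
uint64_t
bloom_min_bits(uint64_t nblock, int nhash)
{
	uint64_t b = (10 * (uint64_t)nhash * nblock + 6) / 7;

	return b > BLOOM_MAX_BITS ? 0 : b;
}

/*
 * Largest nhash the rule allows for an nbits-bit bitmap, capped at 24
 * (larger values waste memory); returns 0 if even nhash = 10 does not fit.
 */
int
bloom_pick_nhash(uint64_t nblock, uint64_t nbits)
{
	uint64_t k;

	if (nblock == 0)
		return 24;
	k = 7 * nbits / (10 * nblock);
	if (k > 24)
		k = 24;
	return k < 10 ? 0 : (int)k;
}
```

For example, 100 million blocks with nhash = 16 need at least 2,285,714,286 bits (about 272MB),
while a billion blocks at that nhash would exceed the maximum bitmap size.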
it will derive an appropriate
Venti can make effective use of large amounts of memory
for various caches.
The
.I "lump cache"
holds recently accessed venti data blocks, which the server refers to as
.IR lumps .
The lump cache should be at least 1MB but can profitably be much larger.
The lump cache can be thought of as the level-1 cache:
read requests handled by the lump cache can
be served immediately.
The
.I "block cache"
holds recently accessed
disk blocks from the arena partitions.
The block cache needs to be able to simultaneously hold two blocks
from each arena plus four blocks for the currently filling arena.
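That minimum is easy to compute.
In this C sketch the function name is hypothetical, and the arena block size is a parameter
because this page does not state it; 8192 bytes below is only an example value.

```c
#include <stdint.h>

/* Minimum block cache: two blocks per arena plus four
 * for the arena currently being filled. */
uint64_t
min_block_cache_bytes(uint64_t narenas, uint64_t blocksize)
{
	return (2 * narenas + 4) * blocksize;
}
```

With 1000 arenas and 8192-byte blocks this gives 16,416,768 bytes, about 16MB.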
The block cache can be thought of as the level-2 cache:
read requests handled by the block cache are slower than those
handled by the lump cache, since the lump data must be extracted
from the raw disk blocks and possibly decompressed, but no
disk accesses are necessary.
The
.I "index cache"
holds recently accessed or prefetched
index entries.
The index cache needs to be able to hold index entries
for at least three or four arenas in order for prefetching
to work properly. Each entry in the index cache is 50 bytes.
Assuming 500MB arenas of
128,000 blocks that are 4096 bytes each after compression,
one arena's entries occupy about 6MB (128,000 \(mu 50 bytes),
so the minimum index cache size is roughly 20 to 26MB.
The index cache can be thought of as the level-3 cache:
read requests handled by the index cache must still go
to disk to fetch the arena blocks, but the costly random
access to the index is avoided.
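The entry counts above can be checked with a small C helper.
The name is hypothetical; the 50-byte entry size is the figure given in the text.
One arena of 128,000 blocks needs 128,000 \(mu 50 = 6.4MB of entries,
and holding three or four arenas, as recommended, takes proportionally more.

```c
#include <stdint.h>

enum { ICacheEntrySize = 50 };	/* bytes per cached index entry, per the text */

/* Index cache bytes needed to hold the entries of narenas arenas,
 * each containing blocks_per_arena blocks. */
uint64_t
icache_bytes_for_arenas(uint64_t narenas, uint64_t blocks_per_arena)
{
	return narenas * blocks_per_arena * ICacheEntrySize;
}
```

For four such arenas, icache_bytes_for_arenas(4, 128000) returns 25,600,000 bytes.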
The size of the index cache determines how long venti
can sustain its `burst' write throughput, during which time
the only disk accesses on the critical path
are sequential writes to the arena partitions.
For example, if you want to be able to sustain 10MB/s
for an hour, you need enough index cache to hold entries
for 36GB of blocks. Assuming 8192-byte blocks,
you need room for almost five million index entries.
Since index entries are 50 bytes each, you need almost 250MB
of index cache.
If the background index update process can make a single
pass through the index in an hour, which is possible,
then you can sustain the 10MB/s indefinitely (at least until
the arenas are all filled).
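The burst calculation can be written out the same way.
In this C sketch the function name is hypothetical, MB is taken as 2\s-2\u20\d\s+2 bytes,
and each cached entry is 50 bytes as stated in the text.

```c
#include <stdint.h>

enum { ICacheEntrySize = 50 };	/* bytes per cached index entry */

/* Index cache bytes needed to absorb rate bytes/second of new blocks
 * for the given number of seconds, with blocks of blocksize bytes. */
uint64_t
burst_icache_bytes(uint64_t rate, uint64_t seconds, uint64_t blocksize)
{
	uint64_t bytes = rate * seconds;
	uint64_t entries = (bytes + blocksize - 1) / blocksize;	/* one entry per block */

	return entries * ICacheEntrySize;
}
```

For 10MB/s over 3600 seconds with 8192-byte blocks it returns 230,400,000 bytes
(4,608,000 entries); the 250MB figure in the text rounds the entry count up to five million.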
The bloom filter
requires memory equal to its size on disk.
A reasonable starting allocation is to
divide memory equally (in thirds) among
the bloom filter, the index cache, and the lump and block caches;
the third of memory allocated to the lump and block caches
should be split unevenly, with more (say, two thirds)
going to the block cache.
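A C sketch of that starting split follows; the MemPlan type and plan_memory function are hypothetical.
It ignores the fact that the bloom filter's actual requirement is fixed by its size on disk.

```c
#include <stdint.h>

typedef struct MemPlan MemPlan;
struct MemPlan {
	uint64_t bloom;   /* bloom filter */
	uint64_t icache;  /* index cache */
	uint64_t lump;    /* lump (level-1) cache */
	uint64_t block;   /* block (level-2) cache */
};

/* A third each to the bloom filter and the index cache; the last
 * third goes one third to the lump cache and the rest (about two
 * thirds) to the block cache. */
MemPlan
plan_memory(uint64_t total)
{
	MemPlan p;
	uint64_t third = total / 3;

	p.bloom = third;
	p.icache = third;
	p.lump = third / 3;
	p.block = total - p.bloom - p.icache - p.lump;	/* remainder, so nothing is lost to rounding */
	return p;
}
```

For 9GB of memory this yields 3GB each for the bloom filter and index cache,
1GB for the lump cache, and 2GB for the block cache.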
The venti server announces two network services, one
(conventionally TCP port
the venti protocol as described in
(conventionally TCP port
The venti web server provides the following
URLs for accessing status information:
A summary of the usage of the arenas and index sections.
Brief storage totals.
The current integer value of
whether or not to compress blocks
whether to write entries to the debugging logs;
whether to collect run-time statistics;
.BR icachesleeptime ,
the time in milliseconds between successive updates
of megabytes of the index cache;
.BR arenasumsleeptime ,
the time in milliseconds between reads while
checksumming an arena in the background.
The two sleep times should be (but are not) managed by venti;
they are adjustable so that experience can be gained with their effects.
The other variables exist only for debugging and
performance measurement.
.BI /set/ variable / value
.BI /graph/ name / param / param / \fR...
A PNG image graphing the named run-time statistic over time.
The details of names and parameters are undocumented;
in the venti sources.
A list of all debugging logs present in the server's memory.
The contents of the debugging log with the given
name.
Force venti to begin flushing the index cache to disk.
The request response will not be sent until the flush
has completed.
Force venti to begin flushing the arena block cache to disk.
The request response will not be sent until the flush
has completed.
Requests for other files are served by consulting a
directory named in the configuration file
.SS Configuration File
A venti configuration file
enumerates the various index sections and
arenas that constitute a venti system.
The components are indicated by the name of the file, typically
a disk partition, in which they reside. The configuration
file is the only place where file names are used. Internally,
venti uses the names assigned when the components were formatted.
In particular, only the configuration needs to be
changed if a component is moved to a different file.
The configuration file consists of lines in the form described below.
.TF "\fLindex\fI name "
.BI index " name
Names the index for the system.
.BI arenas " file
.I File
is an arena partition, formatted using
.BR fmtarenas .
.BI isect " file
.I File
is an index section, formatted using
.BR fmtisect .
.BI bloom " file
.I File
is a bloom filter, formatted using
.BR fmtbloom .
After formatting a venti system using
.BR fmtindex ,
the order of arenas and index sections should not be changed.
Additional arenas can be appended to the configuration;
flag to update the index.
The configuration file also holds configuration parameters
for the venti server itself.
.TF "\fLhttpaddr\fI netaddr "
network address to announce venti service
.BI httpaddr " netaddr
network address to announce HTTP service
queue writes in memory
(default is not to queue)
directory tree containing files for
internal HTTP server to consult for unrecognized URLs
The units for the various cache sizes above can be specified by appending a
.BR k ,
.BR m ,
or
.B g
to indicate kilobytes, megabytes, or gigabytes respectively.
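Parsing such a size is straightforward.
This C sketch assumes the suffixes are binary multiples (k = 1024 bytes), which the text does not state,
and accepts only lowercase suffixes; the function name is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a size such as "100", "512m", or "2g".
 * Assumes k, m, g are 1024, 1024^2, 1024^3.
 * On success sets *ok to 1; on malformed input or overflow
 * sets *ok to 0 and returns 0.
 */
uint64_t
parse_size(const char *s, int *ok)
{
	char *end;
	unsigned long long n;
	uint64_t mul = 1;

	*ok = 0;
	if (!isdigit((unsigned char)s[0]))
		return 0;
	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno != 0)
		return 0;
	switch (*end) {
	case 'k': mul = UINT64_C(1) << 10; end++; break;
	case 'm': mul = UINT64_C(1) << 20; end++; break;
	case 'g': mul = UINT64_C(1) << 30; end++; break;
	}
	if (*end != 0 || n > UINT64_MAX / mul)	/* trailing junk or overflow */
		return 0;
	*ok = 1;
	return (uint64_t)n * mul;
}
```

Under these assumptions, "512m" parses to 536,870,912 bytes and "2g" to 2,147,483,648 bytes.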
name in the configuration lines above can be of the form
to specify a range of the file.
are specified in bytes but can have the usual
This notation eliminates the need to
partition raw disks on non-Plan 9 systems.
Many of the options to Venti duplicate parameters that
can be specified in the configuration file.
The command-line options override those found in a
configuration file.
Additional options are:
.TF "\fL-c\fI config"
The server configuration file
Produce various debugging information on standard error.
Enable logging. By default all logging is disabled.
Logging slows server operation considerably.
percent of the available free RAM, and partition it
per the guidelines in the
This percentage should be large enough to include the entire bloom filter.
This overrides all other memory sizing parameters,
including those on the command line and in the configuration file.
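A C sketch of the check that the chosen percentage covers the whole bloom filter.
The function name and signature are hypothetical, and how venti measures free RAM is not described here.
Dividing by 100 before multiplying avoids overflow at the cost of less than 100 bytes of precision.

```c
#include <stdint.h>

/* Memory budget for a percentage of free RAM; returns 0 if the
 * budget is too small to hold the entire bloom filter. */
uint64_t
mem_budget(uint64_t freeram, unsigned percent, uint64_t bloombytes)
{
	uint64_t budget = freeram / 100 * percent;

	return budget < bloombytes ? 0 : budget;
}
```

With 16GB free and a 512MB bloom filter, 50% gives a usable budget of about 8GB,
whereas 10% of 1GB would be rejected.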
Allow only read access to the venti data.
Do not run in the background.
Normally,
the foreground process will exit once the Venti server
is initialized and ready for connections.
A simple configuration:
isect /tmp/disks/isect0
isect /tmp/disks/isect1
arenas /tmp/disks/arenas
bloom /tmp/disks/bloom
Format the index sections, the arena partition,
the bloom filter, and
finally the main index:
% venti/fmtisect isect0. /tmp/disks/isect0
% venti/fmtisect isect1. /tmp/disks/isect1
% venti/fmtarenas arenas0. /tmp/disks/arenas &
% venti/fmtbloom /tmp/disks/bloom &
% wait
% venti/fmtindex venti.conf
Start the server and check the storage statistics:
% hget http://$sysname/storage
.B /sys/src/cmd/venti/srv
.IR venti-backup (8),
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,
Setting up a venti server is too complicated.