When using an intelligent controller with on-board cache, it is important to
run benchmark tests with file sizes which exceed the cache capacity, to
ensure that the system performs well when the size of the data stream bears
an arbitrary relationship to the cache size. In the course of developing
EDIT2, we have established a suite of tests using Diskspeed, which gives
a transfer rate profile over a wide range of file sizes -
- 4, 64, 100, 128, 256, 512, 1000, and 10,000 MB file sizes
We have run this suite of tests every
time we experimented with any of the following variables (using the Dell
PERC2/DC or PERC3/DC RAID controllers) -
- RAID format cluster size (512KB, 1024KB, 2048KB options tested)
- RAID controller transfer block size (32KB, 64KB, 128KB options tested)
- RAID controller cache size (64MB and 128MB options tested)
- RAID controller read mode (read-ahead, adaptive, no read-ahead)
- RAID controller I/O mode (cached / direct)
- RAID controller cache write mode (write back / write thru)
- RAID level (RAID 0 and RAID 5 tested)
This is a complicated series
of tests. A single suite takes approximately 20 minutes to complete, and
the above seven variables (17 option values in all) give a total of
3 x 3 x 2 x 3 x 2 x 2 x 2 = 432 combinations, which would
take approximately 144 hours to complete in a single session.
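As a rough check on the size of this test matrix, the sketch below enumerates the full cross product of the options listed above and multiplies by the 20-minute suite time. The dictionary keys are illustrative names only, not Diskspeed or PERC settings.

```python
from itertools import product

# Options per variable, as listed in the text above.
# The key names are illustrative labels, not real controller settings.
options = {
    "cluster_size_kb": [512, 1024, 2048],
    "transfer_block_kb": [32, 64, 128],
    "cache_mb": [64, 128],
    "read_mode": ["read-ahead", "adaptive", "no read-ahead"],
    "io_mode": ["cached", "direct"],
    "write_mode": ["write back", "write thru"],
    "raid_level": [0, 5],
}
SUITE_MINUTES = 20  # approximate duration of one full suite

# Every combination of one value per variable (full factorial).
combinations = list(product(*options.values()))
total_hours = len(combinations) * SUITE_MINUTES / 60

print(f"{len(combinations)} combinations, about {total_hours:.0f} hours")
```

Fixing some variables first, as described next, shrinks this product quickly, since each eliminated variable divides the count by its number of options.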
To simplify this process,
we ran a series of tests using top, middle and bottom file sizes to eliminate
certain variables, arriving at the following constants which gave the best
results throughout the entire file size range -
- RAID format cluster size - 1024KB
- RAID controller transfer block size - 64KB
- RAID controller cache size - 128MB
- RAID controller read mode - Adaptive
- RAID controller I/O mode - Cached I/O
- RAID controller write mode - Write back
- RAID level - RAID 5
The above list has become our
standard configuration, and we now run the 4, 64, 100, 128,
256, 512, 1000, and 10,000 MB file size test suite as a benchmark for
our NLE RAIDs.
For
a complete readout of Diskspeed benchmark data on our systems, taken on
Dell Precision 620 and 530 workstations with both PERC2/DC and PERC3/DC
RAID controllers - click here
There has been much discussion
in our industry regarding the feasibility of implementing different RAID
levels for video applications. Many prominent NLE manufacturers maintain
that anything other than RAID0 (simple striping of a set of disks together)
will result in unacceptable data transfer rates. At the time we established
our digital suite, we were editing a four-part documentary series with
over 200 reels of raw material. The logging process alone had taken three
months, so we were interested in achieving the highest level of
data security possible. We opted for RAID5, using a set of eight 72GB drives,
in which the parity data is spread across all drives: any single drive
can fail, and the remaining seven drives can reconstruct the missing data on
a new eighth drive.
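The reconstruction described above rests on XOR parity. The following is a minimal sketch of the principle for a single stripe, assuming tiny 4-byte blocks and a fixed parity position. Real controllers rotate the parity block across the drives and work on much larger stripe units, and the PERC firmware's actual implementation is not described here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on an 8-drive set: 7 data blocks plus 1 parity block.
# The block contents are arbitrary demonstration bytes.
data_blocks = [bytes([drive] * 4) for drive in range(1, 8)]
parity = xor_blocks(data_blocks)

# Simulate losing the third drive and rebuilding it from the 7 survivors
# (6 remaining data blocks plus the parity block).
lost_index = 2
survivors = data_blocks[:lost_index] + data_blocks[lost_index + 1:] + [parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[lost_index])
```

Because XOR is its own inverse, the same operation that generates the parity also recovers any one missing block, which is why only a single drive failure can be tolerated.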
We had heard that RAID3 gave
better performance than RAID5, though we were unable to test this as Dell
RAID products do not support RAID3. This is because in RAID3, parity data
is stored on a dedicated drive, resulting in a higher mechanical loading
of the parity drive, and thus a higher proportion of unrecoverable failures
on the parity drive. We had also tried RAID0, but having lost all data
once due to drive failure, we were reluctant to follow that route again.
Using a PERC2/DC RAID controller,
we tested both RAID0 and RAID5, and given an adequate RAID controller cache
(128MB) found no perceptible difference in performance between the two
RAID levels. After this, all our work has been undertaken on RAID5.
The first Dell workstation
(Precision 620) was equipped with a PERC2/DC RAID controller, which offers
an 80MB/sec transfer rate. The PowerVault drives are U160 drives. It is important
to understand that under a write operation, each drive is handling one
eighth of the write, so the controller card can keep feeding data to the
drives, as the next drive to write will usually be ready as soon as the
previous drive begins to write. The reverse is true on the read operation,
as all eight drives are sharing the same controller and cache path, so
it is on read operations that the controller's headroom could constitute
a bottleneck.
The PERC2 uses a 32-bit PCI
slot. When migrating to the Precision 530 platform, we were interested
in freeing up the interrupt channels which are associated with the 32-bit
slots, to eliminate conflicts with our NLE card. Like the Precision 620,
the 530 has two 64-bit PCI slots, but unlike the 620, the 530's 64-bit slots
do not share IRQ channels with the 32-bit slots. We therefore decided to
migrate to the PERC3/DC RAID controller, which besides offering 64-bit
PCI compliance also offers a U160 transfer rate - an important feature
given that, at the same time, we were expanding the storage system to two
500GB PowerVaults - one on each RAID controller channel. Increased controller
bandwidth, whilst not necessarily giving better editing performance, would
speed up transfers of media data between the two RAID arrays.
(Dell's
support website contains much data on the PERC2 and PERC3 2-channel
controllers. These devices are manufactured for Dell by American
Megatrends (AMI), under whose name they are sold as the Elite 1600 and
the Enterprise 1500 RAID controllers.) There is also a four-channel controller
- PERC2/QC - but this is not yet recommended for such applications.
Both the PERC2/DC and the
PERC3/DC can be supplied with on-board battery back-up for the cache RAM.
This is important because a power failure during a write operation could
leave data stranded in RAM: the OS would presume that it had been successfully
delivered and stored on the RAID, whilst in fact it would be lost in
the power outage, and the fault might not show itself until much later. (Note:
Dell recommends disconnecting the battery back-up before removing or
inserting RAM on the RAID controller, to avoid damaging the RAM.)
Diskspeed measures sustained
data transfer rate for a given file size. It is important to understand
that audio files and video files present the NLE with different demands.
Using the silver. system as a benchmark, audio files would require a maximum
of 0.09MB/sec for a maximum of 8 real-time audio tracks (tracks in excess
of this would be rendered for playback); this gives an overhead of 0.56MB/sec.
Similarly, playback of 2
real-time video tracks at the highest MPEG2 compression ratio (NDQ50) would
require 2 x 6.25MB/sec = 12.50MB/sec. Added to this is an estimated overhead
for a single graphics track of 2MB/sec.
This gives a maximum transfer
rate demand as follows -
- 2 video tracks @ 6.25MB/sec = 12.50MB/sec
- 8 audio tracks @ 0.09MB/sec = 0.56MB/sec
- plus graphics layer – max. 2MB/sec
- Total required sustained transfer rate = 15MB/sec.
These are the figures for silver.mpg.
Other ///FAST systems, as well as systems from other manufacturers, would
present different requirements.
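The budget above can be written out as a short calculation. This is a sketch only: the audio subtotal is taken as the 0.56MB/sec stated in the list rather than recomputed from the per-track figure, and the variable names are illustrative.

```python
# Figures in MB/sec, taken from the list above.
VIDEO_TRACKS = 2
VIDEO_MB_PER_SEC = 6.25       # per real-time video track
AUDIO_SUBTOTAL = 0.56         # stated subtotal for 8 audio tracks
GRAPHICS_MAX = 2.0            # estimated maximum for one graphics track

video_subtotal = VIDEO_TRACKS * VIDEO_MB_PER_SEC
total_demand = video_subtotal + AUDIO_SUBTOTAL + GRAPHICS_MAX

print(f"video {video_subtotal:.2f} MB/sec, total {total_demand:.2f} MB/sec")
```

Rounded to the nearest whole number, the sum gives the 15MB/sec sustained requirement quoted above. A system with different track counts or codecs would substitute its own figures.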
It is therefore important
to understand that any 16KB (audio) block size transfer over 0.56MB/sec
(sustained) and any 512KB (video) block size transfer over 12.5MB/sec is
acceptable - the system will never present a demand exceeding 15MB/sec,
and for editing purposes, the system cannot use a higher transfer rate
(though other data maintenance applications may well benefit from higher
rates).
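The acceptance rule above amounts to a simple threshold check per block size. The sketch below assumes benchmark readings are available as sustained MB/sec figures keyed by block size. The sample readings are made-up placeholders, not measured data, and the function name is hypothetical.

```python
# Minimum sustained rates in MB/sec, keyed by block size in KB,
# taken from the text above (16KB audio, 512KB video).
REQUIRED_MB_PER_SEC = {16: 0.56, 512: 12.5}


def is_acceptable(block_kb, sustained_mb_per_sec):
    """True if the sustained rate exceeds the requirement for this block size."""
    return sustained_mb_per_sec > REQUIRED_MB_PER_SEC[block_kb]


# Placeholder readings for illustration only.
sample_readings = {16: 3.2, 512: 9.8}

for block_kb, rate in sample_readings.items():
    verdict = "acceptable" if is_acceptable(block_kb, rate) else "too slow"
    print(f"{block_kb}KB blocks at {rate} MB/sec: {verdict}")
```

A check like this treats the two block sizes separately, so it says nothing about mixed audio and video transfers, which is where extra headroom matters.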
The major difference between
Diskspeed's data transfer test and the real transfer rate demanded by a
fully loaded project timeline under editing is that, whilst both present
the system with the same headroom demands, editing will often demand transfers
of audio and video in block-size combinations which may not equate with
the benchmark testing of the two block sizes separately. It is therefore
vital to have enough overhead to take account of this variable.
As we have learnt to understand
the transfer rate demands of silver., we have also become more confident
at interpreting the results of Diskspeed. Whilst Diskspeed may sometimes
indicate transient hangs which are not a problem, we can conclude that
any RAID array which passes the Diskspeed test will usually perform well
during editing and playback - and that if problems which seem to indicate
disk access inadequacy should show up, they are usually related to other factors
such as shared IRQs or software bugs.
We have used Diskspeed on
three different workstation platforms, using both Windows NT4.0 SP5 and
Windows 2000 SP2 (on FAST studio XL 2.5 and later), and have not seen any
dramatic difference in transfer rates which was not attributable to hardware
factors such as the RAID controller. (Our first "clone" system used a
non-intelligent Adaptec controller, rather than the PERC2 and PERC3 which we
have since used on Dell platforms with great success.)
In the course of this project
we have looked for other benchmark tools, but have found none better than
///FAST Diskspeed. No benchmark test is perfect, so it is important to
interpret the results - run tests with only one modified variable at a time,
and remember to test over a wide range of file sizes. Do not automatically
assume that a configuration which, for a given file size, gives the best
transfer rate is necessarily the right configuration. Pursue instead a
configuration which gives similar results across the entire file size and
linear/random read spectrum, and pursue a headroom which gives a sustained
transfer rate at least a quarter of the specified burst transfer rate of
the slowest component in the chain. Finally, do not assume that all NLE
performance problems are related to transfer rate issues; they may be caused
by other system configuration parameters.
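The headroom rule of thumb above can be sketched as follows: take the lowest specified burst rate in the chain, divide by four, and compare it with the worst sustained result across the file size range. All figures in the example are placeholders for illustration, not measured data, and the function names are hypothetical.

```python
def sustained_target(burst_mb_per_sec):
    """A quarter of the slowest component's specified burst rate."""
    return min(burst_mb_per_sec.values()) / 4


def check_profile(results_mb_per_sec, burst_mb_per_sec):
    """Return the target, the worst measured rate, and whether it passes."""
    target = sustained_target(burst_mb_per_sec)
    worst = min(results_mb_per_sec.values())
    return target, worst, worst >= target


# Placeholder burst ratings in MB/sec for each component in the chain.
burst = {"controller channel": 80.0, "drives": 160.0}

# Placeholder sustained results: file size in MB -> MB/sec.
results = {4: 31.0, 128: 27.5, 1000: 24.0, 10000: 23.0}

target, worst, ok = check_profile(results, burst)
print(f"target {target:.1f} MB/sec, worst {worst:.1f} MB/sec, pass={ok}")
```

Checking the worst result rather than the best reflects the advice to favour a configuration that behaves consistently across the whole file size range.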
Warning:
If you are contemplating migration from PERC2/DC to PERC3/DC, do not attempt
this migration using a RAID with "hot" unbacked-up data. Although the PERC3/DC
can reconfigure a RAID which was previously formatted on a PERC2/DC, we
have found that the data on such a RAID may be unreadable in some
applications, including the NLE system. Either migrate when the drives
may be erased without cost, or arrange adequate spare RAID capacity to
make a copy of each RAID's data, so that the master RAID may be reformatted
properly and the data restored afterwards. This "experiment" cost us 5 weeks
re-digitisation of video/audio data!