ORtera's purpose is to make your storage planning and configuration process
more effective, faster, cheaper, and fun.
You want to lower the response time on your system and to know,
well in advance, how much headroom you have before a meltdown.
ORtera Heuristics guides you with specific metrics, analytics,
and recommendations in plain English. Here is how ORtera helps you:
First, install and start ORtera
When you first start ORtera after installation, it performs an
initial discovery and a 3-minute monitoring session on the 10
busiest application processes and the filesystems they use.
You may schedule monitoring sessions of greater length on
specific processes and filesystems of your choosing.
When ORtera has completed monitoring, the tree nodes for the
processes and filesystems that interest you will indicate by
color whether ORtera has detected suboptimal conditions. Red
means severe performance-degrading conditions have been detected,
yellow means minor conditions, green means no such conditions
detected, and gray means no I/O detected during the monitoring
period. You should open red and yellow nodes to view the
Heuristics tab for each.
In this case, Heuristics indicates that I/O fragmentation at the
filesystem level is degrading performance. The heuristic
suggests configuring the filesystem for a 1 MB I/O size.
ORtera also reveals fragmentation at the RAID level.
RAID fragmentation is a function of the stripe unit size, and
Heuristics suggests increasing the stripe unit.
Sometimes you divide the I/O on purpose, to spread it over
multiple LUNs and channels; however, this technique is used
mostly for large sequential I/O. For random I/O, it is generally
better not to divide it. When dividing the I/O is not intended
and causes inefficient use of the lower levels, ORtera identifies
it as fragmentation.
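To make the stripe-unit effect concrete, here is a minimal sketch of how a striped (RAID-0) volume might split one I/O into per-device pieces. The function `split_io`, the round-robin column layout, and the sizes are illustrative assumptions, not ORtera's or SVM's internal logic.

```python
KB = 1024

def split_io(offset, size, stripe_unit, columns):
    """Split one logical I/O into (column, column_offset, length) pieces.

    Assumes a simple RAID-0 layout: stripe units are placed round-robin
    across `columns` devices, starting at column 0.
    """
    pieces = []
    end = offset + size
    while offset < end:
        unit_index = offset // stripe_unit      # stripe unit holding this byte
        within = offset % stripe_unit           # position inside that unit
        length = min(stripe_unit - within, end - offset)
        column = unit_index % columns           # device that holds the unit
        row = unit_index // columns             # stripe row on that device
        pieces.append((column, row * stripe_unit + within, length))
        offset += length
    return pieces

# A 1 MB stripe-aligned I/O on a hypothetical 4-column stripe.
for su in (128 * KB, 512 * KB):
    pieces = split_io(0, 1024 * KB, su, columns=4)
    print(f"stripe unit {su // KB} KB -> {len(pieces)} pieces")
```

With a 128 KB stripe unit the 1 MB request becomes eight pieces; with 512 KB it becomes two. A real volume manager may coalesce pieces that are contiguous on the same device, so treat the counts as an upper bound.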
ORtera reveals that less than 40% headroom exists before
saturation of the physical layer. The heuristic suggests
diverting some of the workload.
ORtera reveals a 16-to-1 ratio of I/O size between the filesystem
and logical device, and a 2-to-1 ratio between the logical and
physical devices. The I/O size chart confirms the fragmentation
condition: Ctrl-select each layer to view the transformation
between layers.
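The size ratios above are ratios of average I/O size (bytes divided by operations) between adjacent layers. The sketch below computes them from per-layer counters; the counter values are hypothetical, chosen only to reproduce the quoted ratios, and the assumption that the same byte volume crosses every layer ignores page-cache hits and metadata I/O.

```python
MIB = 1 << 20

# Hypothetical per-layer counters: (bytes transferred, operation count).
layers = {
    "filesystem": (512 * MIB, 4_096),     # 128 KiB average I/O
    "logical":    (512 * MIB, 65_536),    # 8 KiB average I/O
    "physical":   (512 * MIB, 131_072),   # 4 KiB average I/O
}

def avg_io_size(counters):
    """Average I/O size for one layer: bytes divided by operations."""
    nbytes, ops = counters
    return nbytes / ops

def size_ratio(upper, lower):
    """How many lower-layer I/Os one upper-layer I/O becomes, on average."""
    return avg_io_size(layers[upper]) / avg_io_size(layers[lower])

fs_to_logical = size_ratio("filesystem", "logical")
logical_to_physical = size_ratio("logical", "physical")
print(f"filesystem:logical = {fs_to_logical:.0f}-to-1")
print(f"logical:physical   = {logical_to_physical:.0f}-to-1")
print(f"end to end         = {fs_to_logical * logical_to_physical:.0f}-to-1")
```

Because the ratios multiply, a 16-to-1 and a 2-to-1 ratio together mean that, on average, each filesystem I/O ends up as about 32 physical I/Os.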
Random reads and writes are being transformed into sequential
reads and writes, while 80% of the application I/O latency is
spent on random reads. This I/O-Type chart is another view unique
to ORtera, not easily obtained by any other means.
This chart may appear difficult to interpret at first sight, but
the more you use it, the more it reveals about the workload.
The breakout of Ops, Data, and Time by access type gives you
insight into the balance of the workload and how homogeneous
it is.
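As a rough illustration of what such a breakout involves, the sketch below classifies a trace of I/Os as random or sequential and reports each type's share of Ops, Data, and Time. The rule "sequential means the I/O starts where the previous one ended" is a common heuristic assumed here, not necessarily the one ORtera uses, and the trace is hypothetical.

```python
from collections import defaultdict

def breakout(trace):
    """Share of Ops, Data, and Time for each access type at one layer.

    trace: (op, offset, size, latency) tuples for one device, in issue
    order; op is "read" or "write". An I/O counts as sequential when it
    starts exactly where the previous I/O ended, otherwise as random.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # type -> [ops, bytes, time]
    prev_end = None
    for op, offset, size, latency in trace:
        pattern = "sequential" if offset == prev_end else "random"
        t = totals[f"{pattern} {op}"]
        t[0] += 1
        t[1] += size
        t[2] += latency
        prev_end = offset + size
    all_ops = sum(t[0] for t in totals.values())
    all_bytes = sum(t[1] for t in totals.values())
    all_time = sum(t[2] for t in totals.values())
    return {kind: (t[0] / all_ops, t[1] / all_bytes, t[2] / all_time)
            for kind, t in totals.items()}

# Hypothetical trace: offsets and sizes in bytes, latency in ms.
trace = [
    ("write", 0, 8_192, 2.0),          # no predecessor: random
    ("write", 8_192, 8_192, 1.0),      # continues previous I/O: sequential
    ("read", 1_000_000, 8_192, 5.0),   # jumps elsewhere: random
    ("read", 1_008_192, 8_192, 1.0),   # continues previous I/O: sequential
]
for kind, (ops, data, time) in sorted(breakout(trace).items()):
    print(f"{kind:17s} ops {ops:4.0%}  data {data:4.0%}  time {time:4.0%}")
```

Every type here has the same share of Ops and Data, yet random read takes more than half of the Time, which is the kind of imbalance the Time bars make visible.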
In this example, you see a large amount of random write (red
bars) at the filesystem layer (the top set of bars) converted
to sequential write (yellow bars) at the logical device layer
(the middle set of bars). There is also random read (blue bars)
being converted to sequential read (green bars). Some of the
activity at the logical device layer is not application payload,
but filesystem metadata, such as access time and file-size
updates.
The type chart also lets you compare relative I/O size across
access types. For example, 64% of the operations account for 64%
of the data (random read at the filesystem level, in this
example), while 25% of the operations account for 12% of the
data (random read at the logical device level, in this example).
Overall, 64% of the data at the filesystem level is read, while
at the logical layer only 34% is read. The difference reflects
reads satisfied by page-cache hits.
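One way to read the Ops and Data bars together: for an access type $t$ with $n_t$ operations of average size $\bar{s}_t$, in a layer with $N$ operations of average size $\bar{S}$, the ratio of Data share to Ops share equals that type's average I/O size relative to the layer average. The notation is ours, not ORtera's; the shares plugged in are the ones quoted above.

$$
\frac{\text{Data share}_t}{\text{Ops share}_t}
= \frac{n_t \bar{s}_t / (N \bar{S})}{n_t / N}
= \frac{\bar{s}_t}{\bar{S}}
$$

$$
\text{filesystem, random read: } \frac{0.64}{0.64} = 1.00
\qquad
\text{logical device, random read: } \frac{0.12}{0.25} = 0.48
$$

So random reads are about the average I/O size at the filesystem layer, but roughly half the average I/O size at the logical device layer.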
As you can tell, this chart is information-rich and worth
studying closely.
As Amdahl's Law teaches, the greatest potential for performance
improvement lies where the most time is spent. Here the big
time-consumer is the sequential write at the logical layer,
which results from the I/O fragmentation observed in this
example.
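Amdahl's Law can be written as follows, where $p$ is the fraction of total time spent in the part being improved and $s$ is the speedup of that part. The values of $p$ and $s$ in the second line are hypothetical, chosen only to show why attacking the largest time-consumer pays off most.

$$
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
$$

$$
p = 0.8,\ s = 4:\quad S_{\text{overall}} = \frac{1}{0.2 + 0.2} = 2.5
\qquad
p = 0.1,\ s = 4:\quad S_{\text{overall}} = \frac{1}{0.9 + 0.025} \approx 1.08
$$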
ORtera reveals a 14-to-1 ratio of load level between the
filesystem and physical layers. These are discrete load-level
measurements, not time averages. Load level, as reported by
ORtera, is a true measure of the actual number of concurrent I/O
threads, and of the time they spend, void of idle time. The
system load level reported by most other tools is averaged over
time and does not reflect the true demand on the storage system.
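The difference between a time-averaged load level and one that is void of idle time can be sketched as follows. This is our own model of the idea (total I/O time divided by the whole window versus by busy time only), not ORtera's published formula; the burst below is hypothetical.

```python
def load_levels(ios, window):
    """Return (time-averaged load, busy-time load) for one layer.

    ios: (start, end) times of individual I/Os inside an observation
    window of length `window`. The time-averaged load divides total I/O
    time by the whole window, idle time included; the busy-time load
    divides it only by the time during which at least one I/O was
    outstanding.
    """
    total_io_time = sum(end - start for start, end in ios)
    busy = 0.0
    run_start = run_end = None
    for start, end in sorted(ios):
        if run_end is None or start > run_end:
            # Gap before this I/O: close the previous busy run.
            if run_end is not None:
                busy += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlaps the current busy run: extend it.
            run_end = max(run_end, end)
    if run_end is not None:
        busy += run_end - run_start
    time_avg = total_io_time / window
    busy_load = total_io_time / busy if busy else 0.0
    return time_avg, busy_load

# Hypothetical burst: 8 I/Os of 10 ms each, all outstanding together,
# followed by idle time for the rest of a 1000 ms window.
burst = [(0.0, 10.0)] * 8
time_avg, busy_load = load_levels(burst, window=1000.0)
print(f"time-averaged load: {time_avg:.2f}")   # 0.08
print(f"busy-time load:     {busy_load:.2f}")  # 8.00
```

The averaged figure suggests an almost idle device, while the devices actually faced eight concurrent I/Os whenever they were working.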
The physical layer is operating at 42% of capability. Another
important perspective, not generally available without ORtera,
concerns the instantaneous arrival and completion rates of the
workload. Like the ORtera load-level metric, these are void of
idle time. The relationship between them offers insight into
the saturation point of the storage configuration. Clearly, an
arrival rate that exceeds the completion rate cannot be
sustained. ORtera uses capability expectation to calculate
Capability Busy, which is based on the observed performance of
the current configuration for the current workload composition
and average load level. This estimates the sustainable rate of
the current configuration at 100% capability for the current
workload composition and load level distribution.
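A minimal sketch of the quantities described above, assuming arrival and completion counts and busy time are already available. The `sustainable_rate` input stands in for ORtera's capability expectation, which is derived from observed performance and workload composition in ways not reproduced here; all numbers are hypothetical.

```python
def assess(arrivals, completions, busy_seconds, sustainable_rate):
    """Busy-time arrival/completion rates and a capability-busy ratio.

    sustainable_rate is an externally supplied estimate (ops/s) of what
    the configuration can complete at 100% capability for this workload
    mix; how to derive it is outside this sketch.
    """
    arrival_rate = arrivals / busy_seconds
    completion_rate = completions / busy_seconds
    return {
        "arrival_rate": arrival_rate,
        "completion_rate": completion_rate,
        # A backlog grows without bound if arrivals outpace completions.
        "sustainable": arrival_rate <= completion_rate,
        "capability_busy": completion_rate / sustainable_rate,
    }

# Hypothetical counters: 4200 I/Os arrived and completed in 2 s of busy
# time, against an assumed sustainable rate of 5000 ops/s.
print(assess(4_200, 4_200, 2.0, 5_000.0))
```

Under these assumed numbers the configuration would be at 42% of capability, and any interval in which the arrival rate exceeds the completion rate would be flagged as unsustainable.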
There is a large variation in response time. A critical metric
for quality of service is not just response time, but the
standard deviation of response time. When response time is less
variable, performance is more consistent. The ORtera metric,
instantaneous bandwidth, is also void of idle time. It shows
the true demonstrated bandwidth of the configuration.
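Two samples can share a mean response time and still deliver very different quality of service. The sketch below reports mean, standard deviation, and coefficient of variation, and computes bandwidth over busy time only; the function names and all sample values are hypothetical illustrations, not ORtera output.

```python
import statistics

MIB = 1 << 20

def response_profile(latencies_ms):
    """Mean, standard deviation, and coefficient of variation of response time."""
    mean = statistics.mean(latencies_ms)
    sd = statistics.pstdev(latencies_ms)
    return mean, sd, sd / mean

def instantaneous_bandwidth(nbytes, busy_seconds):
    """Bytes per second while I/O is in progress (idle time excluded)."""
    return nbytes / busy_seconds

# Hypothetical samples with the same mean but very different consistency.
steady = [10, 10, 10, 10]
erratic = [1, 1, 1, 37]
for name, sample in (("steady", steady), ("erratic", erratic)):
    mean, sd, cv = response_profile(sample)
    print(f"{name:8s} mean {mean:.1f} ms  sd {sd:.1f} ms  cv {cv:.2f}")

# Hypothetical: 400 MiB moved with I/O in progress for 2.5 s of a 10 s window.
print(f"averaged over window: {400 * MIB / 10.0 / MIB:.0f} MiB/s")
print(f"instantaneous:        {instantaneous_bandwidth(400 * MIB, 2.5) / MIB:.0f} MiB/s")
```

Dividing by busy time instead of the whole window raises the bandwidth figure from 40 to 160 MiB/s in this example, because the 7.5 s of idle time no longer dilutes it.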
The load is distributed to a single file at the filesystem
layer and to four LUNs at the physical layer. A very powerful
feature of ORtera, not readily available by other means, is a
view of the resources used at each level, sorted by Ops, Data,
and Time. In particular, it shows individual files in the
filesystem. Several heuristics deal with homogeneity and
balance; when these conditions are detected, you can view them
here by selecting the subject resources.
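The sorted-resource view can be approximated with per-resource counters, as in this sketch. `ResourceStats`, `top_by`, and the single-number balance check `top_share` are our own illustrations; ORtera's homogeneity and balance heuristics are not described in enough detail here to reproduce, and the file statistics are hypothetical.

```python
from typing import NamedTuple

class ResourceStats(NamedTuple):
    name: str
    ops: int      # operation count
    data: int     # bytes transferred
    time: float   # seconds of I/O time

def top_by(resources, metric):
    """Resources ordered from busiest to least busy by one metric."""
    return sorted(resources, key=lambda r: getattr(r, metric), reverse=True)

def top_share(resources, metric):
    """Fraction of the metric's total taken by the single busiest resource."""
    values = [getattr(r, metric) for r in resources]
    return max(values) / sum(values)

# Hypothetical per-file statistics for one filesystem.
files = [
    ResourceStats("file-a", 900, 9_000_000, 1.5),
    ResourceStats("file-b", 100, 50_000_000, 0.5),
    ResourceStats("file-c", 300, 3_000_000, 4.0),
]
for metric in ("ops", "data", "time"):
    ranked = [r.name for r in top_by(files, metric)]
    print(f"{metric:4s}: {ranked}  top share {top_share(files, metric):.0%}")
```

Each metric ranks the files differently, which is why the view offers all three orderings rather than one.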
You follow the Heuristics guidance and reconfigure the
filesystem for 1-MB I/O. Given the large increase in I/O size
at the filesystem layer, you raise the original heuristic
guidance for a 128-KB stripe unit to 512 KB. In this example,
the filesystem, kernel and logical volume manager were
reconfigured like this:
- UFS
  - maxcontig (newfs -C): 128 (1 MB)
  - cgsize (newfs -c): 240 (cylinders)
  - maxbpg (tunefs -e): 6400
- Kernel
  - maxphys: 1048576 (1 MB)
  - md_maxphys: 1048576 (1 MB)
- SVM
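The sizes in this configuration fit together as follows. The first line assumes the default UFS block size of 8 KB, which the example does not state; the second assumes a stripe-aligned I/O and uses the 512 KB and 128 KB stripe units mentioned above.

$$
\text{maxcontig} \times \text{bsize} = 128 \times 8\ \text{KB} = 1024\ \text{KB} = 1\ \text{MB} = 1048576\ \text{bytes} = \text{maxphys} = \text{md\_maxphys}
$$

$$
\frac{1\ \text{MB}}{512\ \text{KB}} = 2 \text{ stripe units per I/O}
\qquad
\frac{1\ \text{MB}}{128\ \text{KB}} = 8 \text{ stripe units per I/O}
$$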
Having made the changes, you repeat the monitoring session to
see the results. Your configuration change has resulted in
greatly improved I/O-size transformations. You have achieved
the large I/O desired.
Result: Random writes are no longer converted to
sequential writes. There is less sequential read as well. The
sequential write component was a result of the fragmentation,
and was causing inordinate load levels at the lower levels.
The reconfiguration has eliminated this artifact.
Result: Load level ratios are now 4-to-1, all the way
down to the physical devices. Before the change there was a
14-to-1 ratio of load level. The physical devices were running
near the edge of their performance capability; now they are in a
comfortable range. Note the third row, Avg. load level.
Result: Capability utilization was reduced from 27% to
20% at the filesystem level, 47% to 10% at the logical device
level, and 42% to 15% at the physical device level.
The impact on headroom is a tremendous improvement, and can add
months, if not years, to the longevity of the configuration, and
prevent a sudden meltdown. Operating the physical resource in a
comfortable range also reduces variability in performance,
improving quality of service overall.
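To give a feel for what the extra headroom could mean in months, here is a rough sketch using the capability-utilization figures quoted above. It assumes utilization grows in proportion to load, treats each layer independently, and uses a hypothetical 5% monthly load growth; 100% is an optimistic ceiling, since response time degrades well before it.

```python
import math

def months_to_saturation(utilization, monthly_growth):
    """Months until utilization reaches 100%, assuming utilization grows
    in proportion to load and load compounds at monthly_growth."""
    return math.log(1.0 / utilization) / math.log(1.0 + monthly_growth)

def limiting_layer(utilization_by_layer, monthly_growth):
    """The layer that saturates first, and how many months that takes."""
    layer = max(utilization_by_layer, key=utilization_by_layer.get)
    return layer, months_to_saturation(utilization_by_layer[layer], monthly_growth)

# Capability utilization figures quoted above.
before = {"filesystem": 0.27, "logical": 0.47, "physical": 0.42}
after = {"filesystem": 0.20, "logical": 0.10, "physical": 0.15}

growth = 0.05  # hypothetical 5% load growth per month
for label, util in (("before", before), ("after", after)):
    layer, months = limiting_layer(util, growth)
    print(f"{label}: {layer} layer saturates in about {months:.0f} months")
```

Under these assumptions the limiting layer moves from the logical device layer (about 15 months) to the filesystem layer (about 33 months), roughly doubling the time before saturation.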
Result: You have greatly improved response time, reduced
response time variation, and greatly increased delivered
bandwidth at all layers. The 4th row from the top of this table
shows that the raw power of the configuration to deliver I/O has
improved almost threefold. The last two rows show that you have
dramatically improved response time and its variability at all
levels of the configuration.
The ORtera Summary Report delivers printer-ready documentation
for the system run-book or other logs. It provides an audit trail
of the baseline and of changes in performance, for external
documentation and for sharing with supervisors or customers.
In this case, only three performance-degrading conditions were
present before diagnosis by ORtera and reconfiguration. ORtera
Heuristics diagnoses, and guides you in correcting, 24
performance-degrading conditions.
With ORtera, in minutes you have diagnosed your storage
bottlenecks and the causes of low resource headroom on your
system, making the system responsive for users, and preventing
a costly sudden meltdown. In addition, you now have a full
understanding of the capabilities and constraints of your
storage system, and you know it is performing at its best.
ORtera makes your storage configuration process more effective,
faster, cheaper, and fun.
What Users Say
"The visual DTrace for storage"
"Cracks the storage stack"
"Fun"
"Impressive"
"Easy and intuitive"
"Well thought out"
"By far the best I've seen"
"Incorporating a rigorous model and detailed heuristics"
"I would really recommend it"