DAOS and Domino 8.5 — Exactly How Much Disk Space Can You Expect to Save?
Patrick Mancuso, senior software engineer, IBM
February, 2009
This free article is part of THE VIEW Free Trial for Lotus Professionals program. Subscribe here to receive many more articles like this one, plus new ones published every week at THE VIEW online knowledgebase.
Start quantifying the data storage and I/O savings you can achieve with the Domino Attachment and Object Service (DAOS) feature of Domino 8.5. Look at four deployments of DAOS, and gain insight into the factors that drive savings and the degree of savings that can be realized. Then, download a free tool from IBM that analyzes your data environment and estimates what you can expect to save by deploying DAOS.
Domino administrators and IT managers are understandably excited about the Domino Attachment and Object Service (DAOS) feature of Domino 8.5, which stores document attachments outside of Domino databases. DAOS promises to reduce storage costs by eliminating multiple copies of the same attachment in different documents or databases. But money is tight at most companies, and just because something is “new and cool” and promises to save money isn't a good enough reason for adopting new software and features. You need demonstrated results to justify implementing anything new.
Here is some real-world deployment data to give you an idea of what to expect with DAOS and help you decide if deploying DAOS is right for your organization. I present four actual production deployments of DAOS and the results yielded by each. My summary and analysis of the results illuminate the specific areas in which DAOS reduces your costs. I also discuss the types of environments that stand to gain the most from DAOS and point you to a free tool you can use to predict and quantify your own savings.
DAOS in a Nutshell
The primary function of DAOS is to extract attachment data from documents in a Domino database and store the data outside of the NSF file. When DAOS is enabled, it stores each attachment’s data in a Notes Large Object (NLO) file in the DAOS repository, and it stores a reference to the NLO file inside the NSF file. Many documents in many different NSF files can refer to the same NLO file, which eliminates duplicate storage of the data. Figure 1 shows how Domino stores the same attachment multiple times when DAOS is not enabled and the more efficient method it uses to store that same attachment when DAOS is enabled.
Figure 1: Attachment storage with and without DAOS
To keep track of the attachments it stores in its repository, DAOS does not use the attachment name and timestamp, which are subject to change if the user renames or resaves the file. Instead, for each document attachment, DAOS calculates a checksum for the attachment contents. However, it does not use a simple checksum, which is literally the result of adding all of the bytes in the attachment together. A simple checksum is easily fooled. For example, the byte sequence 1, 2, 3, 4, 5 and the byte sequence 5, 3, 1, 2, 4 both add up to 15, even though the content is different. Instead, DAOS uses the MD5 checksum algorithm, which does better than just adding up the bytes because it takes the order of the bytes into account as well. Even if a user renames and resaves an attachment, the contents produce the same checksum, so DAOS saves just one copy of the attachment in its repository. If the user modifies the attachment, thereby changing the checksum, DAOS stores the modified attachment as a separate copy.
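To make the difference concrete, here is a minimal Python sketch that compares an additive checksum with an MD5 digest on the two byte sequences from the example. It uses the standard hashlib module and only illustrates the idea. It is not DAOS code, and the function names are invented for this illustration.

```python
import hashlib


def simple_checksum(data: bytes) -> int:
    """Add up all byte values; the order of the bytes does not matter."""
    return sum(data)


def md5_digest(data: bytes) -> str:
    """Return the MD5 digest as a hex string; the order of the bytes matters."""
    return hashlib.md5(data).hexdigest()


original = bytes([1, 2, 3, 4, 5])
reordered = bytes([5, 3, 1, 2, 4])

# The additive checksum cannot tell the two sequences apart.
same_sum = simple_checksum(original) == simple_checksum(reordered)

# The MD5 digests differ because the byte order differs.
same_md5 = md5_digest(original) == md5_digest(reordered)

print(simple_checksum(original), same_sum, same_md5)  # 15 True False
```

Because the digest depends only on the bytes, a renamed copy of the same file produces the same digest, which is what allows one stored copy to serve many documents.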
Since many documents can refer to the same NLO, DAOS maintains a reference count for each NLO. As users delete or change the documents, DAOS adjusts the reference count appropriately. When the reference count goes to zero, DAOS marks the NLO for deletion.
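The reference counting described above can be sketched as a toy in-memory store. This is an illustration under simplifying assumptions, not the DAOS implementation: the class and method names are hypothetical, a dictionary stands in for the NLO files on disk, and an object whose count reaches zero is only marked, mirroring the "marks the NLO for deletion" behavior.

```python
import hashlib


class AttachmentStore:
    """Toy content-addressed store with one reference count per stored object."""

    def __init__(self):
        self.objects = {}                  # digest -> attachment bytes (stands in for an NLO file)
        self.ref_counts = {}               # digest -> number of documents referring to it
        self.marked_for_deletion = set()   # digests whose count has dropped to zero

    def add_reference(self, data: bytes) -> str:
        """Store the data once and count one more referring document."""
        key = hashlib.md5(data).hexdigest()
        if key not in self.objects:
            self.objects[key] = data
        self.ref_counts[key] = self.ref_counts.get(key, 0) + 1
        self.marked_for_deletion.discard(key)  # referenced again, so keep it
        return key

    def drop_reference(self, key: str) -> None:
        """Count one fewer referring document; mark the object when none remain."""
        if self.ref_counts.get(key, 0) == 0:
            raise KeyError(f"no live references to {key}")
        self.ref_counts[key] -= 1
        if self.ref_counts[key] == 0:
            self.marked_for_deletion.add(key)


store = AttachmentStore()
first = store.add_reference(b"quarterly report")
second = store.add_reference(b"quarterly report")  # same content in another document
print(first == second, len(store.objects), store.ref_counts[first])  # True 1 2
```

Dropping both references leaves the count at zero and the key in `marked_for_deletion`. The stored bytes stay in place until a separate cleanup step, which this sketch does not model, removes them.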
From the perspective of users and application developers, DAOS is invisible and works silently. The DAOS implementation is underneath the NSF layer and does not change the existing Domino API. You can enable DAOS for any Domino application; it is not limited to mail application use.
You can get a detailed look at how DAOS works in Erin Dame’s article “Maximize the Green Benefits of Domino Attachment and Object Service (DAOS) for Notes and Domino 8.5” (THE VIEW, December 2008). (I’ve just given you the highlights here.)
The general process for deploying DAOS is to enable transaction logging, choose a location for the DAOS repository, and enable DAOS on the server. To enable DAOS on individual NSF files and extract the existing attachment data, use the command compact -c -daos on. For more DAOS deployment information and tips see the Domino DAOS wiki.
Before and After Snapshots of Four Systems
I’m going to show you some before and after results of deploying DAOS on four systems. In each case, all of the results that I present are due solely to the deployment of DAOS. Domino design and document compression were enabled before the DAOS deployment, as was the use of the LZ1 compression algorithm for attachments. Both features can be a source of significant disk space savings. The before and after snapshots demonstrate the additional savings realized by deploying DAOS.
All measurements were taken both before and after the initial conversion of existing mail data. Keep in mind that the savings do not stop with the reduction in the size of existing data. DAOS processes new attachments automatically as documents are added to the application.
Example 1: Mail Server Housing 112 Active User Files
The first DAOS deployment that we’ll look at is on a Domino mail server dedicated to the IBM Research division and operated by IBM Global Services. This machine services 112 active user mail files. It is part of a larger cluster of servers and provides an early deployment platform for Domino and other related tools.
Table 1 shows the amount of data in the data directory before and after enabling DAOS and the percentage reduction of that data. It also shows total data after enabling DAOS, including the NLO data in the DAOS data repository. In addition, this table provides the percentage reduction in average I/O volume, the change in average I/O per second, and the percentage of the I/O busy rate accounted for by the DAOS directory.
Table 1: Results for a mail server housing 112 active user files (AIX – 5.3.7.7 64-bit)

| Measurement | Before DAOS | After DAOS | Percentage Change |
| --- | --- | --- | --- |
| NSF data in data directory | 65.9GB | 23.9GB | 63% reduction |
| NLO data in DAOS repository | - | 25.3GB | - |
| Total data | 65.9GB | 49.2GB | 25% reduction |
| Average I/O volume | - | - | 8% reduction |
| Average I/O per second | - | - | 1% increase |
| Average CPU busy | - | - | No change |
| Relative I/O rates | - | The DAOS directory disk has 1.3% of the busy rate of the data directory | - |
|
Example 2: Application Server
The second example comes from a Domino application server operated by the IBM Research division. Again, this server is part of a larger cluster of servers, and it is used for early deployment of Domino. The applications housed on this server have attachments, so DAOS provided savings in this environment as well. Table 2 shows the before and after picture for this server.
Table 2: Results for an application server (Windows Server 2003 – 32-bit)

| Measurement | Before DAOS | After DAOS | Percentage Change |
| --- | --- | --- | --- |
| NSF data in data directory | 17.3GB | 9.8GB | 43% reduction |
| NLO data in DAOS repository | - | 5.3GB | - |
| Total data | 17.3GB | 15.1GB | 12% reduction |
| Average I/O volume | - | - | Not measured |
| Average I/O per second | - | - | Not measured |
| Average CPU busy | - | - | Not measured |
| Relative I/O rates | - | Not measured | - |
|
Example 3: Mail Server Housing 389 Active User Files
The third example comes from a mail server operated by IBM Global Services. This server is slightly larger than the mail server in the first example; it houses 389 active user mail files. It is part of a larger cluster of mail servers, and it is run as an early deployment production server. Table 3 shows the results of implementing DAOS on this server.
Table 3: Results for a mail server housing 389 active user files (AIX – 5.3.7.7 64-bit)

| Measurement | Before DAOS | After DAOS | Percentage Change |
| --- | --- | --- | --- |
| NSF data in data directory | 70.9GB | 40.3GB | 43% reduction |
| NLO data in DAOS repository | - | 15.4GB | - |
| Total data | 70.9GB | 55.7GB | 21.5% reduction |
| Average I/O volume | - | - | 12.9% reduction |
| Average I/O per second | - | - | 16% reduction |
| Average CPU busy | - | - | 10.5% reduction |
| Relative I/O rates | - | Not measured | - |
|
Example 4: Mail Archive Server
The final DAOS deployment that we’ll look at comes from an IBM business partner company. The target server houses mail archive files. Older documents migrate to the mail archive server from the company’s primary mail servers, which store the active mail files. This company’s data retention policy is to keep everything, so although the direct user activity is low, the volume of data involved is fairly large and disk costs have been increasing annually. Table 4 shows before and after results for the DAOS deployment on a sample of 89 mail archives.
Table 4: Results for a mail archive server (OS/400 – V5R4M0)

| Measurement | Before DAOS | After DAOS | Percentage Change |
| --- | --- | --- | --- |
| NSF data in data directory | 87.3GB | 16.9GB | 80% reduction |
| NLO data in DAOS repository | - | 34.6GB | - |
| Total data | 87.3GB | 51.5GB | 41% reduction |
| Average I/O volume | - | - | Not measured |
| Average I/O per second | - | - | Not measured |
| Average CPU busy | - | - | Not measured |
| Relative I/O rates | - | Not measured | - |
|
Results Summary and Savings Analysis
Figure 2 summarizes the results of deploying DAOS in the four example systems. It’s clear at a glance that no matter what the scenario, there is a savings in disk space: a 12% to 41% reduction in the total disk footprint for the data. A smaller disk footprint means reduced hardware requirements to support the same user load or the ability to support a larger user load with the same storage hardware.
Figure 2: Summary of Domino disk use in all four examples
The reduced data footprint is just part of the story, however. Less obvious, but potentially more valuable, is the 43% to 80% reduction in the data directory footprint (NSF size in the graph). The data directory stores the active data, so it’s where the majority of data activity occurs. Access to this data needs to be fast, which typically means top-quality storage hardware. A reduction in the data directory footprint means that you need less of this prime data storage. Keep in mind that you need to back up the data in this directory completely every cycle because of its volatile nature. A smaller footprint here translates directly to smaller and faster backup processing.
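The two percentage ranges can be recomputed from the figures in Tables 1 through 4. The short Python sketch below does that arithmetic. The GB values are copied from the tables, while the function names and the tuple layout are just choices made for this illustration.

```python
# (label, NSF before, NSF after, NLO after), all in GB, copied from Tables 1 to 4
deployments = [
    ("Example 1 (mail, 112 users)", 65.9, 23.9, 25.3),
    ("Example 2 (application)", 17.3, 9.8, 5.3),
    ("Example 3 (mail, 389 users)", 70.9, 40.3, 15.4),
    ("Example 4 (mail archive)", 87.3, 16.9, 34.6),
]


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


def summarize(nsf_before: float, nsf_after: float, nlo_after: float):
    """Return (data directory reduction, total footprint reduction) in percent."""
    total_after = nsf_after + nlo_after  # NSF data plus the DAOS repository
    return (
        percent_reduction(nsf_before, nsf_after),
        percent_reduction(nsf_before, total_after),
    )


for label, nsf_before, nsf_after, nlo_after in deployments:
    nsf_pct, total_pct = summarize(nsf_before, nsf_after, nlo_after)
    print(f"{label}: data directory {nsf_pct:.1f}% smaller, total {total_pct:.1f}% smaller")
```

The computed values agree with the published percentages to within about one point. The small differences come from how the published figures were rounded.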
Although the data directory has highly dynamic data, the DAOS repository is relatively static. Compared with the traffic volume to the data directory, the traffic to the DAOS repository is very low (on the mail server in Example 1, the DAOS directory disk showed 1.3% of the busy rate of the data directory; see Table 1). Thus, you can employ slower, cheaper storage for the repository. There is also a potential for big savings in backup processing. After DAOS writes a file, it does not modify the file again until the file is no longer needed; DAOS then deletes it. That behavior makes the DAOS repository a perfect candidate for an incremental backup method. Because you only need to process the files created since the last backup cycle, you back up only a small fraction of the total footprint every cycle. The result is a reduction in network load and I/O to the backup storage.
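As a rough illustration of why write-once files suit incremental backup, the Python sketch below lists the files under a directory that were modified after a given time. It is not a backup tool and does not replace your backup product's own incremental mode. The function name is hypothetical, and the sketch assumes that file modification times are reliable on your file system.

```python
import os


def files_changed_since(repository_dir: str, last_backup_time: float) -> list:
    """Return the files under repository_dir modified after last_backup_time.

    last_backup_time is in seconds since the epoch, the same unit that
    os.path.getmtime() returns.
    """
    changed = []
    for root, _dirs, files in os.walk(repository_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Because files in the repository are never rewritten after they are created, every file that this check skips is one that the previous backup cycle already captured.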
I/O traffic to the data directory is another area where savings can be realized. Consider the case of a document delivered by the router to 10 recipient mail files. Without DAOS, Domino writes the attachment contents to mail.box and then copies them to each of the destination mail files. That works out to 11 writes of the entire attachment. With DAOS, Domino stores the attachment data in the DAOS repository as the attachment first enters mail.box. After that, Domino copies only the reference to the destination files. The cost with DAOS is 1 write, which represents a 91% reduction in the I/O for writing the attachment data.
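The arithmetic behind the 91% figure can be written out as a small Python sketch. It follows the model in the paragraph above (one write to mail.box plus one per recipient without DAOS, one write in total with DAOS) and assumes that all recipients are on the same server. The function names are invented for this illustration.

```python
def attachment_writes(recipients: int, daos_enabled: bool) -> int:
    """Number of times the full attachment is written for one routed message."""
    if daos_enabled:
        return 1               # written once to the DAOS repository
    return 1 + recipients      # mail.box plus one copy per recipient mail file


def write_reduction_percent(recipients: int) -> float:
    """Percentage reduction in full-attachment writes when DAOS is enabled."""
    without_daos = attachment_writes(recipients, daos_enabled=False)
    with_daos = attachment_writes(recipients, daos_enabled=True)
    return (without_daos - with_daos) / without_daos * 100


print(attachment_writes(10, daos_enabled=False))  # 11
print(round(write_reduction_percent(10)))         # 91
```

Under this model the saving grows with the number of recipients: a message to a single recipient saves 50% of the attachment writes, and a message to 10 recipients saves about 91%.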
What Can You Expect in Your Environment?
The results from actual deployments of DAOS clearly indicate that DAOS can provide substantial savings in overall disk space and in other aspects of server operation. It can reduce hardware costs by allowing static and dynamic data to reside on different types of storage, and it cuts the resources needed for backup processing by drastically reducing the volume of data that needs to be processed.
The environment that benefits most from DAOS is one in which many large attachments appear in many documents in many NSF files. Conversely, environments that have few or no attachments, or attachments that are very small, see fewer benefits. Naturally, your results will depend on your particular mix of data and other environmental factors. Most likely, they will vary to some extent from the results I’ve presented here.
There is a tool that can give you a very good approximation of how DAOS will affect the data footprint in your environment. The DAOS Estimator analyzes your data and predicts the savings that DAOS would provide. It runs against a Domino 6.x, 7.x, or 8.x server and is available for various platforms. The DAOS Estimator is available for download from IBM.
Patrick Mancuso is a principal architect and developer of the DAOS feature. He began his career with IBM in 1984. At Lotus, he first worked on several Lotus 1-2-3 projects, followed by projects involving NotesPump/Lotus Enterprise Integrator, Domino Enterprise Connection Services, Domino/DB2, and Domino internals. Patrick is a graduate of Pennsylvania State University with a degree in computer science. He lives in New Hampshire with his wife, two teenage children, and two German Shepherd dogs. In addition to being a self-admitted computer geek, he is also a muscle-car fanatic and enjoys snowshoeing, hiking, kayaking, and camping. You may contact Patrick here.
This document is for your personal use only. Reproduction or distribution in any form is strictly prohibited without the permission of the publisher, Wellesley Information Services. For information about THE UC VIEW, THE VIEW, and other WIS publications, visit www.WISpubs.com.