Document Management System Interoperability
The Need, The Answer
A White Paper for Federal Agency CIOs and IT
Architects
February 1998
Document repositories - their design and operation - will
become a CIO Critical Success Factor over the next several years. Their
importance in information technology (IT) architectures will be
enormous. While the looming Year 2000 demands may overshadow immediate
ventures into this relatively new IT area, agency planners who wait
until after the Year 2000 to give careful attention to them do so at
great risk. This paper was written to help agency officials and planners
prepare, regardless of when they start writing system specifications.
The immediate impetus for this paper was the recent
adoption by the industry-wide Document Management Alliance of a "DMA 1.0
Specification," the industry's first standard enabling document
management systems from different vendors to interoperate. The paper
covers document management systems generally and their role in
architecture, and why the DMA accomplishment is so important.
What are (Electronic) Document Management
Systems?
EDMSs, or Document Management Systems (DMSs) as they're
also called (the terms are used interchangeably), are
commercial-off-the-shelf (COTS) software package cousins to DBMSs.
Whereas DBMSs are designed to store structured data - principally data
elements and data records - relieving application programmers of data
storage and retrieval tasks, DMSs store, retrieve and manage
unstructured information objects - files, text, spreadsheets, images,
sound clips, multi-media, and compound documents - giving non-IT
end-users rich storage and retrieval of all of them without anyone in
the end-user's organization having to do any programming. An IT staff's
involvement is to configure and install the package, train the users,
provide help-desk support, and keep the servers running smoothly. In
this sense, DMSs are more like word processing and email
packages.
What are some of these DMSs?
Like their DBMS cousins that range from Microsoft's Access
to Oracle's mainframe products, DMSs range widely in richness and
scalability. They're all proprietary and don't interoperate. Some of the
current players are Documentum, FileNet, Altris, Lotus, Interleaf,
NovaSoft, Open Text, and PC Docs. It's a very competitive marketplace,
with new entries, mergers and acquisitions every year, and it's growing
by leaps and bounds as business processes unshackle from paper.
Who's using them, and why haven't I heard more about
them?
They're already well-installed outside the Beltway across
the manufacturing, commercial, and financial spectrum in the
pharmaceutical, chemical, automotive, financial, insurance, and
aerospace industries, plus many other private sector communities.
Fortune 500 enterprises are recognizing that DMSs have an important role
to play in IT architectures. Their use in the public sector has
progressed more slowly.
Why has public sector adoption been slower?
Three reasons. First, one of the big drivers in the
private sector is reducing time-to-market and development costs for new
products and services. The entire product life-cycle, and especially its
creation and introduction processes, is very document/object intensive,
involving written proposals and communications, drawings and
specifications, laboratory and prototyping reports, market surveys,
financial analyses, problem analyses, performance evaluations, etc.
Effective management of all of these in electronic form rather than on
paper helps compress the product development cycle and lower all product
life-cycle costs. Until recently, the public sector hasn't shared these
pressures.
Second, because they replace the current way
end-users are accustomed to keeping and managing their computer files,
DMS installations mean a big change for end-users. Using a DMS involves
at least some business process reengineering together with lots of
initial training and hand-holding while the users get accustomed to the
new way that their files are stored and retrieved. DMSs are a major
cultural change in how information is handled, and have a big impact on
worker training and adjustment.
Third, the impending Year 2000 wave doesn't favor
rocking the applications boat for either public or private sectors, but
competitive pressures don't give the private sector breathing space for
slowing its drive away from paper. DMS implementation in the public
sector will probably continue to go slowly until the Year 2000 storm has
been safely weathered.
Why are DMSs important to me?
They mean big returns on investment and big gains in
service to citizens and customers because they're the key to changing
most of today's paper-based processes to electronic. The dollar and
performance improvement benefits can be huge. It's often said that
around 90 percent of what the government does is on paper. However, when
the paper gives way to electronics, as it is doing with e-mail,
voice-mail, maps and drawings, instruction manuals, policy formulations,
applications for benefits, and regulatory filings, to mention only some,
those electronic objects must be stored and retrieved in a well-managed
way. If not, the CIO, as the accountable official, may personally face a
very uncomfortable situation. Already, we're seeing the tip of the
iceberg in the recent court cases on e-mail and word processing files.
DMSs are the only way to set up and operate well-managed electronic
filing cabinets that meet legal requirements.
Does that mean that these capabilities will help solve my
recordkeeping and FOIA problems?
Absolutely! In the final analysis, they are the only
long-term answer for electronic recordkeeping. Everything else is a
temporary fix. There are several relatively inexpensive COTS software
packages designed specifically for records management and archiving, and
to support FOIA requests to their contents. However, none provides a way
to get the records into its database. That's where everything
breaks down. The vendors of these packages have therefore done the only
thing possible: they have tied their packages to one or more DMS products.
By tying records management and archiving to DMSs, an
organization takes care of one end - the back end - of the records "life
cycle." What remains is the other end, the front end, and the process of
getting into the DMS what will later need to be preserved. That includes
e-mail messages, word processing documents, and even voice-mail
messages. A complete case file needs to include all these, and more, so
the commercial DMS products are concentrating on the front-end
integrations to make these captures seamless and transparent. The
front-end marriages with office suites, e-mail systems, etc., are one of
the characteristics that vary among the DMS products. In fact, for many
organizations, the front-end integration is the most important
consideration in DMS product selection.
In connection with recordkeeping, we find a mundane but
surprisingly large nugget of gold - big enough in some cases to pay for
much of the costs of going into DMSs and records management products.
That nugget is the dollars involved in getting rid of file cabinets,
file rooms, their associated floor space and their associated clerical
staffing. The dollar savings may even extend to off-site storage of
records that are retired but not yet destroyed.
Another big nugget will come in faster, more reliable
processing of FOIA requests, and in something that isn't obvious because
we're so accustomed to paper, namely the ability of many people in many
locations to be looking at the same file and records, at the same time.
When active records are in DMSs and preserved records are in the
DMS-associated recordkeeping systems, all the contents of both are
available to all authorized parties 24 hours a day, seven days a week,
every week of the year.
There is a caveat, however. Unless an agency activity goes
100 percent electronic, it must continue to deal with incoming papers
that will stay papers. This leads to mixed-media files, and they pose
their own records management challenges - challenges that DMSs can help
to meet, particularly when used in combination with records management
products.
Can DMSs help my document exchange problems?
Document exchange has been largely a "push" activity, in
the form of e-mail "attachments" or computer fax. This has been a
terrific headache for many people trying to exchange word processing
documents across agencies, so that the documents can be read and also
entered into receivers' word processing systems for further use or
revision.
DMSs enable the exchange of documents with a "pull"
approach. In this approach, documents are put into the originator's DMS
in multiple forms (called "renditions") and/or an
Internet-standard-tagged form (probably the new XML), and the intended
receivers or users are notified of the document availability, probably
by a simple e-mail message saying, "Come and get it." Each user can then
select the rendition that works for that user. Final documents might be
in a rendition that preserves page appearance, images might be in one or
more standard encodings, and textual documents undergoing revision might
be in both a proprietary word processing package rendition and one that
is a non-proprietary standard. Compound documents might include all of
these, plus embedded spreadsheets, etc.
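The rendition idea can be illustrated with a minimal sketch. Everything
in it is an assumption made for illustration: the Document class, the
choose_rendition function, the "dms://" address, and the format names
do not come from any DMS product or from the DMA specification. It
shows only that one stored document can carry several renditions and
that each receiver pulls the first one it can use.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A stored document with several renditions, keyed by format name."""
    address: str                                     # hypothetical unique address
    renditions: dict = field(default_factory=dict)   # format name -> content


def choose_rendition(doc, preferred_formats):
    """Return (format, content) for the first preferred format the document offers."""
    for fmt in preferred_formats:
        if fmt in doc.renditions:
            return fmt, doc.renditions[fmt]
    raise LookupError(f"no usable rendition at {doc.address}")


# The originator stores the same report in three forms (illustrative names).
report = Document(
    address="dms://hq/reports/final-summary",
    renditions={
        "pdf": b"page-faithful copy",
        "wordproc": b"proprietary word processing file",
        "xml": b"<report>tagged text</report>",
    },
)

# A reader who only needs to view the report asks for the page-faithful form first.
reader_choice = choose_rendition(report, ["pdf", "xml"])

# An editor who lacks the originator's word processor falls back to the tagged form.
editor_choice = choose_rendition(report, ["other-wordproc", "xml"])
```

The "Come and get it" e-mail in this picture would carry only the
address; the choice of rendition stays with the receiver.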
Certainly, there are many aspects to the operational use
of this pull approach, especially dealing with communications and
access. It may sound a bit unreal to business-people not accustomed to
heavy Internet use, but it's old hat to scientists, engineers, and
others who have already been using the "ftp" (file transfer protocol)
Internet capability for at least a decade. As those users know, a file
is put into an FTP Server and its specific address is given to the
intended users who then can download it whenever they want. Each version
of the file can have its own unique address.
The DMS-based pull approach will probably find its initial
use in groupware environments, intranets, and extranets. Its broader use
will be accelerated by the DMA interoperability standard which envisions
being able to directly access a specific document with its unique
address, in a very similar way to the World Wide Web's manner of
accessing individual documents on Web servers.
Microsoft's product strategy includes building DMS
features into coming NT releases. Will that do the job?
Microsoft targets the mass market, and we're seeing
initial client-side appearances in the "Outlook" product, and also some
document indexing and searching services in Web site products. What's
anticipated in NT are modest server-side ("BackOffice") DMS functions at
a modest price increment, suitable for a global server market that
includes small businesses. Don't be surprised to see these functions
tied to Microsoft's messaging, groupware and workflow capabilities.
Large organizations will want much richer DMS functionality,
flexibility, scalability, etc., for organization-wide use, and will be
willing to pay the incremental cost. That's where the other COTS DMS
products will find their niche for many years to come.
What's an example of where I'd want the richer DMS
capabilities?
Branch versioning of compound documents is one such capability, and it
is easily understood through an example. An agency's investigator at Headquarters
in Washington is leading a case effort with an investigator in the
Chicago Regional office, in cooperation with an Illinois state
government investigator. As part of the wrap-up, the Headquarters lead
investigator drafts a final summary and report. It contains text, of
course, plus photographs, drawings, images of bank checks, data tables,
and embedded URLs to interviews and intercept transcripts that are
stored on a secure intranet host in the agency. It's a "compound"
document because the embedded objects - images, spreadsheets, etc. - are
themselves separate documents (objects) in the DMS. The draft report and
all of its contents are confidential, highly sensitive, and will be
subjected to challenge and cross-examination at trial.
The lead investigator sends the draft final report
simultaneously to the regional and state investigators and invites their
corrections. (That's the branching.) They review it independently and
concurrently, with the regional investigator adding another embedded
object while the state investigator notes a correction to a different
embedded object. They send their recommended revisions back to the
Headquarters lead investigator, who first "checks them in" (records them
in the DMS) on their separate branches, and then melds them into a final
consolidated version. The Headquarters DMS not only keeps track of all
this, including the security aspects, but also provides the
record-keeping environment needed in the legal arena by keeping the
various versions, by recording the case audit trail, and by guaranteeing
archival integrity.
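The bookkeeping in this example can be sketched in a few lines. This is
a simplified illustration, not how any particular DMS works: the class
names, version labels, and the merge rule (keep the highest revision
number of each embedded object) are all assumptions. A real product
would also handle security, locking, and conflicting edits to the same
object.

```python
from dataclasses import dataclass


@dataclass
class Version:
    label: str      # e.g. "1.0" or "1.0-chicago" (illustrative labels)
    parents: tuple  # labels of the versions this one was derived from
    author: str
    objects: dict   # embedded object name -> revision number


class VersionedDocument:
    """Keeps every version of a compound document plus an audit trail."""

    def __init__(self, author, objects):
        root = Version("1.0", (), author, dict(objects))
        self.versions = {root.label: root}
        self.audit_trail = [("create", root.label, author)]

    def check_in(self, label, parent, author, objects):
        """Record a revision on its own branch off an existing version."""
        if parent not in self.versions:
            raise KeyError(parent)
        self.versions[label] = Version(label, (parent,), author, dict(objects))
        self.audit_trail.append(("check_in", label, author))

    def merge(self, label, branches, author):
        """Consolidate branches, keeping the newest revision of each object."""
        merged = {}
        for branch in branches:
            for name, revision in self.versions[branch].objects.items():
                merged[name] = max(revision, merged.get(name, 0))
        self.versions[label] = Version(label, tuple(branches), author, merged)
        self.audit_trail.append(("merge", label, author))
        return self.versions[label]


# Headquarters drafts the report, then the two reviewers work concurrently.
report = VersionedDocument("hq", {"summary": 1, "checks": 1, "table": 1})
report.check_in("1.0-chicago", "1.0", "chicago",
                {"summary": 1, "checks": 1, "table": 1, "photo": 1})  # adds an object
report.check_in("1.0-illinois", "1.0", "illinois",
                {"summary": 1, "checks": 1, "table": 2})              # corrects the table
final = report.merge("2.0", ["1.0-chicago", "1.0-illinois"], "hq")
```

Nothing is overwritten along the way: the draft, both branches, and the
consolidated version all remain in the versions table, and the audit
trail records who did what in order.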
What are the interoperability considerations here?
Headquarters, in this example, is where the final case
files are brought together and maintained. It needs lots of
sophistication and scalability. The Chicago regional office has only a
LAN with a couple of modest servers, and doesn't need all the bells,
whistles and horsepower of the headquarters system. The Illinois
environment is the responsibility of that state's IRM executives,
independent of Washington. It's integrated with the overall IT
architecture of the state, which is affected by such applications as
taxes, driving licenses, roads, and health care reimbursements. Thus all
three repositories in this example could be on different platforms, and
all three could be using different DMSs, from different vendors. Yet,
the cooperative mission activities require their interoperability. The
interoperability needed is at least two-way between headquarters and
region, and probably three-way to include the state case file. Each is a
client to the others and each one's repository is a server to the
others.
So what's the answer?
The DMS vendor community knows that interoperability is
likely to affect the future success of many products, including such
related ones as imaging, e-mail, voice-mail, groupware, workflow, and
even printing. Industry knows that repositories - where and how things
are stored and managed - will be the nexus of all of these. All know
that interoperability comes only with standards.
ODMA
A few years ago, document management standards efforts
were started at two levels. One was focused on a simple application
programming interface (API) to let any kind of client interact with a
DMS that also implemented the API, for the purpose of storing and
retrieving files. Desktop applications like word processing and
spreadsheet packages are on those clients and must interact with the DMS
to store and retrieve the files created with those packages. In that
sense, the DMS replaces the Windows file system/directory. With this
standard, the client must know the specific design, construction,
capabilities, etc., of the DMS in order to use it, including its
proprietary document structuring, indexing, and query facility. Because
all this knowledge is inside the client, the API itself is simple and
inexpensive, yet so valuable because it makes the power of a proprietary
DMS available to a wide range of desktop applications.
This API standard, called ODMA (for Open Document
Management API) has been built into many different kinds of clients, and
is used widely today. It can be viewed as a many-to-one standard, for
many different clients to interact with each proprietary DMS in each
DMS's own proprietary way. Because each client must be intimately
knowledgeable in advance of each DMS with which it will interact, it
does a portion of the interoperability job needed by our example, but
falls far short of the whole job.
DMA
In parallel, a second, more ambitious standards effort was
launched to create interoperability across different proprietary DMSs
regardless of the platforms on which they reside and regardless of the
networks in which they exist, and without requiring clients to have
advance intimate DMS knowledge. The goal is to have uniform access to
any document stored in any format, anywhere, at any time. This standard
can be combined with the ODMA standard for inexpensive universal client
access, and adds what's needed for completely vendor-independent
cross-repository interoperability. It's called DMA, for the Document
Management Alliance that's creating it, and is a middleware
specification for what is truly many-to-many interoperability. That's
many clients to many DMSs, regardless of platforms and networks. Because
it accommodates international multi-language conventions, it's even
language-independent.
Needless to say, the DMA effort is ambitious and
sophisticated, because it means that any conforming client, including
Web clients, can interact with any conforming DMS without having to know
in advance the specific commands and characteristics of each DMS. It
enables a client to use its own user interface and command set (look and
feel) to store and retrieve objects from different-vendor DMSs, and to
discover DMS characteristics when a request is first sent. There are
specifications for objects, querying, versioning, containment,
check-in/check-out, compound document support, content-based searching,
and other aspects of repository management. Most of these are in the DMA
1.0 specification which was formally approved in December 1997. (None of
these are included in the ODMA specification.) The priorities for the
DMA 2.0 and later levels of the specification will be determined by
ongoing user feedback.
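The idea of one client working against different-vendor repositories,
and discovering what each one supports at request time, can be sketched
as follows. This is not the DMA API: the Repository interface, the
capability names, and both vendor classes are hypothetical and exist
only to show the shape of the approach.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical common interface; stands in for a middleware layer."""

    @abstractmethod
    def capabilities(self):
        """Return the set of operations this repository supports."""

    @abstractmethod
    def fetch(self, doc_id):
        """Return the content stored under doc_id."""


class VendorARepository(Repository):
    """Illustrative full-featured system, as Headquarters might run."""

    def __init__(self):
        self._store = {"case-17": "HQ summary"}

    def capabilities(self):
        return {"fetch", "versioning", "content_search"}

    def fetch(self, doc_id):
        return self._store[doc_id]


class VendorBRepository(Repository):
    """Illustrative modest system, as a regional office might run."""

    def __init__(self):
        self._files = {"CASE-17": "Chicago field notes"}

    def capabilities(self):
        return {"fetch"}

    def fetch(self, doc_id):
        # This vendor's native keys are upper-case; the adapter hides that detail.
        return self._files[doc_id.upper()]


def gather(repos, doc_id, need=frozenset({"fetch"})):
    """Ask each repository what it supports, then use only those that qualify."""
    results = {}
    for name, repo in repos.items():
        if need <= repo.capabilities():
            results[name] = repo.fetch(doc_id)
    return results


repos = {"hq": VendorARepository(), "chicago": VendorBRepository()}
found = gather(repos, "case-17")
versioned = gather(repos, "case-17", need={"fetch", "versioning"})
```

The client code in gather never mentions a vendor. It learns each
repository's capabilities by asking, which is the contrast with ODMA
drawn above, where the client must know each DMS in advance.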
Can you give me an analogy to clarify this?
Think of people, and today's file rooms or records centers
that store lots of cabinets containing lots of different files holding
tons of paper. The people coming into the rooms, either to store or
retrieve, are the clients. The ODMA specification creates a big,
well-lit door and entranceway through which anyone can enter, whether on
foot, crutches or in a wheelchair, and regardless of gender, race, or
nationality. However, to use the file room, ODMA expects that each will
know before-hand the file scheme, the rules of the file room, and be
able to read and understand the cabinet and file labeling. Each file
room is unique, and ODMA expects each user to understand its uniqueness.
Because it's an entranceway specification, the ODMA spec is simple and
basic.
The DMA specification lets all those unique file rooms be
used without requiring advance knowledge of each room, or even the
ability to understand the language in which the labels are written. If
three different State agencies were to give access to a Federal
investigator, the DMA specification lets them say, "We three State
agents are going to let a particular Federal agent (to whom we've given
permission) use information in our three different States' file rooms
without the agent having to know in advance how the contents are
organized or labeled in the different file rooms, or even the procedures
set up in the rooms." The Federal agent can use a single client computer
program - either the Federal agency's proprietary client or an Internet
browser - to use all the file rooms simultaneously, despite the
differences among the rooms in their organization, labeling, and
procedures. In effect, DMA lets the different file rooms look and
operate the same way to the Federal agent despite their underlying
differences.
Now that's interoperability! One can imagine its power in
regulatory activities wherein a government regulatory agency would be
able to access regulatees' documents while letting each have the freedom
to architect its own document management environment. (Speaking of
architecture, all the Federal Government agency IT architecture
documents could be made directly accessible to all Federal agency CIOs
despite the agencies operating on different platforms, with different
office application packages, and using different DMS COTS products.)
That's the promise of DMA.
Where do the Internet protocols and technologies play in
all this?
The Internet explosion has contributed significantly to
the rapid deployment of DMSs in the private sector. First, Internet
technology has reduced the cost of deploying DMSs by enabling the use of
Internet browsers instead of proprietary client software for end-user
desktops, for organizations whose needs can be met in this way. Today,
many DMS vendors offer support for both proprietary clients on private
networks and also Internet connectivity through Web gateways.
Second, putting documents on organizational intranets
spotlights the need for internal control processes for managing the
change status, recordkeeping, obsolescence, and disposal of the
documents. Putting documents on the Internet's Web magnifies these
concerns even more. DMSs are the way to manage document life-cycles,
with or without internets, and the introduction of enterprise intranets
spurs the need for the interoperability standards to bridge the systems
with the intranet sites as well as with one another. Fortunately, the
DMA specification meets this need for intranet/Internet host-DMS
bridging.
In a related vein, the Internet engineering community is
pursuing a method (WebDAV) by which Web pages created by one person in
one location with one authoring tool could be revised by different
persons in different locations using different authoring tools. Because
the method includes "checking out" a Web page and tracking versions of
Web pages, some have wondered whether this competes or conflicts with
the DMA specification. Actually, the two are extremely complementary,
and a collaboration has been established between the two communities of
engineers to ensure that the standards align with one another so that
users can benefit from them both, together.
Why be concerned with the DMA effort now, when I won't be
buying products until I'm over the Year 2000 hurdle?
A very big reason for endorsing the DMA effort now is to
accelerate benefits in the related application areas of groupware and
workflow. Both of these are strong, getting stronger, and expected to
play major roles in IT architectures. The Workflow Management Coalition
of vendors and users has developed a standard reference model to be
fleshed out, and along the way it will need to address the repository
aspects of where workflow objects and processes are stored and
retrieved. There will be powerful architectural benefits to user
enterprises if those repositories conform to the DMA specification.
Similarly, the adoption of the DMA interoperability specification will
foster interoperability with groupware products. And as noted above, it
also can foster interoperability with records management and archiving
products. For all of these, the architectural benefits of DMA to
agencies will be powerful.
So the bottom line isn't just interoperability among DMSs,
but also interoperability with groupware, workflow, imaging, printing,
messaging, records management, and many other information handling
products. If it's going to really work, that interoperability must begin
where the information objects are stored and managed. The DMA 1.0
Specification is that beginning. Users can support it now in new system
and architecture plans, to be reflected subsequently in RFP
specifications.
How can I be sure that conforming products will be
available to me when I want them?
Because vendors will only build what the market wants, if
government CIOs want it in 2001 they must start now to send the message
to the vendor community. The message has to be that their agencies will
be requiring the specification in procurement actions, for product
deliveries beginning in 1999, and that they anticipate communicating
with the DMA to prioritize expanded functions for the next levels of the
DMA specification.
Agencies that have established relationships with affected
vendors can do as several private sector users are doing, namely
informing their vendors that implementation of the specification will be
required in future versions, releases, upgrades, etc., of their
products. Agencies that are developing architectures probably will be
identifying several interoperability specifications they expect to be
seeking in their future product acquisitions, and the DMA specification
can appear on their lists. Agencies that are conducting research can ask
about availability of the DMA specification in any Requests for
Information that they issue.
Regardless of where an agency stands on procurement of
conforming products, it can express its needs and desires within the DMA
as a user member. Participation in the DMA can be a way to influence
both the delivery of conforming products and the enhancements made in
future releases of the specification. The DMA vendors have already
identified several candidate additions and improvements but cannot do
them all at once. User and marketplace feedback will set the
priorities.
For more information about the DMA, including membership
and technical materials on the 1.0 specification, see http://www.aiim.org/dma.
This paper was authored by Dan Schneider, U.S. Department
of Justice, 202-514-4318, schneidd@justice.usdoj.gov.
The Department of Justice is a DMA user organization member. Mr.
Schneider welcomes reader questions or comments.