Chapter 2

Databases and the World Wide Web

 

 

 


 

2.0 Introduction

Information plays a vital role in all walks of life in today’s technologically advanced society.  Tananbaum (1996) supports this view, stating that the main technologies developed during the twentieth century have been methods of gathering, processing and distributing information.  Databases have played an important role in this information revolution, providing a storage location from which data can be retrieved and a mechanism for manipulating it (Mohseni, 1996).

 

Databases are commonly used and relied upon by organisations and institutions worldwide (Ball et al., 1996; Lynch, 1997).  Their most important role is in facilitating the development of computer applications (Lotus, 1996) by providing a means of data persistence (Mohmoud, 1996).  But developing database applications is not without its problems.  Different vendors provide different database management systems (DBMS), which traditionally required developers to write a separate application for each DBMS.  As a result, users had to learn several packages in order to access all of the information contained within a company’s databases (Visigenic, 1996).  To further complicate matters, each DBMS can operate on different platforms (e.g. Windows 95, UNIX, and mainframe computers) and companies can use different protocols when networking their computers.  Overcoming these difficulties is not an easy task.  What is required is a means of developing a single application that allows the user to access all of the organisation’s databases, homogeneous or heterogeneous, from a single location, regardless of database type, platform or networking protocol.

 

The Web and its enabling technologies offer a potential solution to the scenario described above and, as we will see, a complementary relationship exists between databases and the Web.  The Web provides access to information on a global scale (Orfali et al., 1996).  This means that databases connected to the Web can be accessed from any location in the world (Dataramp, 1996), allowing the contents of a database to be shared worldwide (Mohseni, 1996).  This ability to connect to databases via the Web enabled the Web to be used as an interactive application development platform (Linthicum, 1996), with databases providing persistent data storage for Web applications, rather than merely as a medium for static publishing.  Many corporations realised this as they tried to establish a Web presence for business purposes.  Ball et al. (1996) confirm this:

“As organisations attempt to realise the potential of  the Internet/Intranet architecture, they are almost immediately confronted with the harsh reality that to build anything beyond trivial, static Web pages, they require access to enterprise databases.”

This relationship brings many benefits to corporations (Lynch, 1997).  They can use Web technologies to provide easy access to all of their internal information via a private Intranet, or they can connect their databases to the Web, allowing them to be accessed globally and enabling the development of interactive applications for use across the Internet.

 

The rest of this chapter examines the architecture of database applications and considers, in that context, the Web technologies that make the solutions mentioned above possible.

 

2.1 The Database Client/Server Model

For databases to be beneficial to organisations the information contained within them must be shared.  This is achieved by networking the computers and by providing access to the databases, typically using a client/server approach.  Client/server computing has its origins in the advent of the personal computer (PC).  As PCs grew in popularity various methods of networking them emerged.  Initially this enabled resources to be shared, and later processing to be distributed among the networked computers (Williamson & Moran, 1997).  This approach caused a shift away from monolithic mainframe database systems, as correctly developed client/server database systems proved to be cheaper to develop, deploy and maintain.  They also improved development time and increased user satisfaction (Linthicum, 1997).  Client/server computing offers greater flexibility than preceding computer architectures, which has resulted in its growing popularity.  Fiste et al. (1996) confirm this:

            "80% of enterprises are using some form of client/server in their application development.  By 1998 this number is expected to exceed 90%"

In client/server and other distributed processing paradigms, the idea is to increase processing capacity through process distribution rather than by increasing the size of the central processor.  This is achieved by sharing the processing load among networked computers (Linthicum, 1997).  Each of the networked computers carries out a given task and co-ordinates with other related tasks across the network.  Simple client/server architectures consist of three main components: the client, the server and the middleware (Orfali et al., 1996).  The processing load is split between the client and the server applications, while the middleware enables the client and the server applications to communicate (Tucker, 1997).

 

2.1.1 Components of a client/server database application

The client application performs the user interface tasks, normally via a graphical user interface.  It enables the user to enter data, request information and perform functions.  When the user enters data, the user interface generates a request for the database server, normally in the form of SQL.  It then opens a connection and transmits the request.  The server listens for client requests, accepts a request, performs the necessary computations on the database and returns only the requested information to the client (see fig 2.1).  The client then processes the returned information and presents the results to the user in the desired format, or performs some other necessary task with it (Linthicum, 1997).
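The request/response cycle described above can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions rather than a real deployment: the network connection is replaced by a direct method call, an in-memory SQLite database stands in for the shared data source, and the class names, table and sample rows are all hypothetical.

```python
import sqlite3


class DatabaseServer:
    """Stands in for the server side: owns the database and executes requests."""

    def __init__(self):
        # An in-memory SQLite database plays the role of the shared data source.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
        self.db.executemany(
            "INSERT INTO staff VALUES (?, ?)",
            [("Ann", "Sales"), ("Bob", "IT"), ("Cara", "Sales")],
        )

    def handle_request(self, sql, params=()):
        # Run the query on the server and return only the matching rows.
        return self.db.execute(sql, params).fetchall()


class Client:
    """Stands in for the client side: builds the request and formats the reply."""

    def __init__(self, server):
        self.server = server  # a direct reference replaces the network connection

    def staff_in(self, dept):
        # 1. Turn the user's input into an SQL request.
        sql = "SELECT name FROM staff WHERE dept = ? ORDER BY name"
        # 2. Transmit the request and wait for the result set.
        rows = self.server.handle_request(sql, (dept,))
        # 3. Present the returned data in the desired format.
        return ", ".join(name for (name,) in rows)


client = Client(DatabaseServer())
print(client.staff_in("Sales"))  # prints "Ann, Cara"
```

Only the two matching names travel back to the client; the filtering is done on the server, which is the point of the model.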

 

 

Dataramp (1996) divides networked database applications into three parts: the user interface, the business logic and the shared data source (see fig 2.2).  The business logic contains the rules specifically designed for a business application.  It contains all the code necessary to process user requests and performs functions to validate, transform and request data.  In a simple client/server application the business logic can reside within the client, the server or both (Orfali et al., 1996); further partitioning is discussed in the following sections.  Together the user interface and the business logic make up the application logic.  The shared data source is normally a database management system (DBMS), which is responsible for managing the data and enabling it to be updated, shared and retrieved.  A database server sits between the client and the database.  It provides secure access to the shared data and manages the control and execution of SQL commands (Mohseni, 1996).  The database server is of vital importance when a large number of users avail of a particular resource, as it manages the recovery, concurrency and consistency aspects of the system (Orfali et al., 1996).

           

 

When a database application is deployed over a network, e.g. a LAN or the Internet, it must be partitioned into two or more parts, or tiers (Dataramp, 1996).  There are many variations of multi-tier architecture, depending on how the application is divided and on the middleware used to communicate between the tiers (Orfali et al., 1996).

 

2.1.2 Two Tier Architecture

The traditional client/server database architecture implements a two-tier approach (Symantec, 1996).  The application logic exists within the user interface, within the database server, or both (Orfali et al., 1996).  The location of the partitioned processes gave rise to the terms “fat client” and “fat server”, depending on where the bulk of the application processing takes place (Orfali et al., 1996).  Most client/server systems implement the fat client approach (Fiste et al., 1996).

 

In the fat client approach the bulk of the code resides on the client, which results in several problems.  Most software vendors charge on a per-client basis, so a license must be purchased for each client (Symantec, 1996).  Whenever the system is upgraded or maintenance is performed, every client must be accessed (Fiste et al., 1996).  Finally, the fat client approach does not scale well, making it suitable only for small-scale systems (Linthicum, 1996).

 

In the fat server approach the bulk of the code resides on the server, making it easier to maintain and upgrade (Fiste et al., 1996).  The limiting factor of this approach is that several users accessing a server concurrently can cause it to thrash, resulting in reduced performance or, worse still, a server crash (Linthicum, 1997).

 

Regardless of which version of the two-tier architecture is implemented, the system can face scalability, performance and flexibility problems (Dataramp, 1996).  For larger systems it is better to adopt a three-tier or multi-tier approach.

 

2.1.3 Three Tier Architecture

In response to the drawbacks of the two-tier network database architecture, a three-tier or multi-tier approach can be adopted (Symantec, 1996).  This approach places an application server between the clients and the back-end database server (see fig 2.3).  The application server contains the bulk of the application logic (Donkin, 1996), with the remainder contained on the client (Symantec, 1996).  It accepts requests from the clients, obtains the desired data from the data sources, processes the results and sends them back to the client.
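The division of labour between the three tiers can be sketched as follows.  This is an illustrative Python sketch only: the tiers run in a single process, an in-memory SQLite database plays the back-end server, and the `ApplicationServer` class, the `orders` table and the discount rule are hypothetical names and rules invented for the example.

```python
import sqlite3


def make_database():
    """Tier 3: the back-end data source (an in-memory SQLite database here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.0), ("zenith", 50.0)],
    )
    return db


class ApplicationServer:
    """Tier 2: holds the business logic; the only tier that talks to the database."""

    def __init__(self, db):
        self.db = db

    def order_total(self, customer):
        # Validate the request before touching the data source.
        if not customer.isalnum():
            raise ValueError("invalid customer name")
        row = self.db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        total = row[0]
        # Hypothetical business rule: 10% discount on totals of 100 or more.
        if total >= 100:
            total *= 0.9
        return round(total, 2)


def thin_client(app_server, customer):
    """Tier 1: presentation only; no SQL and no business rules."""
    return f"{customer}: {app_server.order_total(customer):.2f}"


app = ApplicationServer(make_database())
print(thin_client(app, "acme"))    # prints "acme: 180.00"
print(thin_client(app, "zenith"))  # prints "zenith: 50.00"
```

Because the SQL and the discount rule live only in the middle tier, changing either requires no change to the clients.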

 

Three-tier systems offer many benefits: they are more scalable and easier to control (Ball et al., 1995).  The middle tier enables the system to handle more client connections (Symantec, 1996), implement better security and provide easier maintenance.  Connectivity to heterogeneous data sources is greatly simplified because the required database drivers are held at a single location, which also means that fewer client licenses are required (Dataramp, 1996).  The future of client/server database applications lies with three-tier and multi-tier architectures.  According to Orfali et al. (1997), Metagroup predicts that true database independence will not be possible without three-tier architectures.  Gartner Group (1995) agrees about their importance and predicts that three-tier client/server applications will increase from 3% of all client/server applications in 1995 to 33% in 1998.

 

 

Client/server computing has an important role in today’s database application development paradigms, bringing many benefits including those of distributed computing and shared resources.  Examples of client/server computing can be seen in most institutions and organisations, but the biggest example of client/server is the Internet (Orfali et al., 1996).  According to Desper et al. (1995), the Internet is another factor contributing to the growing popularity of client/server computing, and companies wish to reap the benefits offered by both client/server computing and the Internet.

 

2.2 The World Wide Web and Databases

The World Wide Web is a network of inter-linked documents which covers virtually every location on earth (Mohseni, 1996).  The Web provides an enormous amount of potential functionality for all who wish to avail of its services.  This potential is the main reason for its recent surge in popularity (Teleen, 1996).  According to Gates (1997), this explosive growth of the Internet is the single most important development in the computer industry since the IBM PC was introduced in 1981.

 

The architecture of the Web conforms to the client/server model (Gleeson, 1996) and consists of two main components, a Web browser (e.g. Netscape’s Navigator) and a Web server (see fig 2.4).  Web servers are connected to the Internet and are uniquely identified using an IP address.  The function of the Web server is to find and return resources.  It accepts a request from a client for a resource located on the same machine as the server, finds the resource and returns it to the client.  HTTP, residing on top of TCP/IP, is the protocol used for communications between the Web browser and server.  HTTP is a generic, object-oriented, stateless protocol (Varlea et al., 1994) and it provides an efficient method of finding resources using uniform resource locators (URLs).  URLs point to resources and provide a consistent intergalactic (Orfali et al., 1996) naming scheme that identifies Web resources including Web pages, images, sound clips, movies and programs (Orfali & Harkey, 1997).

 

HTTP was designed to be a simple protocol and it carries out a single action at a time.  This makes it easy to implement but very inefficient, as it must create a separate TCP/IP connection for each request/response session.  For example, if a page is requested which contains five images, six connections are required: one for the page and one for each image.  Because HTTP is both simple and stateless, it provides a poor backbone for developing Web-based applications.
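The arithmetic behind the six-connection example can be written out directly.  The function below is a trivial sketch of the connection-per-request behaviour described in the text; the function name is hypothetical.

```python
def connections_needed(embedded_resources):
    """Connections opened when every request uses its own TCP/IP connection."""
    # One connection for the page itself, plus one per embedded resource.
    return 1 + embedded_resources


print(connections_needed(5))  # a page with five images: prints 6
```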

 

This model of the Web is only suitable for static publishing (Ball et al., 1996).  It is here that databases play an important role.  As the most important use of databases is in developing data persistent applications (Lotus, 1996), databases can also be used to provide storage for Web applications.  It was the ability to connect databases to the Web that enabled the Web to be used as an application development platform (Linthicum, 1996).

 

The Web is based on open standards, making the model platform independent; this enables the client to operate on different operating systems (Mohseni, 1996).  Orfali et al. (1996) agree and describe the Web as the first “truly intergalactic client/server application”, owing to the span of the Internet and the Web’s platform-independent features (Mohseni, 1996).  This makes applications deployed across the Internet unique, as they can operate on any platform and can be accessed globally.  It is a new application development paradigm which implements both two-tier and three-tier client/server architectures (Linthicum, 1997).  Varney (1996) agrees and adds:

“Think of the Web as a new kind of application development platform- one that just doesn’t care what’s on the client side of the traditional data access equation”

There are a number of ways the Web can be used to develop interactive data persistent applications and currently most of them include the use of databases.  In the next section the most common technologies that are being used to create database applications for use across the Internet and Intranets are examined.

 

2.3 Database Connectivity Solutions for the Internet

There are several ways a developer can create Web-based applications which connect to and use databases, including:

1.     The Common Gateway Interface.

2.     Middleware solutions.

3.     Universal and Web aware databases.

4.     Distributed object frameworks.

5.     ActiveX.

6.     Java and JDBC.

The following sections examine each of the above approaches in turn and conclude by indicating which solutions are the most appropriate for connecting to heterogeneous databases across different platforms.

 

2.3.1 Common Gateway Interface Systems

The Common Gateway Interface (CGI) (NCSA, 1994) is a standard Application Programming Interface (API) which enables a Web server to communicate with external applications (Lotus, 1996; Ball et al., 1996).  Gateway programs which conform to the CGI standard reside on the same machine as the Web server and act as the middleware to the target application, which in our case is a database residing locally or across a network (see fig 2.5) (Mohseni, 1996).  The gateway program can be written using a programming language, a scripting language or a platform-specific language; examples include C, C++, Java, Perl and Tcl (Gleeson, 1996).

 

 

CGI is currently the main model for deploying applications across the Internet (Orfali & Harkey, 1997).  The main reason for this is that the CGI standard is supported by virtually every Web server and browser, making it possible to bring platform-independent interactive applications to anyone with a Web browser (Flynn, 1996).  HTML forms and CGI scripts were introduced into the Web architecture to enable interactivity and functionality in Web applications.  They are additions to the basic hypertext facilities of the Web (Rees, 1997).  The following steps are carried out when a user requests information from a database via the Internet:

·      The user enters information into a form or a Web page and clicks on a send button or hyperlink pointing to the CGI script.

·      The Web browser requests the CGI script link specified by a Web page and transmits the user input to the Web server.

·      The Web server passes the information to the CGI script, which contains all the necessary database access code (Symantec, 1996), enabling it to query the database.

·      The database returns the requested information to the CGI script.

·      The CGI script converts the information into HTML and returns it to the browser via the server (Rees, 1997).
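The steps above can be sketched as a small gateway program.  This is a hedged illustration, not a production script: it assumes the form data arrives in the `QUERY_STRING` environment variable (as CGI does for GET requests), uses an in-memory SQLite database with made-up sample rows in place of a real database, and the function names are hypothetical.  A real CGI script would read `os.environ` and write the response to standard output.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def open_database():
    # Hypothetical sample data; a real gateway would connect to an existing database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (title TEXT, author TEXT)")
    db.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Dubliners", "Joyce"), ("Ulysses", "Joyce"), ("Dracula", "Stoker")],
    )
    return db


def run_gateway(environ, db):
    """Build the full CGI response (header, blank line, HTML body) for one request."""
    # The Web server hands the user's form input to the script in QUERY_STRING.
    form = parse_qs(environ.get("QUERY_STRING", ""))
    author = form.get("author", [""])[0]
    # Query the database with the user's input (parameterised, not string-built).
    rows = db.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", (author,)
    ).fetchall()
    # Convert the result set into HTML for the browser.
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    body = f"<html><body><ul>{items}</ul></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


# Simulate the environment the Web server would set up for the script.
response = run_gateway({"QUERY_STRING": "author=Joyce"}, open_database())
print(response)
```

Note that a fresh database connection is opened for the request, which mirrors the per-request start-up cost of CGI discussed in the text.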

 

Even though implementing CGI is the most common method for developing interactive Web based applications there are several drawbacks:

·      The output generated is typically HTML, which is not appropriate when computation or analysis must be performed on the information (Symantec, 1996).

·      A new instance of a CGI script is launched for each page that accesses a particular resource.  This can result in a reduction of performance, or the server crashing if the resource is over accessed (Linthicum, 1997).

·      All the processing occurs on the server side, requiring a round trip to provide the user with feedback or input validation (Flynn, 1996).

·      CGI is stateless, necessitating clumsy coding for applications requiring state.
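One common workaround for the lack of state is to send the state to the browser inside the page and have it returned with the next request.  The sketch below illustrates that idea with a hidden form field; it is an assumption-laden example, with a hypothetical `basket.cgi` script name and a naive comma-separated encoding that would break for items containing commas.

```python
import html
from urllib.parse import parse_qs


def render_page(query_string):
    """Handle one stateless request: read the state sent by the browser, return HTML."""
    form = parse_qs(query_string)
    # State from earlier requests arrives as ordinary form data.
    basket = [item for item in form.get("basket", [""])[0].split(",") if item]
    new_item = form.get("add", [""])[0]
    if new_item:
        basket.append(new_item)
    # The updated state is written back into the page as a hidden field,
    # so the browser returns it with the next submission.
    hidden = html.escape(",".join(basket), quote=True)
    return (
        "<form action='basket.cgi'>"
        f"<input type='hidden' name='basket' value='{hidden}'>"
        "<input type='text' name='add'><input type='submit'>"
        "</form>"
    )


first = render_page("add=pen")
second = render_page("basket=pen&add=ink")
print(second)
```

Each call is independent; the only link between the two requests is the data the browser carries back, which is why such code tends to become clumsy as the amount of state grows.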

 

There are ways to work around most of the problems mentioned, but the main problem with CGI is its lack of speed, which is due to its reliance on HTTP (Orfali et al., 1996).  This makes HTTP/CGI unsuitable for developing many time-critical applications (Rees, 1997).  Orfali & Harkey (1997) agree and found HTTP/CGI to be about 200 times slower than CORBA.  Because of the shortcomings of this method of application development, the major Web companies are researching replacement backbones for the Web (e.g. Netscape’s NSAPI and Microsoft’s ISAPI).  According to Orfali & Harkey (1997), Netscape’s Andreessen predicts that CORBA/IIOP will replace HTTP/CGI as the backbone of the Internet in the future.

 

2.3.2 Middleware Solutions

Middleware plays an important role in developing client/server applications (Freeman, 1997).  It is the enabling technology behind client/server computing (Tristram, 1996) and the term covers all the distributed software needed to support interactions between clients and servers.  Invisibly to the user, it deals with multiple platforms, differing databases and file systems, networking protocols and legacy applications (Tucker, 1997; Schreiber, 1996).  Orfali et al. (1996) agree and describe it as the “/” in client/server computing.

 

Middleware starts with the Application Programming Interface (API) set on the client that is used to invoke a service, and it covers the transmission of the request over the network and the resulting response.  Middleware does not include the software that provides the service, the user interface or the application logic, as these are the tasks of the client and the server (Linthicum, 1997).  This is the main advantage of middleware solutions.  The software developer writes code which conforms to the middleware API and can concentrate on the tasks the application is intended to perform (Tristram, 1996).

 

Middleware can be divided into five categories (Tucker, 1997):

1.    Database Middleware.

2.    Remote procedure calls (RPC).

3.    Object Request Brokers (ORB).

4.    Transaction Processing (TP) monitors.

5.    Message oriented Middleware (MOM).

Database Middleware enables a client/server system to communicate with one or more databases (Orfali et al., 1996).  With RPCs, calls are sent to another machine across a distributed environment.  ORBs act on objects, again located within a distributed system.  TP monitors are used mainly within financial applications to guarantee the delivery of a transaction using a synchronous approach.  An automated teller machine is an example of the use of a TP monitor.  Finally, MOMs are also used to guarantee delivery but use an asynchronous, event-driven approach (Freeman, 1997).
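The difference between the synchronous style of an RPC and the asynchronous style of message oriented middleware can be sketched as below.  This is a single-process illustration only: a plain function call stands in for the remote procedure, Python's standard `queue.Queue` stands in for a message queue, and the function names and account data are hypothetical.

```python
import queue


def get_balance(account):
    """The service being invoked (hypothetical data)."""
    return {"acc-1": 250}.get(account, 0)


def rpc_call(procedure, *args):
    # RPC style: the caller blocks until the procedure returns its result.
    # A real RPC layer would marshal the call and send it over the network.
    return procedure(*args)


# MOM style: the sender places a message on a queue and carries on;
# the receiver processes the messages later, at its own pace.
message_queue = queue.Queue()


def send(message):
    message_queue.put(message)


def drain(handler):
    results = []
    while not message_queue.empty():
        results.append(handler(message_queue.get()))
    return results


print(rpc_call(get_balance, "acc-1"))  # answer available immediately: prints 250
send("acc-1")
send("acc-2")                          # the sender does not wait for a reply
print(drain(get_balance))              # prints [250, 0]
```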

 

Middleware not only enables clients to communicate with database servers, it can also enable many systems to interoperate to form virtual systems (Linthicum, 1997).  Woo (1995) agrees and adds:

“The World Wide Web can capitalize on this by providing access to distributed system services via this Middleware and access of Web services to distributed systems”.

As a result, many organisations are using middleware to connect the Web to relational databases, legacy systems, and the rest of the company’s IT infrastructure (Tucker, 1997).

 

Middleware software can be hand coded, and this is quite common for small-scale applications (Freeman, 1997), but for larger systems commercial middleware is recommended.  Middleware can be obtained for a single purpose, but it is quite common to see an overlap in the services being provided.  Many applications provide middleware as a secondary function.  For example, the main purpose of Amazon from Intelligent Environments (Mohseni, 1996) is to provide a development environment for Web-based applications, but it also provides links to various databases, enabling it to be used as a middleware solution.  All the main database vendors are providing middleware to enable their databases to be connected to the Web (Tucker, 1997).

 

There are quite a number of middleware tools available to enable database connectivity via the Web (Mohseni, 1996).  They provide a number of alternative solutions depending on the vendor and can greatly simplify the process of connecting to a database and performing database operations.  For example, Sapphire/Web (Bluestone, 1996) allows users to develop HTML forms and specify the desired actions; the application then automatically generates the CGI code, including the necessary database access code.

 

Middleware software is very important and brings a lot of benefits, but it is quite often proprietary, difficult to write and costly (Tristram, 1996).  It is clear that a standard, easier approach is required when developing database applications that span differing operating systems and networking protocols.  Web technologies can provide an effective solution but, as we shall see, one that is not appropriate for all situations (Linthicum, 1997).

 

Web technologies can be used as a middleware solution as they are based on open standards (Mohseni, 1996).  Organisations can deploy an internal version of the Internet, i.e. an Intranet, which operates behind a secure firewall.  Web browsers can be installed on several types of platform, providing a simple, intuitive interface on the client side (Ball et al., 1996).  This eliminates the need for proprietary clients to be written for each platform.  A single application, e.g. a Java applet or application, can be written to connect to the data source and delivered to the Web browsers on the clients.  Database Middleware (drivers) will still be required to connect the Web server to the database server.  This scenario demonstrates how Web technologies can provide a cheap, easy solution for developing clients on different platforms and connecting them to databases in situations that are not rigorous or critical (Freeman, 1997; Linthicum, 1997).


2.3.3 Java and JDBC

The Java programming language was developed by Sun Microsystems (Sun, 1996).  It is a fully-fledged object-oriented programming language (Campbell & Murtagh, 1997) that changed the face of the Web (Ball et al., 1996).  This is because Java programs will run on any platform which supports the Java virtual machine, i.e. Java is platform independent (Idhen & Edwards, 1996).  Java provides two modes of operation:

1.    Stand alone application.

2.    A downloadable Applet.

As a stand-alone application Java operates in the same way as any other programming language.  Java applets can be located on a Web server, requested by a client, downloaded and executed within the client’s Web browser (Mohseni, 1996).  Java applets have strict security restrictions placed on them.  Firstly, to ensure they do not perform any harmful operations on the client machine, the actions which they are allowed to perform are strictly limited; they cannot modify data held on the client.  Secondly, applets can only open connections back to the server from which they were originally downloaded.  These restrictions can be relaxed if the originating server is certified as being trustworthy.

 

As applets are executed within the browser, they significantly change the architecture of the Web (Linthicum, 1996).  They enable truly interactive applications to be deployed across the Internet which can implement the fat client approach, removing the burden from the server.  Other advantages of this approach are zero installation and easy maintenance.  Whenever the application needs updating only the copy on the server is changed; the new version can then be downloaded by the clients.

 

Java is an excellent language for developing database applications (Hamilton et al., 1997).  Java applets and applications can be used to access databases via the Internet.  This is achieved using JDBC.  JDBC is a low-level API implemented as a set of Java classes.  It enables a Java program to connect to and perform database operations on any database for which the correct drivers exist, either locally or across a network (see fig 2.6).
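The idea of one uniform API with pluggable, database-specific drivers can be illustrated with a short sketch.  Note that this is not JDBC: it is a Python analogue written for illustration, in which a small hypothetical registry selects a driver from a URL prefix and the application then uses the same cursor calls whichever driver was chosen.  The registry, the URL format and the function names are all assumptions; only the SQLite driver, which ships with Python, is registered.

```python
import sqlite3

# Registry of drivers keyed by URL scheme. The names here are illustrative
# and are not part of JDBC or of any Python standard.
DRIVERS = {}


def register_driver(scheme, connect_function):
    DRIVERS[scheme] = connect_function


def get_connection(url):
    """Pick a driver from the URL scheme and return a connection from it."""
    scheme, _, target = url.partition(":")
    if scheme not in DRIVERS:
        raise LookupError(f"no driver registered for '{scheme}'")
    return DRIVERS[scheme](target)


# Other databases would register their own driver in the same way.
register_driver("sqlite", sqlite3.connect)

# Application code: nothing below names the database product.
conn = get_connection("sqlite::memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
cur.execute("SELECT SUM(n) FROM t")
total = cur.fetchone()[0]
print(total)  # prints 6
```

In practice drivers differ in details such as parameter placeholder syntax, so full portability usually needs more care than this sketch suggests.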

 

The combination of the facts that Java is platform independent, an excellent language for the Internet and capable of interacting with databases, makes Java a prime candidate for solving the information needs of organisations who wish to integrate heterogeneous databases that operate on multiple platforms.  This can be achieved by deploying an Intranet and developing Java applets to execute within a Web browser, or as stand-alone applications.

 

2.3.4 ActiveX

ActiveX is a Microsoft-pioneered technology that is growing in popularity and is most commonly used across the Internet.  It is described as an open, cross-platform set of technologies for integrating components (Redmond, 1996).  It is based on controls, which are components that slot into compliant applications (Mohseni, 1996).  ActiveX is built upon the Component Object Model (COM) and Distributed COM (DCOM).

 

These components are controlled and interact with each other through programming languages.  On Web pages, scripting languages such as JavaScript and Visual Basic Scripting Edition control them.  Microsoft provide a wide variety of controls for different purposes, but a programmer can also create customised ActiveX components using Microsoft’s Visual Basic.

 

When used across the Internet, ActiveX controls are stored on a Web server and can be downloaded on request like a Java applet.  Once downloaded, controls are saved to the computer’s hard disk and executed within the Web browser.  This means a particular version of a control only has to be downloaded once.  In contrast, a Java applet must be downloaded each time it is requested.

 

ActiveX enables database connectivity for the Internet and Intranets via Microsoft’s Open Database Connectivity (ODBC).  ODBC provides a low-level API similar to Java’s JDBC (Linthicum, 1997).  The ActiveX security model, however, is currently not as effective as the Java security model, as it allows controls unlimited access to the client machine.  This makes ActiveX controls very useful but also potentially very dangerous (Tilley, 1997).

ActiveX can enable connectivity to heterogeneous databases via ODBC, making it suitable for Intranet applications, but its security flaws make it less than ideal for providing an interface to databases across the Internet.

 

2.3.5 Web Aware Databases

Each of the main database vendors has produced its own version of a Web aware database or database aware Web server (Mohseni, 1996).  These databases can connect directly to a Web server and produce HTML on request.  This enables them to be accessed via a Web browser.  An example of a database aware server is Oracle’s “WebServer” (Oracle, 1996), which is Oracle’s solution to Web and database integration.  It provides an environment for Web application development for either the Internet or an Intranet.

 

There are numerous examples of databases and Web servers specifically designed for integrating both the Web and databases.  They enable data persistent Web applications to be easily created for either the Internet or an Intranet.  This approach can provide a viable method to satisfy an organisation’s information needs.

 

2.3.6 Distributed Object Systems

Distributed objects are an extension of classical objects: they can inherit attributes and actions and call each other’s methods in the same way classical objects can (Reedy, 1995).  The major difference is that classical objects are located together, whereas distributed objects can be physically separated and located anywhere across a network.  Even though distributed objects can be physically separated, the programmer need not be aware of this (Orfali et al., 1996).  Furthermore, distributed objects can be located on different platforms and implemented in different programming languages.  Enabling distributed objects to interact transparently can be a daunting task, but one that has been greatly simplified by the use of Object Request Brokers (ORB) (Orfali & Harkey, 1997).

 

2.3.6.1 Object Request Broker

An ORB is a middleware technology (Linthicum, 1997).  Its function is to manage the communication and data exchange between distributed objects.  ORBs enable interoperability of distributed object systems, as they allow systems to be built by slotting together objects from different vendors that communicate with each other using the ORB (Wade, 1994).  How the ORB is implemented is not important to the distributed systems developer; the developer’s main concern is how the objects interface with each other.  This is a form of information hiding which makes the system easier to maintain, because the ORB hides all the communication details within itself (Cobb, 1995).

 

The goal of ORB technologies is to enable objects to communicate across various platforms and different programming languages.  ORBs provide the illusion of locality: they enable distributed objects to be accessed in the same way a classical object would be (Reedy, 1995).  ORBs allow communications across different platforms and allow objects to hide their implementation details from clients.  This gives the following types of transparency: programming language, operating system, host hardware and object location (Wallnau & Foreman, 1997).
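The “illusion of locality” can be sketched with a toy broker and a client-side proxy.  This is a single-process Python illustration and not an implementation of CORBA or any real ORB: the `Broker`, `Proxy` and `Inventory` classes are hypothetical, and a dictionary lookup stands in for locating an object and marshalling a call across the network.

```python
class Broker:
    """A toy request broker: maps object names to implementations."""

    def __init__(self):
        self._objects = {}

    def register(self, name, obj):
        self._objects[name] = obj

    def invoke(self, name, method, *args):
        # A real ORB would locate the object, marshal the call and send it
        # across the network; here the lookup is a dictionary access.
        return getattr(self._objects[name], method)(*args)

    def resolve(self, name):
        return Proxy(self, name)


class Proxy:
    """Client-side stand-in that makes a brokered object look local."""

    def __init__(self, broker, name):
        self._broker = broker
        self._name = name

    def __getattr__(self, method):
        # Called only for attributes not found normally, i.e. the remote methods.
        def call(*args):
            return self._broker.invoke(self._name, method, *args)
        return call


class Inventory:
    """The object implementation; the client never holds a reference to it."""

    def __init__(self):
        self.stock = {"widget": 4}

    def count(self, item):
        return self.stock.get(item, 0)


broker = Broker()
broker.register("inventory", Inventory())

inventory = broker.resolve("inventory")
print(inventory.count("widget"))  # reads like a local call: prints 4
```

The client code calls `inventory.count(...)` exactly as it would on a local object; where the implementation lives is known only to the broker.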

 

There are two main types of distributed objects available, which adhere to different specifications (Linthicum, 1997).  The main standard, from the Object Management Group (OMG), is the Common Object Request Broker Architecture (CORBA).  The second, developed by Microsoft, is the Distributed Component Object Model (DCOM).  CORBA is the older and more widely used standard and several major companies support it.  DCOM has the backing of Microsoft.  Orfali & Harkey (1997) compared the two standards and found that CORBA was slightly faster than DCOM, easier to implement and far more reliable.  The Remote Method Invocation standard also exists to enable Java objects to be distributed in the same manner as via CORBA ORBs.

 

2.3.6.2 Common Object Request Broker Architecture

CORBA is a specification developed by the Object Management Group (OMG).  Its function is to provide a standard architecture for ORBs (OMG, 1996).  This standard specification enables ORB products to be developed that support application portability and interoperability across different ORBs, programming languages, hardware platforms and operating systems (Wallnau, 1997).

 

How CORBA operates is beyond the scope of this project, but the implications of the CORBA standard are important.  CORBA ORBs enable legacy applications to be wrapped and accessed, and heterogeneous databases can be connected to via an ORB, leaving the application developer free to concentrate on data manipulation tasks.  Distributed systems, including distributed databases, can be created in different languages and on different platforms and then accessed transparently via an ORB.  Distributed objects and the enabling CORBA ORBs are significant and have a large role to play in the future of Internet computing and distributed computing (Orfali et al., 1997).  The next section shows how CORBA can ease many of the issues that have traditionally faced developers when constructing distributed databases.

 

2.4 Distributed Databases

The concept behind distributed databases was pioneered in the late 1970s by IBM.

Webopaedia (1996) defines a distributed database as:

“A database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data.”

This section gives an overview of the major difficulties faced by developers of distributed database systems and identifies how the Web technologies described above can help overcome many of them.

 

Whenever a user accesses data stored at different sites, the system must provide “Location Transparency”, i.e. the user can access the data without being aware of its storage location.  For efficiency, the data contained within the system can be replicated at the different sites.  This creates difficulties, as more than one copy of the data then exists.  Whenever information is updated, the same change must be reflected in all the copies at the various sites.  Again the user must not be aware of this, and it is the responsibility of the system to provide “Replication Transparency” (Williamson & Moran, 1997).  Depending on the design of the distributed database, the data can also be fragmented, with certain categories of information stored only at the locations where they are of most use.  The user should not be aware of this fragmentation either; it is up to the system to provide “Fragmentation Transparency”.  Finally, just as heterogeneous databases exist, different vendors provide heterogeneous distributed databases, which in the past have proved very difficult for system developers to integrate (Pratt & Adamski, 1994).
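The three transparencies can be made concrete with a minimal Java sketch.  It is an in-memory toy under stated assumptions, not a real DBMS: the classes Site and DistributedTable are hypothetical, rows are fragmented by a simple modulus on the key, and every write is copied synchronously to all replicas of the chosen fragment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// One site in the toy distributed database: an in-memory key/value table.
class Site {
    final String name;
    private final Map<Integer, String> rows = new HashMap<>();

    Site(String name) {
        this.name = name;
    }

    void store(int key, String value) {
        rows.put(key, value);
    }

    Optional<String> fetch(int key) {
        return Optional.ofNullable(rows.get(key));
    }
}

// Hypothetical facade that hides sites, copies and fragments from the caller.
class DistributedTable {
    // Each fragment is a list of replica sites that hold identical copies.
    private final List<List<Site>> fragments = new ArrayList<>();

    void addFragment(List<Site> replicas) {
        fragments.add(new ArrayList<>(replicas));
    }

    // Fragmentation transparency: the table, not the caller, picks the fragment.
    // Assumes at least one fragment has been added.
    private List<Site> fragmentFor(int key) {
        return fragments.get(Math.floorMod(key, fragments.size()));
    }

    // Replication transparency: one logical write updates every copy.
    void put(int key, String value) {
        for (Site replica : fragmentFor(key)) {
            replica.store(key, value);
        }
    }

    // Location transparency: the caller never names a site when reading.
    Optional<String> get(int key) {
        for (Site replica : fragmentFor(key)) {
            Optional<String> value = replica.fetch(key);
            if (value.isPresent()) {
                return value;
            }
        }
        return Optional.empty();
    }
}
```

A caller uses only put and get.  Which site answers, how many copies exist and how the rows are divided are all decided inside DistributedTable.  A production system would also have to handle site failures and concurrent updates, which this sketch deliberately ignores.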

 

The main requirements identified above for a distributed database system are location transparency, replication transparency, fragmentation transparency and the integration of heterogeneous distributed database systems.  These issues can be addressed using both the Java and CORBA technologies.  Firstly, the main aim of CORBA is to provide the developer of a system with transparency regarding object locations, programming languages, operating systems and any other aspects that are not part of an object’s interface (OMG, 1996).  This enables a developer to construct new distributed database systems, based on distributed objects, that provide the required transparencies.  For example, if copies of an object reside at more than one location and a change is made to one copy, the change can be automatically reflected in the others.  Existing heterogeneous database systems can be wrapped, which eases the process of integration.

 

Java can be used to create distributed systems using Remote Method Invocation (RMI), which is an easier method of working with distributed objects than CORBA (Reese, 1997).  Together, RMI, Java applets and JDBC can be used to create distributed databases that provide the required transparencies.
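A minimal sketch of what working with RMI looks like is given below.  It uses only the standard java.rmi classes; the StockService interface, its implementation, the binding name "stock" and the helper classes are hypothetical and chosen purely for illustration.  It also assumes a modern Java runtime, where stubs are generated dynamically and no separate stub compiler step is needed.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.HashMap;
import java.util.Map;

// Hypothetical remote interface: every remote method declares RemoteException.
interface StockService extends Remote {
    int quantityOf(String item) throws RemoteException;
}

// Hypothetical implementation; in a database application this is where
// JDBC calls could be made on the server side.
class InMemoryStockService implements StockService {
    private final Map<String, Integer> stock = new HashMap<>();

    InMemoryStockService() {
        stock.put("widget", 12);
    }

    @Override
    public int quantityOf(String item) {
        return stock.getOrDefault(item, 0);
    }
}

class StockServer {
    // Exports the service and binds its stub under the name "stock".
    // The caller should keep references to the service and the registry
    // for as long as the server is meant to run.
    static Registry publish(StockService service, int port) throws RemoteException {
        StockService stub = (StockService) UnicastRemoteObject.exportObject(service, 0);
        Registry registry = LocateRegistry.createRegistry(port);
        registry.rebind("stock", stub);
        return registry;
    }
}

class StockClient {
    // Looks the service up by name and calls it as if it were a local object.
    static int remoteQuantity(String host, int port, String item)
            throws RemoteException, NotBoundException {
        Registry registry = LocateRegistry.getRegistry(host, port);
        StockService service = (StockService) registry.lookup("stock");
        return service.quantityOf(item);
    }
}
```

On the server, StockServer.publish(new InMemoryStockService(), 1099) would make the object available (1099 is the conventional RMI registry port).  A client on another machine would then call StockClient.remoteQuantity with the server's host name.  In a real project each type would normally be public and live in its own source file.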

 

2.5 Conclusion

This chapter introduced the difficulties faced by organisations that wish to create a single application capable of satisfying all of an organisation’s information needs when accessing heterogeneous databases that exist on various platforms and across different networking protocols.  Web technologies provide a solution to this problem: because they are based on open standards, they can be used as a middleware solution.  Databases in turn enable the Web to be used for developing interactive, data-persistent applications.

 

Of the various Web database-enabling technologies, both Java and CORBA provide viable, secure approaches.  Java and JDBC are aimed specifically at providing access to heterogeneous databases.  This approach is especially attractive as Java is platform independent and JDBC provides an open methodology for connecting to databases.  As a result, a Java application or applet can be written once, ported to any platform, and used to access any database for which the necessary JDBC drivers exist.  In chapter three, Java and JDBC are examined in greater detail, with particular attention given to developing the frameworks that will facilitate easier Java database application development.

 
