Overview of Database Management System

Science, business, education, the economy, law, culture: all areas of human development “work” with the constant aid of data. Databases play a crucial role in scientific research: the body of scientific and technical data and information in the public domain is massive, and factual data are fundamental to the progress of science. But the progress of science is not the only process affected by the way people use databases. Stock exchange data are indispensable to any analyst; access to large, comprehensive databases is an everyday activity for a teacher, an educator, an academic, or a lawyer. There are databases collecting all sorts of data: nuclear structure and radioactive decay data for isotopes (the Evaluated Nuclear Structure Data File), gene sequences (the Human Genome Database), prisoners’ DNA data (the “DNA offender database”), names of people accused of drug offenses, telephone numbers, legal materials, and many others. This chapter discusses the basic idea of a database management system, its evolution, its advantages over the conventional file system, and the structure of a database system.

Data are raw facts that constitute the building blocks of information. Data are the heart of the DBMS. Not all data convey useful information: useful information is obtained from processed data. In other words, data have to be interpreted in order to obtain information. Good, timely, relevant information is the key to decision making, and good decision making is the key to organizational survival. Data are a representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or automatic means. The data in a DBMS can be broadly classified into two types: the collection of information needed by the organization, and “metadata”, which is information about the database itself. The term “metadata” will be discussed in detail later in this chapter.
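The distinction between the organization's data and metadata can be seen in a small sketch. The example uses Python's built-in sqlite3 module purely for illustration; the employee table, its columns, and its rows are hypothetical and do not come from the chapter. SQLite keeps its metadata in a catalog table named sqlite_master, which can be queried like any other table.

```python
import sqlite3

# In-memory database; the "employee" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [("Asha", 52000.0), ("Ravi", 48000.0)],
)

# Data: the facts the organization needs.
data_rows = conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()

# Metadata: information about the database itself, kept in SQLite's catalog.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
# PRAGMA table_info returns one row per column; index 1 is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
conn.close()

print("data:", data_rows)
print("tables:", table_names)
print("columns:", column_names)
```

The same query interface serves both kinds of content: the first query returns facts about employees, while the other two return facts about how the database is organized.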
Data are the most stable part of an organization’s information system. A company needs to save information about employees, departments, and salaries. These pieces of information are called data. Data held in permanent storage are referred to as persistent data. Generally, we perform operations on data or data items to supply some information about an entity. For example, a library keeps a list of members, books, due dates, and fines.

1.3 Database

A database is a well-organized collection of data that are related in a meaningful way and that can be accessed in different logical orders. Database systems are systems in which the interpretation and storage of information are of primary importance. The database should contain all the data needed by the organization. As a result, database systems are generally characterized by a huge volume of data, the need for long-term storage of the data, and access to the data by a large number of users.

1.4 Database Management System

A database management system (DBMS) consists of a collection of interrelated data and a set of programs to access that data. It is software that helps in maintaining and utilizing a database. A DBMS consists of:

– A collection of interrelated and persistent data. This part of the DBMS is referred to as the database (DB).
– A set of application programs used to access, update, and manage data. This part constitutes the data management system (MS).

Two further points characterize a DBMS:

– A DBMS is general-purpose software, i.e., not application specific. The same DBMS (e.g., Oracle, Sybase) can be used in a railway reservation system, library management, a university, etc.
– A DBMS takes care of storing and accessing data, leaving only application-specific tasks to application programs.

A DBMS is a complex system that allows a user to do many things to data, as shown in Fig. 1.2. From this figure, it is evident that a DBMS allows the user to input data, share the data, edit the data, manipulate the data, and display the data in the database. Because a DBMS allows more than one user to share the data, the complexity extends to its design and implementation.

Fig. 1.2. Capabilities of database management system (update, input, manipulate, select, display, share, edit)

1.4.1 Structure of DBMS

An overview of the structure of a database management system is shown in Fig. 1.3. A DBMS is a software package that translates data from its logical representation to its physical representation and back. The DBMS uses an application-specific database description to define this translation. The database description is generated by a database designer from his or her conceptual view of the database, which is called the Conceptual Schema.

Fig. 1.3. Structure of database management system (user’s view of database, conceptual schema, data definition language or interface, database description, database management system, database)
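As a rough illustration of the translation between logical and physical representations, the sketch below uses Python's built-in sqlite3 module (an assumption made for illustration; the chapter does not prescribe a particular DBMS). The CREATE TABLE statement plays the role of the database description, the application sees only rows, and the physical representation is a binary file whose layout the DBMS manages. The student table and the file name are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "university.db")  # hypothetical file name

    conn = sqlite3.connect(path)
    # Database description: the logical structure, stated declaratively.
    conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO student VALUES (1, 'Meena')")
    conn.commit()

    # Logical representation: the application sees named columns and rows.
    logical_rows = conn.execute("SELECT roll_no, name FROM student").fetchall()
    conn.close()

    # Physical representation: bytes on disk in a layout the DBMS controls.
    with open(path, "rb") as f:
        physical_header = f.read(16)

print(logical_rows)
print(physical_header)
```

The application never parses the file itself; it states what it wants in terms of the description, and the DBMS maps that request onto the stored bytes and back.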
The translation from the conceptual schema to the database description is performed using a data definition language (DDL) or a graphical or textual design interface.

1.5 Objectives of DBMS

The main objectives of a database management system are data availability, data integrity, data security, and data independence.

1.5.1 Data Availability

Data availability means that the data are made available to a wide variety of users in a meaningful format and at reasonable cost, so that the users can easily access the data.

1.5.2 Data Integrity

Data integrity refers to the correctness of the data in the database. In other words, the data available in the database are reliable.

1.5.3 Data Security

Data security means that only authorized users can access the data. Data security can be enforced by, for example, passwords. In addition, if two separate users are accessing the same data at the same time, the DBMS must not allow them to make conflicting changes.

1.5.4 Data Independence

A DBMS allows the user to store, update, and retrieve data in an efficient manner. It provides an “abstract view” of how the data are stored in the database. In order to store the information efficiently, complex data structures are used to represent the data. The system hides certain details of how the data are stored and maintained.

1.6 Evolution of Database Management Systems

The file-based system was the predecessor to the database management system. The Apollo moon-landing project was started in the 1960s. At that time, there was no system available to handle and manage large amounts of information. As a result, North American Aviation (later part of Rockwell International) developed software known as the Generalized Update Access Method (GUAM). In the mid-1960s, IBM joined North American Aviation to develop GUAM into the Information Management System (IMS). IMS was based on the hierarchical data model.
In the mid-1960s, General Electric released the Integrated Data Store (IDS). IDS was based on the network data model. Charles Bachman was mainly responsible for the development of IDS. The network database was developed to fulfill the need to represent more complex data relationships than could be modeled with hierarchical structures. The Conference on Data Systems Languages (CODASYL) formed the Data Base Task Group (DBTG) in 1967. DBTG specified three distinct languages for standardization: a Data Definition Language (DDL), which would enable the Database Administrator to define the schema; a subschema DDL, which would allow application programs to define the parts of the database they require; and a Data Manipulation Language (DML) to manipulate the data.

The network and hierarchical data models developed during that time had the drawbacks of minimal data independence, minimal theoretical foundation, and complex data access. To overcome these drawbacks, in 1970 Codd of IBM published a paper titled “A Relational Model of Data for Large Shared Data Banks” in Communications of the ACM, vol. 13, no. 6, pp. 377–387, June 1970. Following Codd’s paper, the System R project was developed during the late 1970s by the IBM San Jose Research Laboratory in California. The project was developed to prove that the relational data model was implementable. The outcome of the System R project was the development of Structured Query Language (SQL), which is the standard language for relational database management systems. Commercial relational database management systems followed: Oracle appeared in 1979, and in the 1980s IBM released SQL/DS and DB2.

Codd himself attempted to address some of the failings in his original work with extended versions of the relational model, called RM/T in 1979 and RM/V2 in 1990. Attempts to provide a data model that represents the “real world” more closely have been loosely classified as Semantic Data Modeling.
In recent years, two approaches to DBMS have become more popular: Object-Oriented DBMS (OODBMS) and Object-Relational DBMS (ORDBMS). The chronological order of the development of DBMS is as follows:

– Flat files – 1960s–1980s
– Hierarchical – 1970s–1990s
– Network – 1970s–1990s
– Relational – 1980s–present
– Object-oriented – 1990s–present
– Object-relational – 1990s–present
– Data warehousing – 1980s–present
– Web-enabled – 1990s–present

Early 1960s. Charles Bachman at GE created the first general-purpose DBMS, the Integrated Data Store. It formed the basis for the network model, which was standardized by CODASYL (Conference on Data Systems Languages).

Late 1960s. IBM developed the Information Management System (IMS). IMS used an alternative model, called the Hierarchical Data Model.

1970. Edgar Codd of IBM created the Relational Data Model. In 1981 Codd received the Turing Award for his contributions to database theory. Codd passed away in April 2003.

1976. Peter Chen presented the Entity-Relationship model, which is widely used in database design.

1980s. SQL, developed by IBM, became the standard query language for databases. SQL was standardized by ANSI in 1986 and by ISO in 1987.

1980s and 1990s. IBM, Oracle, Informix, and others developed powerful DBMS.

1.7 Classification of Database Management System

The database management system can be broadly classified into (1) Passive Database Management System and (2) Active Database Management System:

1. Passive Database Management System. Passive database management systems are program-driven. In a passive database management system, the users query the current state of the database and retrieve the information currently available in it. Traditional DBMS are passive in the sense that they are explicitly and synchronously invoked by operations initiated by a user or an application program. Applications send requests for operations to be performed by the DBMS and wait for the DBMS to confirm and return any possible answers.
The operations can be definitions and updates of the schema, as well as queries and updates of the data.

2. Active Database Management System. Active database management systems are data-driven or event-driven systems. In an active database management system, the users specify to the DBMS the information they need. If the information of interest is not currently available, the DBMS actively monitors the arrival of the desired information and provides it to the relevant users. The scope of a query in a passive DBMS is limited to past and present data, whereas the scope of a query in an active DBMS additionally includes future data. An active DBMS reverses the control flow between applications and the DBMS: instead of only applications calling the DBMS, the DBMS may also call applications. Active databases contain a set of active rules that consider events representing database state changes, evaluate a condition (TRUE or FALSE) as the result of a database predicate or query, and take an action via a data manipulation program embedded in the system. Alert is an extension architecture developed at the IBM Almaden Research Center for experimentation with active databases.

1.8 File-Based System

Prior to DBMS, the file system provided by the OS was used to store information. In a file-based system, we have a collection of application programs that perform services for the end users. Each program defines and manages its own data. Consider a university database, which contains details about students, faculty, the list of courses offered, the duration of each course, etc. In file-based processing, there is a separate application program for each set of data, as shown in Fig. 1.4.

Fig. 1.4. File-based system (each group of users 1 to n works with its own application, and each application has its own files)

One group of users may be interested in knowing the courses offered by the university. Another group may be interested in the faculty information. The information is stored in separate files, and separate application programs are written.

1.9 Drawbacks of File-Based System

The limitations of the file-based approach are duplication of data, data dependence, incompatible file formats, and separation and isolation of data.

1.9.1 Duplication of Data

Duplication of data means the same data being stored more than once. This can also be termed data redundancy. Data redundancy is a problem in the file-based approach because of its decentralized nature. The main drawbacks of duplication of data are:

– Duplication of data wastes storage space. Wasted storage space has a direct impact on cost: the cost will increase.
– Duplication of data can lead to loss of data integrity; the data are no longer consistent. Assume that the employee details are stored both in the department and in the main office. Now the employee changes his contact address. The changed address is stored in the department alone and not in the main office. If some important information has to be sent to his contact address from the main office, then that information will be lost. This is due to the lack of a centralized approach.

1.9.2 Data Dependence

Data dependence means the application program depends on the data. If some modifications have to be made to the data, then the application program has to be rewritten. If the application program is independent of the storage structure of the data, this is termed data independence. Data independence is generally preferred as it is more flexible. But in a file-based system there is program-data dependence.

1.9.3 Incompatible File Formats

As a file-based system lacks program-data independence, the structure of the file depends on the application programming language.
For example, the structure of a file generated by a FORTRAN program may be different from the structure of a file generated by a “C” program. The incompatibility of such files makes them difficult to process jointly.

1.9.4 Separation and Isolation of Data

In the file-based approach, data are isolated in separate files. Hence it is difficult to access data. The application programmer must synchronize the processing of two files to ensure that the correct data are extracted. This difficulty increases if data have to be retrieved from more than two files.

The drawbacks of the conventional file-based approach are summarized below:

1. We have to store the information in secondary memory such as a disk. If the volume of information is large, it will occupy more memory space.
2. We have to depend on the addressing facilities of the system. If the database is very large, then it is difficult to address the whole set of records.
3. For each query, for example the address of a student and the list of electives that the student has chosen, we have to write a separate program.
4. While writing several programs, a lot of variables will be declared, and they will occupy some space.
5. It is difficult to ensure the integrity and consistency of the data when more than one program accesses the same file and changes the data.
6. In case of a system crash, it becomes hard to bring the data back to a consistent state.
7. “Data redundancy” occurs when identical data are distributed over various files.
8. Data distributed in various files may be in different formats; hence it is difficult to share data among different applications (data isolation).

1.10 DBMS Approach

A DBMS is software that provides a set of primitives for defining, accessing, and manipulating data. In the DBMS approach, the same data are shared by different application programs; as a result, data redundancy is minimized. The DBMS approach to data access is shown in Fig. 1.5.
Fig. 1.5. Data access through DBMS (groups of users 1 to n work through applications 1 to n, which all reach a single DB through the DBMS)
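The contrast with the file-based approach can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a method given in the chapter: two hypothetical applications (a department application and a main-office application) read and write one shared employee table, so an address change made through one is immediately visible to the other and no second copy can go stale.

```python
import sqlite3

# One shared database stands in for the DB behind the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Kumar', '12 Old Street')")
conn.commit()


def department_update_address(db, emp_id, new_address):
    """Hypothetical department application: records a change of address."""
    db.execute("UPDATE employee SET address = ? WHERE id = ?", (new_address, emp_id))
    db.commit()


def main_office_mailing_address(db, emp_id):
    """Hypothetical main-office application: looks up where to send mail."""
    row = db.execute("SELECT address FROM employee WHERE id = ?", (emp_id,)).fetchone()
    return row[0]


department_update_address(conn, 1, "7 New Road")
mailing_address = main_office_mailing_address(conn, 1)
stored_copies = conn.execute("SELECT COUNT(*) FROM employee WHERE id = 1").fetchone()[0]
print(mailing_address, stored_copies)
```

Because the address is stored once, the inconsistency described in Sect. 1.9.1 cannot arise here: there is no separate main-office copy to forget to update.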
