Data Warehouse


Data Warehouse?

Data warehouses are large integrated databases that connect relevant information from other sources into a single accessible format. They allow users to look at data from multiple operational systems across multiple dimensions, including time. These systems have the advantage that they can import and analyze data from a variety of other systems that cannot otherwise communicate with each other.

For example, an organization would use the information that's stored in its data warehouse to find out what day of the week they sold the most widgets in May 1992, or how employee sick leave the week before Christmas differed between California and Quebec from 2001-2005. In other words, the data warehouse contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis (such as data mining) on the information without slowing down the operational systems.


History of Data Warehousing

Data Warehouses became a distinct type of computer database during the late 1980s and early 1990s. They were developed to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons such as The processing load of reporting reduced the response time of the operational systems and Development of reports in operational systems often required writing specific computer programs which was slow and expensive.

As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, as well as personal computers and office automation software such as spreadsheet, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to a growth of this type of computer system.

As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved through several fundamental stages:

  • Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
  • Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure
  • Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
  • Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.


Data Warehouse architecture

The term data warehouse architecture is primarily used today to describe the overall structure of a Business Intelligence system. Other historical terms include decision support systems (DSS), management information systems (MIS), and others.

The data warehouse architecture describes the overall system from various perspectives such as data, process, and infrastructure needed to communicate the structure, function and interrelationships of each component. The infrastructure or technology perspective details the various hardware and software products used to implement the distinct components of the overall system. The data perspective typically diagrams the source and target data structures and aid the user in understanding what data assets are available and how they are related. The process perspective is primarily concerned with communicating the process and flow of data from the originating source system through the process of loading the data warehouse, and often the process that client products use to access and extract data from the warehouse.


Advantages of using Data Warehouse

There are many advantages to using a data warehouse, some of them are:

  • Enhances end-user access to a wide variety of data
  • Business decision makers can obtain various kinds of trend reports e.g. the item with the most sales in a particular area / country for the last two years
  • Increased data consistency
  • Additional documentation of the data
  • Potentially lower computing costs and increased productivity
  • Providing a place to combine related data from separate sources
  • Creation of a computing infrastructure that can support changes in computer systems and business structures
  • Empowering end-users to perform any level of ad-hoc queries or reports without impacting the performance of the operational systems

A data warehouse can be a significant enabler of commercial business applications, most notably customer relationship management (CRM).

 

For more about data warehousing:         Data warehousing     |     Data integration     |     Data mining

See our comprehensive range of other professional data cleansing software products at