Measuring the Sky: On Computing Data Cubes via Skylining the Measures

ABSTRACT:
The data cube is a key element in supporting fast OLAP. Traditionally, an aggregate function is used to compute the values in data cubes. In this paper, we extend the notion of data cubes with a new perspective. Instead of using an aggregate function, we propose to build data cubes using the skyline operation as the “aggregate function.” Data cubes built in this way are called “group-by skyline cubes” and can support a variety of analytical tasks. Nevertheless, there are several challenges in implementing group-by skyline cubes in data warehouses: 1) the skyline operation is computationally intensive, 2) the skyline operation is holistic, and 3) a group-by skyline cube contains both grouping and skyline dimensions, rendering it infeasible to pre-compute all cuboids in advance. This paper gives details on how to store, materialize, and query such cubes.

EXISTING SYSTEM:
In the data-warehousing environment, OLAP tools have been extensively used for a wide range of decision support applications such as sales analysis, customer analysis, marketing, and services planning. These OLAP tools are built upon a multidimensional data model, in which data tuples are partitioned into different cells based on the values of their dimension attributes.


·        A moving kNN query continuously reports the k nearest neighbors of a moving query point.
·        Location-based service (LBS) providers that offer remote kNN querying services often return to mobile users a safe region together with the query results.
·        A cuboid stores the group-by aggregate result with respect to a particular set of attributes, called dimensions.
·        In a data cube with non-holistic aggregate functions, cuboids can be organized as a lattice.
·        Pre-computing all cuboids is impractical because the number of cuboids is exponential in the number of dimensions.
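
The lattice and its exponential size can be illustrated with a minimal sketch. The dimension names below are hypothetical and chosen only for illustration; each cuboid is modeled as a subset of the grouping dimensions, so d dimensions give 2^d cuboids.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Yield every cuboid (subset of the dimensions), coarsest first."""
    for size in range(len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            yield subset


def finer_neighbours(cuboid, dimensions):
    """Lattice neighbours of a cuboid that group by exactly one more dimension."""
    return [
        tuple(d for d in dimensions if d in cuboid or d == extra)
        for extra in dimensions
        if extra not in cuboid
    ]


# Hypothetical grouping dimensions, not taken from the paper.
dims = ("store", "product", "month")
cuboids = list(enumerate_cuboids(dims))
print(len(cuboids))                         # 2 ** 3 cuboids
print(finer_neighbours(("store",), dims))   # cuboids one level finer
```

Adding one more dimension doubles the number of cuboids, which is why full pre-computation quickly becomes impractical.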

DISADVANTAGES OF EXISTING SYSTEM:
In traditional data cubes, an aggregate function takes as input a set of measure values and returns a single numeric value.

PROPOSED SYSTEM:
A main-memory data structure tracks the distances computed while inserting objects or performing similarity queries in the metric space model.

It aims to amortize the number of distance computations spent on querying and updating the database, much as disk page buffering in traditional DBMSs aims to amortize the I/O cost.

Group-by skyline queries in data warehouses are supported by proposing the concept of the group-by skyline cube.

The B+ tree structure is based on a hash table, which makes it efficient to retrieve stored distances for later use.

In this paper, we extend the notion of the data cube with a new perspective. Specifically, we study the issues of building data cubes that use the skyline operator as the post-operation instead of the traditional aggregate functions. We call this type of data cube a group-by skyline cube.
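
To make the idea concrete, here is a minimal sketch of a group-by skyline query in which the skyline takes the place of an aggregate such as SUM or MIN. The table, column names, and the convention that smaller measure values are better are all assumptions for illustration, and the quadratic skyline routine is a naive stand-in rather than the paper's algorithm.

```python
from collections import defaultdict


def dominates(p, q):
    """p dominates q: no worse in every measure, strictly better in at least one.

    Assumes smaller values are preferred in every measure.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def group_by_skyline(rows, group_key, measures):
    """Partition rows by the grouping dimensions, then skyline each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in group_key)
        groups[key].append(tuple(row[m] for m in measures))
    return {key: skyline(points) for key, points in groups.items()}


# Hypothetical hotel data: group by city, skyline over (price, distance).
rows = [
    {"city": "A", "price": 100, "distance": 5},
    {"city": "A", "price": 80, "distance": 7},
    {"city": "A", "price": 120, "distance": 6},
    {"city": "B", "price": 90, "distance": 2},
    {"city": "B", "price": 95, "distance": 3},
]
result = group_by_skyline(rows, ["city"], ["price", "distance"])
print(result)
```

Unlike a traditional aggregate, each cell holds a set of tuples rather than a single number, which is what makes the operation holistic and hard to derive from coarser results.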

To the best of our knowledge, the building of group-by skyline cubes, or the implementation of the skyline operation as a post-operation in data warehouses, has not been addressed previously in the research literature or in commercial products. This paper studies this issue in detail. Our contributions can be summarized as follows:

1. The concept of the group-by skyline cube is presented. This includes a discussion of what a “group-by skyline cuboid” is, the relationships between different group-by skyline cuboids, and how these cuboids constitute a group-by skyline cube.

2. The technical details of supporting group-by skyline cubes are presented. Specifically, we propose to materialize a group-by skyline cube as an extended group-by skyline cube (ES-cube). In an ES-cube, skyline results across cuboids are derivable from each other. We further develop construction and query processing algorithms for the ES-cube.

ADVANTAGES OF PROPOSED SYSTEM:
·        The skyline operator has been well recognized as a very important decision-support operator.

·        Experimental results show that the proposed techniques significantly reduce the query costs in terms of both CPU time and I/O time.

MODULES:

  Query Level Computation
  Data Materialization
  Tree Constructing Cost
  Group By Sky Line Data
  Cost Estimation

MODULES DESCRIPTION:

Query Level Computation

  When processing a query, the tree structure is traversed such that non-overlapping users are excluded from further processing.
  With basic and parent filtering in the M*-tree, the cost estimate graph can be used for filtering, so some of the distance computations needed by basic filtering after an unsuccessful parent filtering are saved.
  The Sky Line Computation algorithm is a bit more difficult, since the query radius rQ is not known at the beginning of the data search for parent-based pruning.

Data Materialization

  In group-by skyline query processing, it is often desirable to pre-compute (materialize) the extended skyline cuboids in the ES-cube.
  The goal is to materialize a subset of ES-cuboids that brings the maximum improvement in query processing.
  The selection of cuboids for materialization is based on a linear cost model for the cost of evaluating a query using a cuboid.
  We use the default parameter settings from the experimental section and measure the wall-clock time of answering 100 random valid group-by skyline queries from a set of materialized ES-cuboids.
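
One way to picture cuboid selection under a linear cost model is the classic greedy heuristic for view materialization. It is used here as a stand-in and is not necessarily the paper's selection algorithm. The sketch assumes that a query on a set of grouping dimensions can be answered from any materialized cuboid whose dimensions are a superset, at a cost equal to that cuboid's size. The cuboid sizes are hypothetical.

```python
def query_cost(query, materialized, size):
    """Linear cost model (assumed): cost = size of the smallest covering cuboid."""
    return min(size[c] for c in materialized if set(query) <= set(c))


def greedy_select(cuboids, size, base, k):
    """Greedily materialize up to k cuboids in addition to the base cuboid."""
    materialized = [base]
    for _ in range(k):
        def benefit(candidate):
            # Total cost reduction over one query per cuboid.
            return sum(
                query_cost(q, materialized, size)
                - query_cost(q, materialized + [candidate], size)
                for q in cuboids
            )

        candidates = [c for c in cuboids if c not in materialized]
        if not candidates:
            break
        best = max(candidates, key=benefit)
        if benefit(best) <= 0:
            break
        materialized.append(best)
    return materialized


# Hypothetical cuboid sizes (number of stored tuples).
size = {
    ("a", "b", "c"): 100, ("a", "b"): 50, ("a", "c"): 75, ("b", "c"): 20,
    ("a",): 20, ("b",): 10, ("c",): 10, (): 1,
}
chosen = greedy_select(list(size), size, ("a", "b", "c"), k=2)
print(chosen)
```

With these sizes the first pick is the cuboid on (b, c), because it cuts the cost of four queries from 100 to 20. The second pick is (a, b), which helps the two remaining expensive queries that involve dimension a.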

Tree Constructing Cost

  In the M*-tree, navigation to the target leaf makes use of parent-based pruning, so navigation is faster.
  On insertion into the leaf itself, an update of the leaf’s graph is needed, which takes m distance computations for the M*-tree instead of none for the M-tree.
  The expensive splitting of a node does not require any additional distance computation, since all pairwise distances have to be computed to partition the node, regardless of whether the M-tree or the M*-tree is used.

Group By Sky Line Data

  The computation of group-by skyline cuboids is shared, especially across the set of cuboids separated by the distinct value.
  We materialize group-by skyline cuboids using an extended definition of skyline. We show that group-by skyline cubes materialized in this way can enable sharing across various cuboids.
  Part of the execution time is spent on I/O, while skyline computation accounts for 55.7% of the time.
  The remaining 17.4% of the time goes to the grouping operation and other overhead.
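
Why an extended skyline enables sharing can be shown with a small sketch. It assumes the strict-dominance definition common in the subspace-skyline literature: a point is dropped only if another point is strictly better in every measure. The paper's exact definition for ES-cuboids may differ, and the data points are hypothetical. Under this assumption, the skyline of any subset of the measures can be computed from the extended skyline alone.

```python
def dominates(p, q, dims):
    """Ordinary dominance restricted to the measure indexes in dims (smaller is better)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def subspace_skyline(points, dims):
    """Skyline of points with respect to the chosen measures only."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def strictly_dominates(p, q):
    """p is strictly better than q in every measure."""
    return all(a < b for a, b in zip(p, q))


def extended_skyline(points):
    """Keep every point that is not strictly dominated (assumed definition)."""
    return [p for p in points if not any(strictly_dominates(q, p) for q in points)]


# Hypothetical two-measure data.
points = [(1, 5), (1, 6), (3, 3), (4, 4)]
ext = extended_skyline(points)
print(ext)                          # (1, 6) survives because it ties on the first measure
print(subspace_skyline(ext, [0]))   # skyline on the first measure, derived from ext
print(subspace_skyline(ext, [0, 1]))
```

The point (1, 6) is not in the full skyline but is needed for the skyline on the first measure alone. Keeping such points is what lets results on other measure subsets be derived without rescanning the base data.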

Cost Estimation

  A budget-based partial materialization approach selects the group-by skyline cuboids to be materialized such that they yield the highest improvement in query cost.
  Cost estimation draws on the skyline cardinality for a uniformly distributed dataset and the asymptotic cost of several skyline algorithms in the average case and the worst case.
  The log sampling technique estimates the skyline cardinality and the skyline computation cost of the BNL and SFS algorithms for arbitrary data distributions.
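
The flavor of these estimates can be sketched as follows. The block shows a simplified in-memory version of BNL (block-nested-loop), followed by a log-sampling style fit. The fit assumes the skyline cardinality follows s ≈ A·(ln n)^B and solves for A and B from the skyline sizes of two samples. That model form, the function names, and the synthetic data are assumptions for illustration, not the exact procedure used in the paper.

```python
import math
import random


def dominates(p, q):
    """p dominates q when smaller values are preferred in every measure."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points):
    """Simplified BNL: keep a window of candidates that are not yet dominated."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def fit_log_model(n1, s1, n2, s2):
    """Fit s = A * (ln n) ** B from two (sample size, skyline size) pairs.

    Requires n1 != n2, both greater than 1, and positive skyline sizes.
    """
    b = math.log(s1 / s2) / math.log(math.log(n1) / math.log(n2))
    a = s1 / math.log(n1) ** b
    return a, b


def predict_cardinality(a, b, n):
    """Estimated skyline size for a dataset of n tuples under the fitted model."""
    return a * math.log(n) ** b


# Synthetic, uniformly distributed 3-measure data (hypothetical).
rng = random.Random(0)
data = [tuple(rng.random() for _ in range(3)) for _ in range(5000)]
s_small = len(bnl_skyline(rng.sample(data, 200)))
s_large = len(bnl_skyline(rng.sample(data, 1000)))
a, b = fit_log_model(200, s_small, 1000, s_large)
print(predict_cardinality(a, b, len(data)), len(bnl_skyline(data)))
```

The two printed numbers are the estimate and the actual skyline size. How close they are depends on the data distribution and the sample sizes.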

HARDWARE REQUIREMENTS

                     SYSTEM        : Pentium IV 2.4 GHz
                     HARD DISK     : 40 GB
                     FLOPPY DRIVE  : 1.44 MB
                     MONITOR       : 15 VGA colour
                     MOUSE         : Logitech
                     RAM           : 256 MB
                     KEYBOARD      : 110 keys enhanced

SOFTWARE REQUIREMENTS

                     Operating System : Windows XP Professional
                     Front End        : Microsoft Visual Studio .NET 2008
                     Coding Language  : C# .NET
                     Database         : SQL Server 2005

REFERENCE:
Man Lung Yiu, Eric Lo, and Duncan Yung, “Measuring the Sky: On Computing Data Cubes via Skylining the Measures,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, March 2012.