Follow us on Facebook

Header Ads

D-Cache: Universal Distance Cache for Metric Access Methods

D-Cache: Universal Distance Cache for Metric Access Methods

ABSTRACT:

The caching of accessed disk pages has been successfully used for decades in database technology, resulting in effective amortization of I/O operations needed within a stream of query or update requests. However, in modern complex databases, like multimedia databases, the I/O cost becomes a minor performance factor. In particular, metric access methods (MAMs), used for similarity search in complex unstructured data, have been designed to minimize rather the number of distance computations than I/O cost (when indexing or querying). Inspired by I/O caching in traditional databases, in this paper we introduce the idea of distance caching for usage with MAMs—a novel approach to streamline similarity search. As a result, we present the D-cache, a main-memory data structure which can be easily implemented into any MAM, in order to spare the distance computations spent by queries/updates. In particular, we have modified two state-of-the-art MAMs to make use of D-cache—the M-tree and Pivot tables. Moreover, we present the D-file, an index-free MAM based on simple sequential search augmented by D-cache. The experimental evaluation shows that performance gain achieved due to D-cache is significant for all the MAMs, especially for the D-file.




EXISTING SYSTEM:

·        A moving kNN query continuously reports the k nearest neighbors of a moving query point.
·        Location-based service providers (LBS) that offer remote kNN querying services often return mobile users a safe region to the query results.
·        The main memory is always limited and the distance matrix could expand to an enormous size, a compact data structure that consumes a user-defined portion of main memory.
·        The basic task of D-cache is determine tight lower- and upper bound of an unknown distance between two objects, based on stored distances computed during previous querying and/or indexing.

PRPOSED SYSTEM:
·        The D-cache, a main-memory data structure which tracks computed distances while inserting objects or performing similarity queries in the metric space Model.
·        The D-cache aims to amortize the number of distance computations spent by querying/updating the database, similarly like disk page buffering in traditional DBMSs aims to amortize the I/O cost.
·        The D-cache structure is based on a hash table, thus making efficient to retrieve stored distances for further usage.
·        The experiments have shown the M-tree enhanced by NN-graphs can perform significantly faster, while keeping the construction cheap.

MODULES:

          Using 6 Models in our Project
·        Parent Filtering.
·        NN Graph Filtering.
·        Nearest Neigbour Graph.
·        Range Query.
·        M-Tree Querying Cost.
·        M-Tree Constructing Cost.

MODULES DESCRIPTION:

Parent Filtering:

The tree contains the distances from the routing/ground entries to the center of its parent user; some of the non-relevant M-tree branches can be filtered out without the need of a distance computation.
The tree of parent filtering is |_ (P,Q) − _(P,R)| > rQ + rR, the data ball R cannot overlap the query ball, thus the child user has not to be re-checked by basic filtering.
The Sequential file was simplify by the query is _ (P,Q) was computed in the previous (unsuccessful) parent’s basic filtering.

NN Graph Filtering:
The filtering using NN-graph is similar to the parent filtering, however, instead of the parent; we use an object S from the node.
The query selects such object S; then its distance to the query object Q is explicitly computed.
The object S a sacrifice pivot, since to rescue  other objects from basic filtering, this one must be sacrificed to the NN graph Filtering.

Nearest Neighbour Graph:

The Nearest Neighbour Process is same to M-tree Construction Process
The original M-tree proposal, the index was constructed by multiple dynamic insertions, which consisted of two steps.
First Step is, The leaf node for the newly inserted object is found by traversing a single path in the tree.
Second Step is, The leaf gets overfull after the insertion, it is split, such that two objects from the split leaf are selected as centers of the new two leafs, while the remaining objects within the split leaf are distributed among the new leafs.

Range Query:

The implementing a query processing, the tree structure is traversed such that non-overlapping users are excluded from further processing.
The basic and parent filtering, in M*-tree we can use the NN-graph filtering step (inserted after the step of parent filtering and before the basic filtering),while we hope some distance computations needed by basic filtering after an unsuccessful parent filtering will be saved.
The kNN algorithm is a bit more difficult, since the query radius rQ is not known at the beginning of kNN search of NN filtering user.
 The listing of kNN pseudo code, however, its form can be easily derived from the original M-tree’s kNN algorithm and the M*-tree’s range query implementation presented above.

M-Tree Querying Cost:

The range search is always more efficient in M*-tree than in M-tree, because only such entries are chosen as to filtered by the parent, so for them distance computation is unavoidable.
The NN-graph filtering some of the entries can be filtered before they become a sacrifice, thus distance computations are reduced in this case.

M-Tree Constructing Cost:

M*-tree, the navigation to the target leaf makes use of NN-graph filtering, so we achieve faster navigation.
The M-Tree insertion into the leaf itself the update of leaf’s-graph is needed, which takes m distance computations for M*-tree instead of no computation for M-tree.
The expensive splitting of a node does not require any additional distance computation, since all pair wise distances have to be computed to partition the node, regardless of using M-tree or M*-tree.

HARDWARE REQUIREMENTS

                     SYSTEM             : Pentium IV 2.4 GHz
                     HARD DISK        : 40 GB
                     FLOPPY DRIVE  : 1.44 MB
                     MONITOR           : 15 VGA colour
                     MOUSE               : Logitech.
                     RAM                    : 256 MB
                     KEYBOARD       : 110 keys enhanced.

SOFTWARE REQUIREMENTS

                     Operating system           :-  Windows XP Professional
                     Front End             :-  Microsoft Visual Studio .Net 2008
                     Coding Language : - C# .NET.
                     Database              :- SQL Server 2005

REFERENCE:
Toma´ _s Skopal, Member, IEEE, Jakub Loko_c, and Benjamin Bustos, “D-Cache: Universal Distance Cache for Metric Access Methods”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO. 5, MAY 2012.