D-Cache: Universal
Distance Cache for Metric Access Methods
ABSTRACT:
The caching of accessed disk pages has
been successfully used for decades in database technology, resulting in
effective amortization of I/O operations needed within a stream of query or
update requests. However, in modern complex databases, like multimedia
databases, the I/O cost becomes a minor performance factor. In particular,
metric access methods (MAMs), used for similarity search in complex
unstructured data, have been designed to minimize rather the number of distance
computations than I/O cost (when indexing or querying). Inspired by I/O caching
in traditional databases, in this paper we introduce the idea of distance caching
for usage with MAMs—a novel approach to streamline similarity search. As a
result, we present the D-cache, a main-memory data structure which can be
easily implemented into any MAM, in order to spare the distance computations
spent by queries/updates. In particular, we have modified two state-of-the-art
MAMs to make use of D-cache—the M-tree and Pivot tables. Moreover, we present the
D-file, an index-free MAM based on simple sequential search augmented by
D-cache. The experimental evaluation shows that performance gain achieved due
to D-cache is significant for all the MAMs, especially for the D-file.
EXISTING
SYSTEM:
·
A moving kNN query continuously reports
the k nearest neighbors of a moving query point.
·
Location-based service providers (LBS)
that offer remote kNN querying services often return mobile users a safe region
to the query results.
·
The main memory is always limited and
the distance matrix could expand to an enormous size, a compact data structure
that consumes a user-defined portion of main memory.
·
The basic task of D-cache is determine
tight lower- and upper bound of an unknown distance between two objects, based
on stored distances computed during previous querying and/or indexing.
PRPOSED
SYSTEM:
·
The D-cache, a main-memory data
structure which tracks computed distances while inserting objects or performing
similarity queries in the metric space Model.
·
The D-cache aims to amortize the number
of distance computations spent by querying/updating the database, similarly
like disk page buffering in traditional DBMSs aims to amortize the I/O cost.
·
The D-cache structure is based on a hash
table, thus making efficient to retrieve stored distances for further usage.
·
The experiments have shown the M-tree
enhanced by NN-graphs can perform significantly faster, while keeping the
construction cheap.
MODULES:
Using
6 Models in our Project
·
Parent Filtering.
·
NN Graph Filtering.
·
Nearest Neigbour Graph.
·
Range Query.
·
M-Tree Querying Cost.
·
M-Tree Constructing Cost.
MODULES DESCRIPTION:
Parent Filtering:
The
tree contains the distances from the routing/ground entries to the center of
its parent user; some of the non-relevant M-tree branches can be filtered out
without the need of a distance computation.
The
tree of parent filtering is |_ (P,Q) − _(P,R)| > rQ + rR, the data ball R
cannot overlap the query ball, thus the child user has not to be re-checked by
basic filtering.
The
Sequential file was simplify by the query is _ (P,Q) was computed in the
previous (unsuccessful) parent’s basic filtering.
NN Graph Filtering:
The
filtering using NN-graph is similar to the parent filtering, however, instead
of the parent; we use an object S from the node.
The
query selects such object S; then its distance to the query object Q is
explicitly computed.
The
object S a sacrifice pivot, since to rescue
other objects from basic filtering, this one must be sacrificed to the
NN graph Filtering.
Nearest
Neighbour Graph:
The Nearest Neighbour Process is
same to M-tree Construction Process
The
original M-tree proposal, the index was constructed by multiple dynamic
insertions, which consisted of two steps.
First
Step is, The leaf
node for the newly inserted object is found by traversing a single path in the
tree.
Second
Step is, The leaf gets overfull after the
insertion, it is split, such that two objects from the split leaf are selected
as centers of the new two leafs, while the remaining objects within the split
leaf are distributed among the new leafs.
Range
Query:
The
implementing a query processing, the tree structure is traversed such that
non-overlapping users are excluded from further processing.
The
basic and parent filtering, in M*-tree we can use the NN-graph filtering step
(inserted after the step of parent filtering and before the basic
filtering),while we hope some distance computations needed by basic filtering
after an unsuccessful parent filtering will be saved.
The
kNN algorithm is a bit more difficult, since the query radius rQ is not known
at the beginning of kNN search of NN filtering user.
The listing of kNN pseudo code, however, its
form can be easily derived from the original M-tree’s kNN algorithm and the
M*-tree’s range query implementation presented above.
M-Tree
Querying Cost:
The
range search is always more efficient in M*-tree than in M-tree, because only
such entries are chosen as to filtered by the parent, so for them distance
computation is unavoidable.
The
NN-graph filtering some of the entries can be filtered before they become a
sacrifice, thus distance computations are reduced in this case.
M-Tree
Constructing Cost:
M*-tree,
the navigation to the target leaf makes use of NN-graph filtering, so we
achieve faster navigation.
The
M-Tree insertion into the leaf itself the update of leaf’s-graph is needed,
which takes m distance computations for M*-tree instead of no computation for
M-tree.
The
expensive splitting of a node does not require any additional distance
computation, since all pair wise distances have to be computed to partition the
node, regardless of using M-tree or M*-tree.
HARDWARE
REQUIREMENTS
•
SYSTEM : Pentium IV 2.4 GHz
•
HARD
DISK : 40 GB
•
FLOPPY
DRIVE : 1.44 MB
•
MONITOR : 15 VGA colour
•
MOUSE : Logitech.
•
RAM : 256 MB
•
KEYBOARD :
110 keys enhanced.
SOFTWARE
REQUIREMENTS
•
Operating system :- Windows XP
Professional
•
Front End :- Microsoft Visual Studio .Net 2008
•
Coding Language : - C# .NET.
•
Database :-
SQL Server 2005
REFERENCE:
Toma´ _s Skopal, Member, IEEE, Jakub
Loko_c, and Benjamin Bustos, “D-Cache: Universal Distance Cache for Metric
Access Methods”, IEEE TRANSACTIONS ON
KNOWLEDGE AND DATA ENGINEERING, VOL. 24, NO. 5, MAY 2012.