BloomCast: Efficient and Effective
Full-Text Retrieval in Unstructured P2P Networks
ABSTRACT:
Efficient and effective full-text retrieval in unstructured peer-to-peer (P2P)
networks remains a challenge in the research community. First, it is difficult,
if not impossible, for unstructured P2P systems to locate items with guaranteed
recall. Second, existing schemes for improving the search success rate often
rely on placing a large number of item replicas across the wide area network,
incurring high communication and storage costs. In this paper, we propose
BloomCast, an efficient and effective full-text retrieval scheme for
unstructured P2P networks. By leveraging a hybrid P2P protocol, BloomCast
replicates items uniformly at random across the P2P network, achieving
guaranteed recall at a communication cost of O(√N), where N is the size of the
network. Furthermore, by casting Bloom filters instead of raw documents across
the network, BloomCast significantly reduces the communication and storage
costs of replication. We demonstrate the power of the BloomCast design through
both mathematical proof and comprehensive simulations based on query logs from
a major commercial search engine and the NIST TREC WT10G data collection.
Results show that BloomCast achieves an average query recall of 91 percent,
outperforming the existing WP algorithm by 18 percent, while reducing the
search latency of query processing by 57 percent.
SYSTEM ARCHITECTURE:
EXISTING SYSTEM:
The existing system has two major issues. First, it is difficult, if not
impossible, for unstructured P2P systems to locate items with guaranteed
recall. Second, existing schemes for improving the search success rate often
rely on placing a large number of item replicas across the wide area network,
incurring high communication and storage costs.
Existing P2P search schemes fall into two categories: DHT-based global indexes
and federated search engines over unstructured protocols.
DHT-based search engines rely on distributed indexes that partition a logically
global inverted index across physically distributed nodes.
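To make the idea of a partitioned global index concrete, here is a minimal sketch in Java. It is an illustration under stated assumptions, not the design of any particular DHT search engine: the class `PartitionedIndex` and its methods are hypothetical, and a simple term hash stands in for a real DHT lookup.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one logical inverted index split across nodes by term.
final class PartitionedIndex {
    // nodes.get(i) holds the posting lists (term -> document ids) stored on node i.
    private final List<Map<String, Set<String>>> nodes = new ArrayList<>();

    PartitionedIndex(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Assumption: a plain hash of the term stands in for the DHT lookup.
    int nodeFor(String term) {
        return Math.floorMod(term.hashCode(), nodes.size());
    }

    // Each term of the document is indexed on the node responsible for that term.
    void addDocument(String docId, Collection<String> terms) {
        for (String term : terms) {
            nodes.get(nodeFor(term))
                 .computeIfAbsent(term, k -> new TreeSet<>())
                 .add(docId);
        }
    }

    // Multi-keyword query: fetch one posting list per term and intersect them.
    Set<String> search(Collection<String> terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> postings =
                nodes.get(nodeFor(term)).getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

In this sketch a multi-keyword query touches one node per term and has to combine the posting lists it gets back, which is one reason such indexes are usually considered a poor fit for full-text search.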
In federated search engines over unstructured P2P networks, queries are
processed by flooding. Unstructured P2P networks are commonly believed to be
the best candidate for supporting full-text retrieval because query evaluation
can be handled at the nodes that store the relevant documents.
Replication strategies are widely used to improve search performance in
unstructured P2P networks. The first type is aware of query popularity. The
second type is independent of query popularity, such as the WP scheme.
DISADVANTAGES OF EXISTING SYSTEM:
– Because of the exact-match problem of DHTs, DHT-based schemes provide poor
full-text search capability.
– A flooding-based scheme cannot guarantee search recall at an acceptable
communication cost.
– The query-popularity-aware strategy is inefficient for insoluble queries,
that is, queries for rare items. Query frequency is also difficult or even
impossible to obtain in a distributed P2P system.
– Existing replication strategies need to replicate the full document across
the network, raising possibly unacceptable communication and storage costs.
PROPOSED SYSTEM:
We propose BloomCast, a novel replication strategy that supports efficient and
effective full-text retrieval in unstructured P2P networks.
BloomCast is a replication strategy that is independent of query popularity.
We prove mathematically that BloomCast guarantees recall at a communication
cost of O(√N), where N is the size of the network.
ADVANTAGES OF PROPOSED SYSTEM:
– By replicating term sets encoded as Bloom filters instead of raw documents
among peers, communication and storage costs are greatly reduced, while
full-text multi-keyword search is still supported.
MODULES:
• Node creation
• BloomCast replication model generation
• BloomCast
• Bloom filter
• Query recall
MODULE DESCRIPTION:
• Node creation
– To retrieve full text efficiently, nodes are created in the P2P network.
– Each node sends its documents to other nodes chosen uniformly at random in
the unstructured P2P network.
– Organizing the nodes as an unstructured P2P network reduces communication and
storage costs.
• BloomCast replication model generation
– The replication model is generated from the document replicas and the query
replicas.
– The replica counts are the number of nodes that hold a replica of the
document and the number of nodes that hold a replica of the query.
– These replica counts are used to evaluate the search success rate of a query
issued by the user.
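The relationship between replica counts and search success rate can be illustrated with a short calculation. The sketch below assumes that document replicas and query replicas are each placed on distinct nodes chosen uniformly at random, so a search succeeds when the two sets share at least one node. The class and method names are hypothetical and are not taken from the paper.

```java
// Hypothetical sketch of the replication model: probability that a query meets
// at least one replica of a document, assuming both replica sets are placed on
// distinct nodes chosen uniformly at random.
final class ReplicationModel {

    // Exact success probability: 1 - C(n - d, q) / C(n, q), computed as a product.
    static double successProbability(int n, int docReplicas, int queryReplicas) {
        if (docReplicas + queryReplicas > n) {
            return 1.0; // pigeonhole: the two sets must share a node
        }
        double miss = 1.0;
        for (int i = 0; i < queryReplicas; i++) {
            // Probability that the (i+1)-th query replica avoids every document replica.
            miss *= (double) (n - docReplicas - i) / (n - i);
        }
        return 1.0 - miss;
    }

    // Common approximation: 1 - exp(-d * q / n).
    static double approxSuccessProbability(int n, int docReplicas, int queryReplicas) {
        return 1.0 - Math.exp(-(double) docReplicas * queryReplicas / n);
    }
}
```

Under these assumptions, setting both counts to 2√N gives an approximate success probability of 1 - e^(-4), about 0.98, regardless of N. This is the intuition behind a replication cost that grows as O(√N) rather than O(N).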
• BloomCast
– BloomCast consists of network size estimation, node subset sampling, the
replication protocol, and query evaluation.
– Network size is estimated by the DHT subsystem, which maintains the local
repository of replicas.
– A subset of nodes is then sampled to hold the replicas, which reduces
communication and storage costs.
– Query evaluation uses an optimal number of query replicas distributed at
random across the network.
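The node subset sampling step can be sketched as follows. This is an assumption-laden illustration: nodes are identified by integers in [0, n), the network size estimate is passed in directly rather than obtained from a DHT, and the constant `c`, the class `ReplicaSampler`, and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of node subset sampling for replication.
final class ReplicaSampler {

    // Number of replicas for an estimated network size n: ceil(c * sqrt(n)), capped at n.
    static int replicaCount(int n, double c) {
        return (int) Math.min(n, Math.ceil(c * Math.sqrt(n)));
    }

    // Picks k distinct node ids from [0, n) uniformly at random (Floyd's algorithm).
    static Set<Integer> sampleNodes(int n, int k, Random rng) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        Set<Integer> chosen = new HashSet<>();
        for (int j = n - k; j < n; j++) {
            int t = rng.nextInt(j + 1);              // uniform in [0, j]
            chosen.add(chosen.contains(t) ? j : t);  // fall back to j on a collision
        }
        return chosen;
    }
}
```

A caller would first compute `replicaCount(estimatedSize, c)` and then pass the result to `sampleNodes` to choose the nodes that receive a replica. Floyd's algorithm is used here only because it draws exactly k random numbers and never produces duplicates.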
• Bloom filter
– A Bloom filter uses hash functions to encode the term set of a document as a
compact bit array, which is replicated in place of the document and matched
against queries.
– Bloom filters reduce memory and storage use and make full-text retrieval
efficient and effective.
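A minimal Bloom filter over document terms might look like the sketch below. It is illustrative only: the class `TermBloomFilter`, the double-hashing scheme, and the parameter choices are assumptions, not the encoding used in the paper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;

// Hypothetical sketch: a Bloom filter that encodes the term set of one document.
final class TermBloomFilter {
    private final BitSet bits;
    private final int size;    // number of bits (m)
    private final int hashes;  // number of hash functions (k)

    TermBloomFilter(int size, int hashes) {
        if (size <= 0 || hashes <= 0) {
            throw new IllegalArgumentException("size and hashes must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // i-th bit position derived by double hashing: (h1 + i * h2) mod m.
    private int position(String term, int i) {
        byte[] data = term.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return h;
    }

    void add(String term) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(term, i));
        }
    }

    // False means the term is definitely absent; true means it is probably present.
    boolean mightContain(String term) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(term, i))) {
                return false;
            }
        }
        return true;
    }

    // Multi-keyword match: every query term must pass the membership test.
    boolean mightContainAll(Collection<String> terms) {
        for (String term : terms) {
            if (!mightContain(term)) {
                return false;
            }
        }
        return true;
    }
}
```

A node holding such a filter can answer a multi-keyword query with `mightContainAll` without storing the document itself. Because a Bloom filter can report false positives but never false negatives, a match would still need to be confirmed against the original document at the node that owns it.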
• Query recall
– Recall is preserved when Bloom filters are replicated in place of documents:
no matching document is lost.
– Queries retrieve full text in the unstructured P2P network at reduced
communication and storage costs.
– Data is retrieved quickly and meets the user's requirements.
SYSTEM CONFIGURATION:-
HARDWARE REQUIREMENTS:-
• Processor - Pentium III
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Floppy Drive - 1.44 MB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
SOFTWARE REQUIREMENTS:-
• Operating System : Windows 95/98/2000/XP
• Front End : Java
REFERENCE:
Hanhua Chen, Xucheng Luo, Yunhao Liu, Tao Gu, Kaiji Chen, and Lionel M. Ni,
"BloomCast: Efficient and Effective Full-Text Retrieval in Unstructured P2P
Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 23,
no. 2, February 2012.