Organizing User Search Histories
ABSTRACT:
Users are increasingly pursuing complex
task-oriented goals on the web, such as making travel arrangements, managing finances,
or planning purchases. To this end, they usually break down the tasks into a
few codependent steps and issue multiple queries around these steps repeatedly
over long periods of time. To better support users in their long-term
information quests on the web, search engines keep track of their queries and
clicks while searching online. In this paper, we study the problem of
organizing a user’s historical queries into groups in a dynamic and automated
fashion. Automatically identifying query groups is helpful for a number of
different search engine components and applications, such as query suggestions,
result ranking, query alterations, sessionization, and collaborative search. We
go beyond methods that rely on textual similarity or time
thresholds, and propose a more robust approach that leverages search query
logs. We experimentally study the performance of different techniques and showcase
their potential, especially when they are combined.
EXISTING SYSTEM:
One existing approach treats every query in a user’s history as a singleton
query group and then merges these groups iteratively. This approach is
impractical in our scenario for two reasons. First, it may have the undesirable
effect of changing a user’s existing query groups, potentially undoing the
user’s own manual efforts in organizing her history. Second, it involves a high
computational cost, since we would have to repeat a large number of query group
similarity computations for every new query.
DISADVANTAGES OF EXISTING SYSTEM:
1. We motivate and propose a method to perform query grouping in a dynamic
fashion. Our goal is to ensure good performance while avoiding disruption of
existing user-defined query groups.
PROPOSED SYSTEM:
1. We investigate how signals from search logs, such as query reformulations
and clicks, can be used together to determine the relevance among query groups.
We study two potential ways of using clicks to enhance this process: (a) fusing
the query reformulation graph and the query click graph into a single graph
that we refer to as the query fusion graph, and (b) expanding the query set
when computing relevance to also include other queries with similar clicked
URLs.
2. We show through comprehensive experimental evaluation the effectiveness and
the robustness of our proposed search log-based method, especially when
combined with approaches using other signals such as text similarity.
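The two uses of clicks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear blend, the mixing weight `alpha`, the dictionary-of-dictionaries graph layout, and the "shares at least one clicked URL" criterion are all hypothetical choices made for the example, not details given in this description.

```python
from collections import defaultdict


def fuse_graphs(reform_graph, click_graph, alpha=0.5):
    """Blend two weighted query graphs edge by edge.

    Each graph maps a query to {neighbor_query: weight}.
    alpha is an assumed mixing weight: 1.0 keeps only reformulation
    edges, 0.0 keeps only click edges.
    """
    fused = defaultdict(dict)
    for graph, share in ((reform_graph, alpha), (click_graph, 1.0 - alpha)):
        for src, neighbors in graph.items():
            for dst, weight in neighbors.items():
                fused[src][dst] = fused[src].get(dst, 0.0) + share * weight
    return dict(fused)


def expand_by_clicks(query, query_to_urls):
    """Return the query plus every query sharing at least one clicked URL."""
    urls = query_to_urls.get(query, set())
    return {query} | {q for q, u in query_to_urls.items() if u & urls}


# Toy data (made up for the example).
reform = {"cheap flights": {"flights to rome": 0.6, "hotel rome": 0.4}}
clicks = {"cheap flights": {"flights to rome": 1.0}}
fused = fuse_graphs(reform, clicks, alpha=0.5)

query_to_urls = {
    "cheap flights": {"a.com"},
    "flights to rome": {"a.com", "b.com"},
    "mortgage rates": {"c.com"},
}
expanded = expand_by_clicks("cheap flights", query_to_urls)
```

With these toy inputs, an edge present in both graphs receives a contribution from each, while an edge present in only one graph is scaled down by its share of the blend.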
ADVANTAGES OF PROPOSED SYSTEM:
1. We will focus on evaluating the effectiveness of the proposed algorithms in
capturing query relevance.
2. Relevance Measure
3. Online query grouping process
4. Similarity function
MODULES:
Query Group
Search History
Query Relevance and Search Logs
Dynamic Query Grouping
MODULE DESCRIPTION:
Query Group:
We need a relevance measure that is
robust enough to identify similar query groups beyond the approaches that
simply rely on the textual content of queries or time interval between them.
Our approach makes use of search logs in order to determine the relevance
between query groups more effectively. In fact, the search history of a large
number of users contains signals regarding query relevance, such as which
queries tend to be issued closely together (query reformulations), and which queries
tend to lead to clicks on similar URLs (query clicks). Such signals are
user-generated and are likely to be more robust, especially when considered at
scale. We suggest measuring the relevance between query groups by exploiting
the query logs and the click logs simultaneously.
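As a rough illustration of how the two signals could be extracted from logs, the Python sketch below builds a reformulation graph and a click graph from toy data. The log layout, the `max_gap` time window, the normalization of reformulation counts, and the Jaccard overlap used for click edges are assumptions made for the example; the description above does not specify them.

```python
from collections import Counter, defaultdict


def build_reformulation_graph(log, max_gap=300):
    """log: list of (user, timestamp_seconds, query), sorted by user, then time.

    Two consecutive, different queries by the same user within max_gap
    seconds count as one reformulation. Outgoing weights sum to 1.
    """
    counts = defaultdict(Counter)
    for (u1, t1, q1), (u2, t2, q2) in zip(log, log[1:]):
        if u1 == u2 and q1 != q2 and t2 - t1 <= max_gap:
            counts[q1][q2] += 1
    graph = {}
    for query, neighbors in counts.items():
        total = sum(neighbors.values())
        graph[query] = {n: c / total for n, c in neighbors.items()}
    return graph


def build_click_graph(clicks):
    """clicks: list of (query, clicked_url).

    Two queries are linked when they share clicked URLs; the weight is
    the Jaccard overlap of their clicked-URL sets.
    """
    urls = defaultdict(set)
    for query, url in clicks:
        urls[query].add(url)
    graph = defaultdict(dict)
    for a in urls:
        for b in urls:
            shared = urls[a] & urls[b]
            if a != b and shared:
                graph[a][b] = len(shared) / len(urls[a] | urls[b])
    return dict(graph)


# Toy logs (made up for the example).
log = [
    ("u1", 0, "cheap flights"),
    ("u1", 60, "flights to rome"),
    ("u1", 5000, "mortgage rates"),
    ("u2", 10, "cheap flights"),
    ("u2", 40, "flights to rome"),
]
clicks = [
    ("cheap flights", "a.com"),
    ("flights to rome", "a.com"),
    ("flights to rome", "b.com"),
]
reform_graph = build_reformulation_graph(log)
click_graph = build_click_graph(clicks)
```

In the toy log, the jump to "mortgage rates" falls outside the time window, so it produces no reformulation edge, which is the kind of behavior that becomes more reliable when counts are aggregated over many users.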
Search History:
We study the problem of organizing a
user’s search history into a set of query groups in an automated and
dynamic fashion. Each query group is a collection of queries by the same user
that are relevant to each other around a common informational need. These query
groups are dynamically updated as the user issues new queries, and new query
groups may be created over time.
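One possible in-memory model of such a history is sketched below in Python. The class names and fields (`QueryEvent`, `QueryGroup`, `SearchHistory`, `clicked_urls`) are hypothetical and chosen only to mirror the description: a history belongs to one user, holds several query groups, and grows as new queries arrive.

```python
from dataclasses import dataclass, field


@dataclass
class QueryEvent:
    """A single query issued by the user, with any clicks that followed."""
    text: str
    timestamp: float
    clicked_urls: list = field(default_factory=list)


@dataclass
class QueryGroup:
    """Queries by one user that share a common informational need."""
    events: list = field(default_factory=list)

    def add(self, event):
        self.events.append(event)


@dataclass
class SearchHistory:
    """All query groups of one user; groups are added over time."""
    user_id: str
    groups: list = field(default_factory=list)

    def new_group(self, event):
        group = QueryGroup([event])
        self.groups.append(group)
        return group


# Example: the first query opens a group, a related one joins it later.
history = SearchHistory("user-1")
travel = history.new_group(QueryEvent("cheap flights rome", 0.0))
travel.add(QueryEvent("rome flights december", 90.0, ["example.com/flights"]))
```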
Query Relevance and Search Logs:
We now develop the machinery to define
the query relevance based on Web search logs. Our measure of relevance
is aimed at capturing two important properties of relevant queries, namely: (1)
queries that frequently appear together as reformulations and (2) queries that
have induced the users to click on similar sets of pages. We start our
discussion by introducing three search behavior graphs that capture the
aforementioned properties. Following that, we show how we can use these graphs
to compute query relevance and how we can incorporate the clicks following a
user’s query in order to enhance our relevance metric.
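One common way to turn a weighted query graph into relevance scores is a random walk with restart. The Python sketch below is offered as an assumption about how such a computation might look, not as the method defined in this description; the `restart` probability, the iteration count, and the handling of dangling nodes are illustrative choices.

```python
def random_walk_relevance(graph, start, restart=0.3, iterations=50):
    """Approximate visit probabilities of a walk that keeps restarting at `start`.

    graph maps a query to {neighbor_query: weight}. Higher scores mean
    the query is reached more often from `start`.
    """
    nodes = set(graph) | {n for nb in graph.values() for n in nb} | {start}
    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in scores.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                # Dangling node: send the walker back to the start.
                nxt[start] += mass
                continue
            nxt[start] += restart * mass
            for nb, weight in neighbors.items():
                nxt[nb] += (1 - restart) * mass * weight / total
        scores = nxt
    return scores


# Toy graph (made up for the example).
graph = {
    "cheap flights": {"flights to rome": 0.8, "hotel rome": 0.2},
    "flights to rome": {"hotel rome": 1.0},
    "mortgage rates": {"refinance": 1.0},
}
scores = random_walk_relevance(graph, "cheap flights")
```

Queries that cannot be reached from the starting query, such as "mortgage rates" here, receive no probability mass, so they would be treated as unrelated.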
Dynamic Query Grouping:
One approach to the identification of
query groups is to first treat every query in a user’s history as a singleton
query group, and then merge these singleton query groups in an iterative
fashion (in a k-means or agglomerative way). However, this is impractical in our
scenario for two reasons. First, it may have the undesirable effect of changing
a user’s existing query groups, potentially undoing the
user’s own manual efforts in organizing her history. Second, it involves a high
computational cost, since we would have to repeat a large number of query group
similarity computations for every new query.
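A dynamic alternative that avoids both problems can be sketched as follows in Python: each new query is compared once against the existing groups and is either appended to the best match or placed in a new group, so existing groups are never split or merged. The function names, the `threshold` value, and the word-overlap similarity are placeholders for illustration; in the proposed system the similarity would come from the search-log-based relevance measure.

```python
def jaccard(a, b):
    """Placeholder similarity: word overlap between two query strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def assign_query(groups, query, sim=jaccard, threshold=0.2):
    """Attach `query` to the most similar group, or open a new one.

    groups is a list of non-empty lists of query strings. Existing
    groups are only appended to. Returns the index of the chosen group.
    """
    best_index, best_score = None, threshold
    for i, group in enumerate(groups):
        score = max(sim(query, member) for member in group)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is None:
        groups.append([query])
        return len(groups) - 1
    groups[best_index].append(query)
    return best_index


# Example: queries arrive one at a time.
groups = []
for q in [
    "cheap flights rome",
    "rome flights december",
    "mortgage rates",
    "fixed mortgage rates today",
]:
    assign_query(groups, q)
```

Each arriving query costs one pass over the current groups instead of a full re-clustering of the history, and groups the user has already arranged keep their existing members.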
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz
• Hard Disk : 40 GB
• Floppy Drive : 1.44 MB
• Monitor : 15" VGA Colour
• Mouse : Logitech
• RAM : 512 MB
SOFTWARE REQUIREMENTS:
• Operating System : Windows XP
• Coding Language : ASP.NET, C#.NET
• Database : SQL Server 2005
REFERENCE:
Heasoo Hwang, Hady W. Lauw, Lise Getoor, and Alexandros Ntoulas, “Organizing
User Search Histories,” IEEE Transactions on Knowledge and Data Engineering,
vol. 24, no. 5, May 2012.