An MDL-based Frequent Itemset Hierarchical Clustering Technique to Improve Query Search Results of An Individual Search Engine

D. Puspitaningrum, Fauzi, B. Susilo, J. A. Pagua, A. Erlansari, D. Andreswari, R. Efendi, I.S.W.B. Prasetya

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    In this research we propose a frequent itemset hierarchical clustering (FIHC) technique built on an MDL-based algorithm, namely KRIMP. Unlike the original FIHC technique, the proposed method defines clustering as a rank sequence problem over the top-3 ranked list of each itemsets-of-keywords cluster in the web document search results that a search engine returns for a given query. The key idea of an MDL compression-based approach is the code table: only frequent and representative keywords, such as those in a KRIMP code table, can be used as candidates, instead of all important keywords from a keyword extractor such as RAKE. To simulate real-world information needs, the web documents originate from the search results of a multi-domain query. Starting in a meta-search engine environment to gather many relevant documents, we set k = 50, 100, 200 for the k-toplist of documents retrieved from each search engine to build a dataset for automatic relevance judgement. We then apply the MDL-based FIHC algorithm to the best individual search engine, using the same k-toplist settings, minimum support = 5 for KRIMP itemset compression, and minimum cluster support = 0.1 for FIHC clustering. Our results show that MDL-based FIHC clustering can significantly improve the relevance scores of web search results on an individual search engine (by up to 39.2% at precision P10, k-toplist = 50).
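
The two thresholds named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not perform KRIMP's MDL-based code table selection; it only shows how a minimum support and a minimum cluster support might filter keywords. The function names, the toy documents, and the reading of minimum support = 5 as an absolute document count are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support=5, max_size=2):
    """Return keyword itemsets that occur in at least min_support documents.

    Assumption: min_support is an absolute document count.
    """
    counts = Counter()
    for keywords in docs:
        items = sorted(set(keywords))
        for size in range(1, max_size + 1):
            # Each itemset is counted at most once per document.
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def cluster_frequent_items(cluster_docs, min_cluster_support=0.1):
    """Return keywords present in at least min_cluster_support of a cluster's documents."""
    if not cluster_docs:
        return set()
    counts = Counter(k for keywords in cluster_docs for k in set(keywords))
    total = len(cluster_docs)
    return {k for k, n in counts.items() if n / total >= min_cluster_support}


# Hypothetical keyword sets, one per retrieved document.
docs = [{"jaguar", "car"}] * 5 + [{"jaguar", "cat"}] * 3 + [{"python"}] * 2

globally_frequent = frequent_itemsets(docs, min_support=5)
cluster_keywords = cluster_frequent_items(docs, min_cluster_support=0.4)
```

With these toy documents, only `("jaguar",)`, `("car",)`, and `("car", "jaguar")` reach the minimum support of 5, and treating all ten documents as one cluster with a threshold of 0.4 keeps `jaguar` and `car`.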
    Original language: English
    Title of host publication: Asia Information Retrieval Societies Conference (AIRS)
    Publisher: Springer
    Publication status: Published - 2015
