An Efficient Tree-Based Algorithm for Mining High Average-Utility Itemset


YILDIRIM İ., ÇELİK M.

IEEE ACCESS, vol.7, pp.144245-144263, 2019 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 7
  • Publication Date: 2019
  • DOI: 10.1109/access.2019.2945840
  • Journal Name: IEEE ACCESS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.144245-144263
  • Keywords: Average utility, high average utility itemset, tighter upper bounds, utility mining, pruning strategy
  • Erciyes University Affiliated: Yes

Abstract

High-utility itemset mining (HUIM), an extension of the well-known frequent itemset mining (FIM), has become a key topic in recent years. HUIM aims to find the complete set of itemsets with high utilities in a given dataset. High average-utility itemset mining (HAUIM) is a variation of traditional HUIM. HAUIM provides an alternative measure, the average utility, which discovers itemsets by taking into account both their utility values and their lengths. HAUIM is important for several application domains, such as business applications, medical data analysis, mobile commerce, and streaming data analysis. In the literature, several algorithms have been proposed, each introducing its own upper-bound model and data structures, to discover high average-utility itemsets (HAUIs) in a given database. However, they require long execution times and consume large amounts of memory. To overcome these limitations, this paper first introduces four novel upper bounds along with pruning strategies and two data structures. It then proposes a pattern-growth approach, the HAUL-Growth algorithm, for efficiently mining HAUIs using the proposed upper bounds and data structures. Experimental results show that the proposed HAUL-Growth algorithm significantly outperforms the state-of-the-art dHAUIM and TUB-HAUIM algorithms in terms of execution time, number of join operations, memory consumption, and scalability.
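
To make the average-utility measure concrete, the sketch below uses the definition commonly given in the HAUIM literature: the utilities of an itemset's items are summed over every transaction that contains the whole itemset, and the sum is divided by the itemset's length. The abstract does not spell this definition out, so treat it as an assumption. The toy database, the function names `average_utility` and `brute_force_hauis`, and the threshold are hypothetical. This is an exhaustive enumeration for illustration only, not the HAUL-Growth algorithm, and it uses none of the paper's upper bounds or data structures.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example, quantity times unit profit).
transactions = [
    {"a": 4, "b": 2, "c": 6},
    {"a": 2, "c": 3},
    {"b": 5, "c": 1},
]


def average_utility(itemset, transactions):
    """Sum the itemset's utility over all transactions containing every
    item of the itemset, then divide by the number of items in the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] for item in itemset)
    return total / len(itemset)


def brute_force_hauis(transactions, min_au):
    """Enumerate every non-empty itemset and keep those whose average
    utility reaches min_au. Exponential in the number of distinct items."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            au = average_utility(itemset, transactions)
            if au >= min_au:
                result[itemset] = au
    return result


hauis = brute_force_hauis(transactions, min_au=7)
for itemset, au in hauis.items():
    print(itemset, au)
```

In this toy data the single item `a` has an average utility of 6 and falls below the threshold of 7, while its superset `(a, c)` reaches 7.5. Average utility therefore has no downward-closure property, so a search cannot prune by the measure itself and needs separate upper bounds, which is the kind of bound the abstract refers to.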