An Efficient Tree-Based Algorithm for Mining High Average-Utility Itemset

YILDIRIM, İRFAN; ÇELİK, METE

doi:10.1109/access.2019.2945840

IEEE ACCESS, vol. 7, pp. 144245-144263, 2019 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 7
Publication Date: 2019
DOI: 10.1109/access.2019.2945840
Journal Name: IEEE ACCESS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Page Numbers: pp. 144245-144263
Keywords: Average utility, high average utility itemset, tighter upper bounds, utility mining, pruning strategy
Erciyes University Affiliated: Yes

Abstract

High-utility itemset mining (HUIM), an extension of the well-known frequent itemset mining (FIM), has become a key topic in recent years. HUIM aims to find the complete set of itemsets that have high utilities in a given dataset. High average-utility itemset mining (HAUIM) is a variation of traditional HUIM. HAUIM provides an alternative measure, named the average-utility, that discovers itemsets by taking into consideration both the utility values and the lengths of itemsets. HAUIM is important for several application domains, such as business applications, medical data analysis, mobile commerce, and streaming data analysis. In the literature, several algorithms have been proposed, each introducing its own upper-bound model and data structures to discover high average-utility itemsets (HAUIs) in a given database. However, they require long execution times and consume large amounts of memory. To overcome these limitations, this paper first introduces four novel upper-bounds along with pruning strategies and two data structures. Then, it proposes a pattern growth approach called the HAUL-Growth algorithm for efficiently mining HAUIs using the proposed upper-bounds and data structures. Experimental results show that the proposed HAUL-Growth algorithm significantly outperforms the state-of-the-art dHAUIM and TUB-HAUIM algorithms in terms of execution time, number of join operations, memory consumption, and scalability.
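To make the average-utility measure concrete, the following is a minimal sketch in Python. It is not the HAUL-Growth algorithm and uses none of the upper-bounds or data structures proposed in the paper; it is a brute-force enumeration that only illustrates the measure. It assumes the commonly used definition, in which the average utility of an itemset is the sum of its utilities over all transactions containing it, divided by the number of items in the itemset. The toy database, the function names `average_utility` and `mine_hauis`, and the threshold `min_au` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (e.g. quantity multiplied by unit profit).
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def average_utility(itemset, db):
    """Sum the itemset's utility over every transaction that contains all
    of its items, then divide by the itemset length (assumed definition)."""
    items = set(itemset)
    total = 0
    for transaction in db:
        if items.issubset(transaction):
            total += sum(transaction[item] for item in items)
    return total / len(items)


def mine_hauis(db, min_au):
    """Brute-force enumeration of all itemsets whose average utility is at
    least min_au. Exponential in the number of items; illustration only."""
    all_items = sorted({item for transaction in db for item in transaction})
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            au = average_utility(candidate, db)
            if au >= min_au:
                result[candidate] = au
    return result


hauis = mine_hauis(transactions, min_au=10)
```

On this toy data, the itemset `("a", "c")` has utility 6 in the first transaction and 16 in the second, so its average utility is 22 / 2 = 11. The exhaustive search over every candidate is the cost that upper-bounds and pruning strategies, such as those the abstract describes, are meant to reduce.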