F-NSP<sup>+</sup>: A fast negative sequential patterns mining method with self-adaptive data storage

Publication Type:
Journal Article
Citation:
Pattern Recognition, 2018, 84 pp. 13 - 27
Issue Date:
2018-12-01
Filename Description Size
1-s2.0-S0031320318302310-main.pdfPublished Version998.59 kB
Adobe PDF
Full metadata record
© 2018 Elsevier Ltd Mining negative sequential patterns (NSP) is an important tool for nonoccurring behavior analysis, and it is much more challenging than mining positive sequential patterns (PSPs) due to the high computational complexity and huge search space when obtaining the support of negative sequential candidates (NSCs). Very few NSP mining algorithms are available and most of them are very inefficient since they obtain the support of NSC by scanning the database repeatedly. Instead, the state-of-the-art NSP mining algorithm e-NSP only uses the PSP's information stored in an array structure to ‘calculate' the support of NSC by equations, without database re-scanning. This makes e-NSP highly efficient, particularly on sparse datasets. However, when datasets become dense, the key process to obtain the support of NSC in e-NSP becomes very time-consuming and needs to be improved. In this paper, we propose a novel and efficient data structure, a bitmap, to obtain the support of NSC. We correspondingly propose a fast NSP mining algorithm, f-NSP, which uses a bitmap to store the PSP's information and then obtain the support of NSC only by bitwise operations, which is much faster than the hash method in e-NSP. Experimental results on real-world and synthetic datasets show that f-NSP is not only tens to hundreds of times faster than e-NSP, but also saves more than ten-fold the storage spaces of e-NSP, particularly on dense datasets with a large number of elements in a sequence or a small number of itemsets. Further, we find that f-NSP consumes more storage space than e-NSP when PSP's support is less than a support threshold sdsup, a value obtained through our theoretical analysis of storage space. Accordingly, we propose a self-adaptive storage strategy and a corresponding algorithm f-NSP+ to overcome this deficiency. f-NSP+ can automatically choose a bitmap or an array structure to store PSP information according to PSP support. Experimental results show that f-NSP+ saves more storage spaces of f-NSP, and has similar time efficiency as f-NSP.
Please use this identifier to cite or link to this item: