We study how to obtain concise descriptions of discrete multivariate sequential data in terms of rich multivariate sequential patterns that can capture potentially highly interesting (cor)relations between sequences. To this end, we allow our pattern language to span the alphabets (domains) of all sequences, allow patterns to overlap temporally, and allow for gaps in their occurrences. We formalise our goal using the Minimum Description Length (MDL) principle, under which the objective is to discover the set of patterns that provides the most succinct description of the data. To discover good pattern sets, we introduce Ditto, an efficient algorithm that approximates the ideal result. We support our claims with a set of experiments on both synthetic and real data.
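The abstract does not give Ditto's actual encoding, so the following is only a toy illustration of the generic two-part MDL idea it relies on: pick the pattern set M that minimises L(M) + L(D | M). Several simplifications are assumptions made here and are not the paper's scheme. The sketch uses a single sequence, contiguous patterns with no gaps or overlap, a greedy longest-first cover, Shannon-optimal code lengths from usage counts for L(D | M), and log2|alphabet| bits per symbol to spell out each pattern in L(M). The helper names `cover` and `total_length` are hypothetical.

```python
from collections import Counter
from math import log2


def cover(seq, patterns):
    """Greedy left-to-right cover (hypothetical helper).

    At each position, use the longest pattern that matches contiguously;
    otherwise fall back to the single symbol. Simplification: one sequence,
    no gaps, no overlap (the paper's pattern language allows all of these).
    """
    usage = Counter()
    ordered = sorted(patterns, key=len, reverse=True)
    i = 0
    while i < len(seq):
        for p in ordered:
            if tuple(seq[i:i + len(p)]) == p:
                usage[p] += 1
                i += len(p)
                break
        else:  # no pattern matched: encode the bare symbol
            usage[(seq[i],)] += 1
            i += 1
    return usage


def total_length(seq, patterns):
    """Toy two-part MDL score L(M) + L(D | M) in bits (assumed encoding)."""
    alphabet_size = len(set(seq))
    usage = cover(seq, patterns)
    n = sum(usage.values())
    # L(D | M): each code used c times costs -log2(c / n) bits per use.
    data_bits = sum(-c * log2(c / n) for c in usage.values())
    # L(M): spell out every multi-symbol pattern with log2|alphabet| bits per symbol.
    # Singleton symbols are treated as a fixed base model and not charged.
    model_bits = sum(len(p) * log2(alphabet_size) for p in patterns if len(p) > 1)
    return model_bits + data_bits


seq = list("abcabcabcabc")
print(total_length(seq, []))                 # singletons only
print(total_length(seq, [("a", "b", "c")]))  # with the pattern "abc"
```

With only singletons the sequence costs 12 · log2 3 ≈ 19.0 bits, while adding the pattern `abc` drops the total to 3 · log2 3 ≈ 4.8 bits, so under this toy score the set containing `abc` gives the more succinct description.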
from cs.AI updates on arXiv.org http://ift.tt/1TgfF8q