Big data behaviour modelling and visual analytics

Zhang, Jinson

Publication Type: Thesis
Issue Date: 2017

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Jinson
dc.date.accessioned	2017-09-06T04:45:25Z
dc.date.available	2017-09-06T04:45:25Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/10453/116426
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/116426/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	au.edu.uts.lib/ppc
dc.subject	Big Data.	en
dc.subject	Big Data analysis and visualization.	en
dc.subject	Visual cluttering.	en
dc.subject	Pair-Density algorithms.	en
dc.subject	5Ws dimensions.	en
dc.title	Big data behaviour modelling and visual analytics	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Big Data is composed of text, images, video, audio, mobile or other forms of data collected from multiple datasets, and is rapidly growing in both size and complexity. This has created a huge volume of multidimensional data within a very short time period. Big Data is therefore too big, too complex and moves too fast for us to analyze using traditional methods. Big Data behaviour is considered as a set of concepts and categories that describe Big Data’s acts towards others.

The challenges facing Big Data analysis and visualization include: 1) how to classify Big Data across multiple datasets and different forms of data, 2) how to visualize structured and unstructured Big Data behaviour patterns for multidimensional data, 3) how to display Big Data behaviour patterns with very large volumes on a normal-sized screen, and 4) how to visualize Big Data behaviour patterns without the loss of information. Big Data visualization normally requires optimized solutions that use different visual techniques to integrate display and exploration. To illustrate the huge amount of multidimensional data within a standard-size screen, visualization needs to find an efficient classification method for multiple datasets across any form of data. Current interactive data exploration normally optimizes data for visualization by excluding some pieces of information, which results in missing information. Big Data visualization also suffers from visual cluttering and data overcrowding problems when dealing with huge amounts of multidimensional data.

My approach includes two parts: Big Data behaviour modelling and Big Data visualization. First, I have established the 5Ws dimensions for Big Data classification, based on data behaviour ontologies, that can be applied to multiple datasets and to any form of data. Each data incident contains these 5Ws dimensions, which are posed as a set of concepts and categories that describe Big Data acts: When did the data occur, Where did the data come from, What did the data contain, How was the data transferred, Why did the data occur, and Who received the data. Secondly, I have introduced Pair-Density algorithms to measure Big Data behaviour patterns, which enables comparison and analysis between any two dimensions of behaviours. Two non-dimensional axes in parallel coordinates have then been created by using Pair-Density to measure and compare visual patterns for Big Data visualization. Finally, Shrunk Attributes has been deployed into Pair-Density parallel coordinates. This not only narrows down Big Data patterns for better understanding, but also dramatically reduces data cluttering and overcrowding in Big Data visualization.

Three different datasets with a combined total of more than 2.5 million data incidents have been used to measure and visualize different data patterns, including both numerical and non-numerical dimensions. The experimental results have shown that my new approach has significantly improved the accuracy of Big Data visualization and reduced data cluttering by more than 80% without the loss of information. The use of 5Ws dimensions and Pair-Density parallel coordinates therefore has large potential benefits and applications across both the business and research fields. This thesis contains the research approach and implementation results obtained by the author during his Ph.D. period. The majority of methods and results have been published in seventeen research papers in journals and conference proceedings by May 2016.
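
Illustrative sketch (not taken from the thesis): the abstract describes each data incident as a record over the dimensions When, Where, What, How, Why and Who, and says behaviour patterns are compared between any two dimensions. The Python below shows one way such a record could be held in memory and how value pairs from two dimensions could be counted. The class `Incident`, the function `pair_share` and the sample rows are hypothetical names and data invented for this example. The plain co-occurrence share used here is an assumption standing in for a density measure; the abstract does not give the Pair-Density formula, so this should not be read as the author's algorithm.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One data incident described by the dimensions listed in the abstract."""
    when: str   # when the data occurred
    where: str  # where the data came from
    what: str   # what the data contained
    how: str    # how the data was transferred
    why: str    # why the data occurred
    who: str    # who received the data


def pair_share(incidents, dim_a, dim_b):
    """Return {(value_a, value_b): share of all incidents} for two dimensions.

    Assumption: a simple co-occurrence share, used only for illustration.
    It is not the Pair-Density definition from the thesis.
    """
    counts = Counter((getattr(i, dim_a), getattr(i, dim_b)) for i in incidents)
    total = len(incidents)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical sample incidents, invented for illustration only.
SAMPLE = [
    Incident("2016-05-01", "Sydney", "photo", "mobile", "share", "friends"),
    Incident("2016-05-01", "Sydney", "video", "mobile", "share", "public"),
    Incident("2016-05-02", "Perth", "photo", "web", "backup", "self"),
    Incident("2016-05-02", "Sydney", "photo", "mobile", "share", "friends"),
]

if __name__ == "__main__":
    # Compare the Where and How dimensions across the sample incidents.
    for pair, share in sorted(pair_share(SAMPLE, "where", "how").items()):
        print(pair, share)
```

Because both dimensions are read by attribute name, the same function works for any pair of dimensions, numerical or non-numerical, as long as the values are hashable.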

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/116426