Models for the Generation of Heterogeneous Complex Networks

TR Number
Date
2015-07-02
Journal Title
Journal ISSN
Volume Title
Publisher
Virginia Tech
Abstract

Complex networks are composed of a large number of interacting nodes. Examples of complex networks include the topology of the Internet, connections between websites or web pages in the World Wide Web (WWW), and connections between participants in social networks.Due to their ubiquity, modeling complex networks is importantfor answering many research questions that cannot be answered without a mathematical model. For example, mathematical models of complex networks can be used to find the most vulnerable nodes to protect during a virus attack in theInternet, to predict connections between websites in the WWW, or to find members of different communities insocial networks. Researchers have analyzed complex networksand concluded that they are distinguished from other networks by four specific statistical properties. These four statistical properties are commonly known in this field as: (i) thesmall world effect,(ii) high average clustering coefficient, (iii) scale-free power law degree distribution, and (iv) emergence of community structure. These four statistical properties are further described later in this dissertation.

Mostmodels used to generate complex networks attempt to produce networks with these statistical properties. Additionally, most of these network models generate homogeneous complex networks where all the networknodes are considered to have the same properties. Homogenous complex networks neglect the heterogeneous nature ofthe nodes in many complexnetworks. Moreover, somemodels proposed for generating heterogeneous complexnetworks are not general as they make specific assumptions about the properties of the network.Including heterogeneity in the connection algorithm of a modelwould makeitmore suitable for generating the subset of complex networks that exhibit selective linking.Additionally, all modelsproposed, to date, for generating heterogeneous complex networks do not preserve all four of the statistical properties of complexnetworks stated above. Thus, formulation of a model for the generation of general heterogeneous complex networkswith characteristics that resemble as much as possible the statistical properties common to the real-world networks that have received attention from the research community is still an open research question.

In this work, we propose two new types of models to generate heterogeneous complex networks. First, we introduce the Integrated Attribute Similarity Model (IASM). IASM uses preferential attachment(PA) to connect nodes based on a similarity measure for node attributes combined with a node's structural popularity measure. IASM integrates the attribute similarity measure and a structural popularity measure in the computation of the connection function used to determine connectionsbetween each arriving (newly created) node and the existing(previously created or old) network nodes. IASM is also the first model known to assign an attribute vector having more than one element to each node, thus allowing different attributes per node in the generated complex network. Networks generated using IASM have a power law degree distribution and preserve the small world phenomenon. IASM models are enhanced to increase their clustering coefficient using a triad formation step (TFS). In a TFS, a node connects to the neighbor of the node to which it was previously connected through preferential attachment, thus forming a triad. The TFS increases the number of triads that are formed in the generated network which increases the network's average clustering coefficient.

We also introduce a second novel model,the Settling Node Adaptive Model (SNAM). SNAM reflects the heterogeneous nature of connectionstandard requirements for nodes. The connectionstandard requirements for a noderefers to the values of attribute similarity and/or structural popularityof old node ythat node new xwould find acceptable in order to connect to node y.SNAM is novel in that such a node connection criterion is not included in any previous model for the generation of complex networks. SNAM is shown to be successful in preserving the power law degree distribution, the small world phenomenon, and the high clustering coefficient of complex networks.

Next,we implement a modification to the IASM and SNAM models that results in the emergence of community structure.Nodes are classified into classes according to their attribute values. The connection algorithm is modified to include the class similarity values between network nodes. This community structure model preservesthe PL degree distribution, small world property, and does not affect average clustering coefficient values expected from both IASM and SNAM. Additionally, the model exhibits the presence of community structure having most of the connections made between nodes belonging to the same class with only a small percent of the connections made between nodes of different classes.

We perform a mathematical analysis of IASM and SNAM to study the degree distribution for networks generated by both models. This mathematical analysis shows that networks generated by both models have a power law degree distribution.

Finally, we completed a case study to illustrate the potential value of our research on the modeling of heterogeneous complex networks. This case study was performed on a Facebook dataset. The case study shows that SNAM, with some modifications to the connection algorithm, is capable of generating a network with almost the same characteristics as found for the original dataset. The case study providesinsight on how the flexibility of SNAM's connection algorithm can be an advantagethat makes SNAM capable of generating networks with different statistical properties.

Ideas for future research areas includestudyingthe effect of using eigenvector centrality, instead of degree centrality, on the emergence of community structure in IASM; usingthe nodeindex as an indication for its order of arrival to the network and distributing added connections fairly among networknodes along the life of the generated network; experimenting with the nature of attributesto generatea more comprehensive model; and usingtime sensitive attributes in the models, where the attribute can change its value with time,

Description
Keywords
Complex networks, Mathematical modeling, Preferential attachment
Citation