Network Science for Fun and Profit

John L. Taylor

Goals

  • Provide a basic conceptual structure for networks.
  • Review network basics to give context to methods and models.
  • Demonstrate a range of network analysis methods.
  • Present a few uses of network model.
  • Provide further resources and code for future use.

Outline

Part 1: Network Science Motivation

Part 2: Network Science Basics

  • Graph Types and Representations
  • Generative Models
  • Graph Analysis

Part 3: Doing Things with Networks

  • Community Detection
  • Network Diffusion
  • Next Steps

Network Science Motivation

  • What is Network Science?
  • Uses of Networks
  • Problem Context

What is Network Science?

Network science is an interdisciplinary field which studies complex networks.

A complex network is a graph with non-trivial topological features- features that do not occur in simple networks such as lattices or random graphs, but often occur in real graphs.

plot of chunk unnamed-chunk-2

plot of chunk unnamed-chunk-3

The Grand Vision

The burgeoning field of computer science has shifted our view of the physical world from that of a collection of interacting material particles to one of a seething network of information. In this way of looking at nature, the laws of physics are a form of software, or algorithm, while the material world-the hardware-plays the role of a gigantic computer.

  • P.C.W. Davies, from 'Laying Down the Laws', New Scientist. In Clifford A. Pickover, Archimedes to Hawking: Laws of Science and the Great Minds Behind Them (2008), 183.

Types of Complex Network Models

Technological Networks

  • The Internet
  • Power Grids

Social Networks

  • Affiliation Networks
  • Friendship Networks

Information Networks

  • The World Wide Web
  • Citation Networks

Biological Networks

  • Metabolic Networks
  • Neural Networks
  • Food Networks

Defining Graphs (Networks)

A graph is composed of vertexes and edges: G = (V, E). The vertexes represent things and the edges pair-wise relationships.

Network Type Vertex Edge
The Internet Technological Computer/Router Cable/Wireless
Power Grid Technological Generating Station/Substation Transmission Line
Affiliation Network Social Person or Group Membership
Friendship Network Social Person Friendship
WWW Information Web Page Link
Citation Networks Information Article Citation
Metabolic Networks Biological Metabolite Metabolic Reaction
Neural Networks Biological Neuron Synapse
Food Networks Biological Species Predation

Modes of Study in Complex Networks

Complex networks may be considered in the following ways:

  • Mathematical Objects: Graph Theory/Network Theory.
  • Models: Network Science.
    • Structural: Where graph structure is the target of modeling (partitioning, circuits, paths).
    • Mechanistic: Where graphs are a mechanism of some process (e.g. synchrony, diffusion, inference).

Why Does Network Science Matter?

Networks model significant features of complex distributed systems, leading to new ways to predict, explain and influence natural and artificial systems.

Network Science + ??? = FUN & PROFIT!

alt text alt text

Network Science Basics

  • Graph Types and Representations
  • Generative Models
  • Graph Analysis

Graph Types and Representations

  • Simple graphs are graphs without multiple edges or self-loops.
  • Directed graphs (digraphs) have edges with directions (e.g citation network, WWW).
  • Weighted graphs have an associated weight on each edge (e.g. food network with proportion of predation on edge, ARD with count of txns on edge).
  • Bipartite graphs have two types of vertex that may only connect to a differing type (e.g. affiliation network, ARD).

Simple Graph

plot of chunk unnamed-chunk-4

Simple Graph Representation

The simple graph is composed of vertices and edges that can be represented as an edge list:

E(g)
Edge sequence:

[1] B -- A
[2] C -- B

Or an adjacency matrix which represents which vertices are adjacent to which other vertices.

get.adjacency(g)
3 x 3 sparse Matrix of class "dgCMatrix"
  A B C
A . 1 .
B 1 . 1
C . 1 .

Directed Graph

plot of chunk unnamed-chunk-7

Directed Graph Representation

The edge sequence is directed in the following way:

E(g)
Edge sequence:

[1] A -> B
[2] B -> C

Note the asymmetry of the adjacency matrix:

get.adjacency(g)
3 x 3 sparse Matrix of class "dgCMatrix"
  A B C
A . 1 .
B . . 1
C . . .

Weighted Graph

plot of chunk unnamed-chunk-10

Weighted Graph Representation

In addition to the edge sequence structure, the weighted graph has a vector of weights applied to its edges:

E(g)$weight
[1] 6 3 1

Bipartite Graph

plot of chunk unnamed-chunk-12

Bipartite Graph Edge Sequence

Edge sequence:

E(bg)
Edge sequence:

[1]  Group1 -- A     
[2]  Group2 -- A     
[3]  Group1 -- B     
[4]  Group3 -- B     
[5]  Group1 -- C     
[6]  Group3 -- C     
[7]  Group2 -- D     
[8]  Group4 -- D     
[9]  Group3 -- E     
[10] Group4 -- E     

Bipartite Graph Adjacency Matrix

Adjacency matrix (V x V):

get.adjacency(bg) 
9 x 9 sparse Matrix of class "dgCMatrix"
       A B C D E Group1 Group2 Group3 Group4
A      . . . . .      1      1      .      .
B      . . . . .      1      .      1      .
C      . . . . .      1      .      1      .
D      . . . . .      .      1      .      1
E      . . . . .      .      .      1      1
Group1 1 1 1 . .      .      .      .      .
Group2 1 . . 1 .      .      .      .      .
Group3 . 1 1 . 1      .      .      .      .
Group4 . . . 1 1      .      .      .      .

Bipartite Graph Incidence Matrix

An incidence matrix represents the relationship between two classes of objects: V x G

get.incidence(bg)
  Group1 Group2 Group3 Group4
A      1      1      0      0
B      1      0      1      0
C      1      0      1      0
D      0      1      0      1
E      0      0      1      1

Bipartite Graph: Projection 1

plot of chunk unnamed-chunk-16

Bipartite Graph: Projection 2

plot of chunk unnamed-chunk-17

Kinds of Graphs: By Underlying Process

  • Random graphs are the result of vertexes associating with one another randomly and with equal chance (e.g. “buttons and Strings”).
  • Scale-free network: are the result of preferential attachment or fitness processes, where some vertexes accumulate the majority of associations as a result of having many associations (e.g. WWW).
  • Small-world network are the result of processes where vertexes are more likely to be associated with vertexes that share common associations i.e. “friends of friends” (e.g. friend networks).
  • n-Regular network are the result of a process that creates a repeating structure (e.g. lattice).

Generative Processes: Random

plot of chunk unnamed-chunk-18

Generative Processes: Scale-Free

plot of chunk unnamed-chunk-19

Generative Processes: Small-Worlds

plot of chunk unnamed-chunk-20

Generative Processes: Regular

plot of chunk unnamed-chunk-21

Graph Analysis

There are three primary levels at which a graph may be analyzed:

  • Graph Level: Summary descriptions and statistics of an entire graph.
  • Component Level: Descriptions of subgraphs.
  • Node Level: Properties of individual nodes.

Analyzing Graphs: Florentine Network

This is a data set of Padgett (1994), consisting of weddings among leading Florentine families in Renaissance Florence (1282-1500).

data(flo)
g=graph.adjacency(flo,mode="undirected"
                  ,weighted=NULL)
m <- as.matrix(flo)
glayout <- layout.fruchterman.reingold(g)

Analyzing Graphs: Florentine Graph

plot of chunk unnamed-chunk-23

Graph Level: Summary

summary(g)
IGRAPH UN-- 16 20 -- 
attr: name (v/c)

Graph Level: Density

Graph density measures how many edges are in set E compared to the maximum possible number of edges between vertices in set V. Low density means lo connectivity, while higher density means higher connectivity. Dense graphs may be more resistant to link failures, while communicating information, disease, etc. more readily.

graph.density(g)
[1] 0.1667

Graph Level: Hub Degree

The hub degree is the degree of the highest-degree node/s.

detach("package:sna", unload=TRUE)

max(degree(g))
[1] 6
library("sna")

Average Path Length

Average path length is defined as the average number of steps along the shortest paths for all possible pairs of network.

average.path.length(g)
[1] 2.486

Diameter

The diameter of a graph is the length of the shortest path between the most distanced nodes.

diameter(g)
[1] 5

Graph Level: Clustering Coefficient

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. This is a closely related calculation called the transitivity ratio, which is the number of closed triplets in a graph over the number of connected triples.

clustering_coef<-transitivity(g, type="global")
clustering_coef
[1] 0.1915

Graph Level: Degree Distribution

The degree distribution is the probability distribution of these degrees over the whole network, meaning it represents the likelihood of a randomly selected node to have a given number of degrees.

Graph Level: Degree Distribution Plot

plot of chunk unnamed-chunk-30

Graph Level: Generative Models Revisited

Network Random Scale-Free Small-World n-Regular
Density .040 .039 .040 .040
Hub Degree 10 40 6 4
Avg. Path Length 3.318 2.749 4.755 12.878
Diameter 6 5 10 25
Clust. Coef. .036 .063 .330 .5

Component Level: Cliques

Cliques in an undirected graph are subsets of its vertices such that every two vertices in the subset are connected by an edge. In other words, cliques are complete subgraphs of a network.

These sorts of structures are particularly useful in identifying communities.

Component Level: Clique Graph

plot of chunk unnamed-chunk-32

Node Level: Centrality

A few of the most significant centrality measures include:

  • Degree is just the degree of a vertex, the number of edges connected to it.
  • Closeness measures the mean distance from a vertex to other vertices.
  • Betweeness quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

Centrality: The Florentine Network

plot of chunk unnamed-chunk-33

Centrality: Degree

plot of chunk unnamed-chunk-34

Centrality: Betweenness

plot of chunk unnamed-chunk-35

Centrality: Closeness

plot of chunk unnamed-chunk-36

Doing Things with Networks

  • Community Detection
  • Network Diffusion
  • Next Steps

Community Detection

A network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally.

Some networks may not have any meaningful community structure, such as the random graphs and Barabasi-Albert scale-free graphs.

Community Detection: The Karate Group Division

Social network between members of a university karate club, led by president John A. and karate instructor Mr. Hi (pseudonyms).

Wayne W. Zachary studied conflict and fission in this network, as the karate club was split into two separate clubs, after long disputes between two factions of the club, one led by John A., the other by Mr. Hi.

Community Detection: The Karate Network

plot of chunk unnamed-chunk-38

Community Detection: Factions

plot of chunk unnamed-chunk-39

Community Detection: Walktrap & Faction

plot of chunk unnamed-chunk-40 Random walks made to determine how likely one will stay within group.

Network Diffusion

Network diffusion is the process whereby states are spread over networks. These states may be informational, viral, failures or otherwise.

There are many models of diffusion, including the SI and SIR models.

Network Diffusion: The SI Model

The Susceptible-Infected Model works simply. Given a network g: g(V, E):

  1. Randomly select one or n nodes as seeds.
  2. At each step, each infected node infects its neighbors with probability p (transmission rate, beta).
  3. Steps are repeated until total infection.

Network Diffusion: SI on Small-World

Network Diffusion: SI on Scale-Free Network

Network Diffusion: The SIR Model

  1. Randomly select one or n nodes as seeds.
  2. At each step, each infected node infects its neighbors at some rate (beta) and the nodes recover at some rate (gamma).
  3. Steps are repeated until infection clears the network.

Network Diffusion: SIR on Scale Free

plot of chunk unnamed-chunk-41

Network Diffusion: The SIR Model on Small World

plot of chunk unnamed-chunk-42

Next Steps

  • Books
  • Software and Packages
  • Data Sources

Books

Popular

  • Linked: How Everything Is Connected to Everything Else… by Albert-Laszlo Barabasi
  • Six Degrees: The Science of a Connected Age by Duncan J. Watts

Introductory

  • Networks: An Introduction by Mark Newman

Technical

  • The Structure and Dynamics of Networks by Mark Newman, Albert-László Barabási, Duncan J. Watts
  • Network Science: Theory and Applications by Ted G. Lewis
  • Statistical Analysis of Network Data: Methods and Models (Springer Series in Statistics) by Eric D. Kolaczyk

Software and Packages

  • Gephi: Network visualization tool.
  • NodeXL: Network analysis in Excel.
  • Python
    • igraph: General library for networks (a bit of a pain to get working).
    • networkx: General library for networks.
  • R
    • igraph: All-around package for networks.
    • sna: Social network analysis package.
    • statnet (includes sna): Statistical analysis package for networks.

Data Sources

Thank You

alt text