Network Science for Fun and Profit

John L. Taylor

Senior Data Scientist, iovation
johnnylogic@gmail.com
www.johnnylogic.org

Goals

Provide a basic conceptual structure for networks.
Review network basics to give context to methods and models.
Demonstrate a range of network analysis methods.
Present a few uses of network model.
Provide further resources and code for future use.

Outline

Part 1: Network Science Motivation

Part 2: Network Science Basics

Graph Types and Representations
Generative Models
Graph Analysis

Part 3: Doing Things with Networks

Community Detection
Network Diffusion
Next Steps

Network Science Motivation

What is Network Science?
Uses of Networks
Problem Context

What is Network Science?

Network science is an interdisciplinary field which studies complex networks.

A complex network is a graph with non-trivial topological features- features that do not occur in simple networks such as lattices or random graphs, but often occur in real graphs.

plot of chunk unnamed-chunk-2

plot of chunk unnamed-chunk-3

The Grand Vision

The burgeoning field of computer science has shifted our view of the physical world from that of a collection of interacting material particles to one of a seething network of information. In this way of looking at nature, the laws of physics are a form of software, or algorithm, while the material world-the hardware-plays the role of a gigantic computer.

P.C.W. Davies, from 'Laying Down the Laws', New Scientist. In Clifford A. Pickover, Archimedes to Hawking: Laws of Science and the Great Minds Behind Them (2008), 183.

Types of Complex Network Models

Technological Networks

The Internet
Power Grids

Social Networks

Affiliation Networks
Friendship Networks

Information Networks

The World Wide Web
Citation Networks

Biological Networks

Metabolic Networks
Neural Networks
Food Networks

Defining Graphs (Networks)

A graph is composed of vertexes and edges: G = (V, E). The vertexes represent things and the edges pair-wise relationships.

Network	Type	Vertex	Edge
The Internet	Technological	Computer/Router	Cable/Wireless
Power Grid	Technological	Generating Station/Substation	Transmission Line
Affiliation Network	Social	Person or Group	Membership
Friendship Network	Social	Person	Friendship
WWW	Information	Web Page	Link
Citation Networks	Information	Article	Citation
Metabolic Networks	Biological	Metabolite	Metabolic Reaction
Neural Networks	Biological	Neuron	Synapse
Food Networks	Biological	Species	Predation

Modes of Study in Complex Networks

Complex networks may be considered in the following ways:

Mathematical Objects: Graph Theory/Network Theory.
Models: Network Science.
- Structural: Where graph structure is the target of modeling (partitioning, circuits, paths).
- Mechanistic: Where graphs are a mechanism of some process (e.g. synchrony, diffusion, inference).

Why Does Network Science Matter?

Networks model significant features of complex distributed systems, leading to new ways to predict, explain and influence natural and artificial systems.

Network Science + ??? = FUN & PROFIT!

alt text

Network Science Basics

Graph Types and Representations
Generative Models
Graph Analysis

Graph Types and Representations

Simple graphs are graphs without multiple edges or self-loops.
Directed graphs (digraphs) have edges with directions (e.g citation network, WWW).
Weighted graphs have an associated weight on each edge (e.g. food network with proportion of predation on edge, ARD with count of txns on edge).
Bipartite graphs have two types of vertex that may only connect to a differing type (e.g. affiliation network, ARD).

Simple Graph

plot of chunk unnamed-chunk-4

Simple Graph Representation

The simple graph is composed of vertices and edges that can be represented as an edge list:

E(g)

Edge sequence:

[1] B -- A
[2] C -- B

Or an adjacency matrix which represents which vertices are adjacent to which other vertices.

get.adjacency(g)

3 x 3 sparse Matrix of class "dgCMatrix"
  A B C
A . 1 .
B 1 . 1
C . 1 .

Directed Graph

plot of chunk unnamed-chunk-7

Directed Graph Representation

The edge sequence is directed in the following way:

E(g)

Edge sequence:

[1] A -> B
[2] B -> C

Note the asymmetry of the adjacency matrix:

get.adjacency(g)

3 x 3 sparse Matrix of class "dgCMatrix"
  A B C
A . 1 .
B . . 1
C . . .

Weighted Graph

Weighted Graph Representation

In addition to the edge sequence structure, the weighted graph has a vector of weights applied to its edges:

E(g)$weight

[1] 6 3 1

Bipartite Graph

plot of chunk unnamed-chunk-12

Bipartite Graph Edge Sequence

Edge sequence:

E(bg)

Edge sequence:

[1]  Group1 -- A     
[2]  Group2 -- A     
[3]  Group1 -- B     
[4]  Group3 -- B     
[5]  Group1 -- C     
[6]  Group3 -- C     
[7]  Group2 -- D     
[8]  Group4 -- D     
[9]  Group3 -- E     
[10] Group4 -- E

Bipartite Graph Adjacency Matrix

Adjacency matrix (V x V):

get.adjacency(bg)

9 x 9 sparse Matrix of class "dgCMatrix"
       A B C D E Group1 Group2 Group3 Group4
A      . . . . .      1      1      .      .
B      . . . . .      1      .      1      .
C      . . . . .      1      .      1      .
D      . . . . .      .      1      .      1
E      . . . . .      .      .      1      1
Group1 1 1 1 . .      .      .      .      .
Group2 1 . . 1 .      .      .      .      .
Group3 . 1 1 . 1      .      .      .      .
Group4 . . . 1 1      .      .      .      .

Bipartite Graph Incidence Matrix

An incidence matrix represents the relationship between two classes of objects: V x G

get.incidence(bg)

  Group1 Group2 Group3 Group4
A      1      1      0      0
B      1      0      1      0
C      1      0      1      0
D      0      1      0      1
E      0      0      1      1

Bipartite Graph: Projection 1

plot of chunk unnamed-chunk-16

Bipartite Graph: Projection 2

plot of chunk unnamed-chunk-17

Kinds of Graphs: By Underlying Process

Random graphs are the result of vertexes associating with one another randomly and with equal chance (e.g. “buttons and Strings”).
Scale-free network: are the result of preferential attachment or fitness processes, where some vertexes accumulate the majority of associations as a result of having many associations (e.g. WWW).
Small-world network are the result of processes where vertexes are more likely to be associated with vertexes that share common associations i.e. “friends of friends” (e.g. friend networks).
n-Regular network are the result of a process that creates a repeating structure (e.g. lattice).

Generative Processes: Random

plot of chunk unnamed-chunk-18

Generative Processes: Scale-Free

plot of chunk unnamed-chunk-19

Generative Processes: Small-Worlds

plot of chunk unnamed-chunk-20

Generative Processes: Regular

plot of chunk unnamed-chunk-21

Graph Analysis

There are three primary levels at which a graph may be analyzed:

Graph Level: Summary descriptions and statistics of an entire graph.
Component Level: Descriptions of subgraphs.
Node Level: Properties of individual nodes.

Analyzing Graphs: Florentine Network

This is a data set of Padgett (1994), consisting of weddings among leading Florentine families in Renaissance Florence (1282-1500).

data(flo)
g=graph.adjacency(flo,mode="undirected"
                  ,weighted=NULL)
m <- as.matrix(flo)
glayout <- layout.fruchterman.reingold(g)

Analyzing Graphs: Florentine Graph

plot of chunk unnamed-chunk-23

Graph Level: Summary

summary(g)

IGRAPH UN-- 16 20 -- 
attr: name (v/c)

Graph Level: Density

Graph density measures how many edges are in set E compared to the maximum possible number of edges between vertices in set V. Low density means lo connectivity, while higher density means higher connectivity. Dense graphs may be more resistant to link failures, while communicating information, disease, etc. more readily.

graph.density(g)

[1] 0.1667

Graph Level: Hub Degree

The hub degree is the degree of the highest-degree node/s.

detach("package:sna", unload=TRUE)

max(degree(g))

[1] 6

library("sna")

Average Path Length

Average path length is defined as the average number of steps along the shortest paths for all possible pairs of network.

average.path.length(g)

[1] 2.486

Diameter

The diameter of a graph is the length of the shortest path between the most distanced nodes.

diameter(g)

[1] 5

Graph Level: Clustering Coefficient

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. This is a closely related calculation called the transitivity ratio, which is the number of closed triplets in a graph over the number of connected triples.

clustering_coef<-transitivity(g, type="global")
clustering_coef

[1] 0.1915

Graph Level: Degree Distribution

The degree distribution is the probability distribution of these degrees over the whole network, meaning it represents the likelihood of a randomly selected node to have a given number of degrees.

Graph Level: Degree Distribution Plot

plot of chunk unnamed-chunk-30

Graph Level: Generative Models Revisited

Network	Random	Scale-Free	Small-World	n-Regular
Density	.040	.039	.040	.040
Hub Degree	10	40	6	4
Avg. Path Length	3.318	2.749	4.755	12.878
Diameter	6	5	10	25
Clust. Coef.	.036	.063	.330	.5

Component Level: Cliques

Cliques in an undirected graph are subsets of its vertices such that every two vertices in the subset are connected by an edge. In other words, cliques are complete subgraphs of a network.

These sorts of structures are particularly useful in identifying communities.

Component Level: Clique Graph

plot of chunk unnamed-chunk-32

Node Level: Centrality

A few of the most significant centrality measures include:

Degree is just the degree of a vertex, the number of edges connected to it.
Closeness measures the mean distance from a vertex to other vertices.
Betweeness quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

Centrality: The Florentine Network

plot of chunk unnamed-chunk-33

Centrality: Degree

plot of chunk unnamed-chunk-34

Centrality: Betweenness

plot of chunk unnamed-chunk-35

Centrality: Closeness

plot of chunk unnamed-chunk-36

Doing Things with Networks

Community Detection
Network Diffusion
Next Steps

Community Detection

A network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally.

Some networks may not have any meaningful community structure, such as the random graphs and Barabasi-Albert scale-free graphs.

Community Detection: The Karate Group Division

Social network between members of a university karate club, led by president John A. and karate instructor Mr. Hi (pseudonyms).

Wayne W. Zachary studied conflict and fission in this network, as the karate club was split into two separate clubs, after long disputes between two factions of the club, one led by John A., the other by Mr. Hi.

Community Detection: The Karate Network

plot of chunk unnamed-chunk-38

Community Detection: Factions

plot of chunk unnamed-chunk-39

Community Detection: Walktrap & Faction

plot of chunk unnamed-chunk-40 Random walks made to determine how likely one will stay within group.

Network Diffusion

Network diffusion is the process whereby states are spread over networks. These states may be informational, viral, failures or otherwise.

There are many models of diffusion, including the SI and SIR models.

Network Diffusion: The SI Model

The Susceptible-Infected Model works simply. Given a network g: g(V, E):

Randomly select one or n nodes as seeds.
At each step, each infected node infects its neighbors with probability p (transmission rate, beta).
Steps are repeated until total infection.

Network Diffusion: SI on Small-World

alt text

From http://chengjun.github.io/en/2014/03/simulate-network-diffusion-with-R/

Network Diffusion: SI on Scale-Free Network

alt text

From http://chengjun.github.io/en/2014/03/simulate-network-diffusion-with-R/

Network Diffusion: The SIR Model

Randomly select one or n nodes as seeds.
At each step, each infected node infects its neighbors at some rate (beta) and the nodes recover at some rate (gamma).
Steps are repeated until infection clears the network.

Network Diffusion: SIR on Scale Free

plot of chunk unnamed-chunk-41

Network Diffusion: The SIR Model on Small World

plot of chunk unnamed-chunk-42

Next Steps

Books
Software and Packages
Data Sources

Books

Introductory

Networks: An Introduction by Mark Newman

Technical

The Structure and Dynamics of Networks by Mark Newman, Albert-László Barabási, Duncan J. Watts
Network Science: Theory and Applications by Ted G. Lewis
Statistical Analysis of Network Data: Methods and Models (Springer Series in Statistics) by Eric D. Kolaczyk

Software and Packages

Gephi: Network visualization tool.
NodeXL: Network analysis in Excel.
Python
- igraph: General library for networks (a bit of a pain to get working).
- networkx: General library for networks.
R
- igraph: All-around package for networks.
- sna: Social network analysis package.
- statnet (includes sna): Statistical analysis package for networks.

Data Sources

http://snap.stanford.edu/data/
http://moreno.ss.uci.edu/data.html
http://www-personal.umich.edu/~mejn/netdata/
http://cnets.indiana.edu/groups/nan/truthy/
igraphdata package has ready-to-use igraphs
Twitter and Facebook APIs

Thank You

alt text

Network Science for Fun and Profit

Goals

Outline

Part 1: Network Science Motivation

Part 2: Network Science Basics

Part 3: Doing Things with Networks

Network Science Motivation

What is Network Science?

The Grand Vision

Types of Complex Network Models

Technological Networks

Social Networks

Information Networks

Biological Networks

Defining Graphs (Networks)

Modes of Study in Complex Networks

Why Does Network Science Matter?

Network Science Basics

Graph Types and Representations

Simple Graph

Simple Graph Representation

Directed Graph

Directed Graph Representation

Weighted Graph

Weighted Graph Representation

Bipartite Graph

Bipartite Graph Edge Sequence

Bipartite Graph Adjacency Matrix

Bipartite Graph Incidence Matrix

Bipartite Graph: Projection 1

Bipartite Graph: Projection 2

Kinds of Graphs: By Underlying Process

Generative Processes: Random

Generative Processes: Scale-Free

Generative Processes: Small-Worlds

Generative Processes: Regular

Graph Analysis

Analyzing Graphs: Florentine Network

Analyzing Graphs: Florentine Graph

Graph Level: Summary

Graph Level: Density

Graph Level: Hub Degree

Average Path Length

Diameter

Graph Level: Clustering Coefficient

Graph Level: Degree Distribution

Graph Level: Degree Distribution Plot

Graph Level: Generative Models Revisited

Component Level: Cliques

Component Level: Clique Graph

Node Level: Centrality

Centrality: The Florentine Network

Centrality: Degree

Centrality: Betweenness

Centrality: Closeness

Doing Things with Networks

Community Detection

Community Detection: The Karate Group Division

Community Detection: The Karate Network

Community Detection: Factions

Community Detection: Walktrap & Faction

Network Diffusion

Network Diffusion: The SI Model

Network Diffusion: SI on Small-World

Network Diffusion: SI on Scale-Free Network

Network Diffusion: The SIR Model

Network Diffusion: SIR on Scale Free

Network Diffusion: The SIR Model on Small World

Next Steps

Books

Popular

Introductory

Technical

Software and Packages

Data Sources

Thank You