## Install the following packages
# install.packages("relevent")
library(relevent)
library(sna)
library(ggplot2)
library(igraph)
library(RColorBrewer)
Welcome to the practical for social network analysis Part 2. In this part you will apply Relational Event Models (REMs) to real relational event history (REH) data: Apollo 13 voice loop data and Twitter data. These data sets are stored in UUsummerschool.rdata.
A reminder about REMs: dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.
As in Part 1, we first need to load UUsummerschool.rdata and see what is inside it.
# Set your own working directory
# setwd()
# Load data using load():
load("UUsummerschool.rdata")
# This R-data file contains the following objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"
## [3] "ClassIntercept" "ClassIsFemale"
## [5] "ClassIsTeacher" "PartOfApollo_13"
## [7] "Twitter_data_rem3" "WTCPoliceCalls"
## [9] "WTCPoliceIsICR"
Let's check the objects that we will work with in this practical.
as.sociomatrix.eventlist
## function (eventlist, n)
## {
## g <- matrix(0, n, n)
## if (NROW(eventlist) > 0) {
## tabmat <- table(eventlist[, -1, drop = FALSE])
## g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
## }
## g
## }
## attr(,"source")
## [1] "function(eventlist,n){"
## [2] " g<-matrix(0,n,n)"
## [3] " if(NROW(eventlist)>0){"
## [4] " tabmat<-table(eventlist[,-1,drop=FALSE])"
## [5] " g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] " }"
## [7] " g"
## [8] "}"
head(PartOfApollo_13)
## time sender receiver
## 1 11849.2 18 2
## 2 11854.2 2 18
## 3 11885.2 18 2
## 4 11890.2 2 18
## 5 12232.2 2 17
## 6 12342.2 17 2
head(Twitter_data_rem3)
## time_day source target
## 1 0.00 8 1
## 2 121.03 28 2
## 3 199.08 28 2
## 4 266.95 4 3
## 5 573.59 22 5
## 6 574.49 25 5
If you need a reminder on what a sociomatrix is, please refer to Part 1 of the REM practical.
During this practical we will analyze part of the Apollo 13 mission. The mission launched as scheduled at 2:13:00 p.m. EST (19:13:00 UTC) on April 11, 1970. On board were James Lovell Jr. (Commander, CDR), John "Jack" Swigert Jr. (Command Module Pilot, CMP), and Fred Haise Jr. (Lunar Module Pilot, LMP). The mission was quite routine and everything went to plan, until at 56:54:53 the astronauts heard a "pretty large bang" and experienced fluctuations in electrical power and control thrusters. This set a series of events in motion. With oxygen levels depleting fast, the astronauts faced the risk of running out of oxygen to breathe. They therefore decided to abort the mission and return to Earth. This changed the communications and interactions during the mission, as both the astronauts and mission control had to solve unexpected and urgent problems in order to bring the crew home alive.
The data come from the Apollo 13 voice loop transcripts, obtained from http://apollo13realtime.org/ and https://history.nasa.gov/afj/ap13fj/07day3-before-the-storm.html; the data include the flight directors' voice loop and the air-ground voice loop. The flight directors (Houston's Mission Control Center) were located in Houston, and the crew (astronauts) were connected to this control center via the Capsule Communicator (CAPCOM). The Apollo 13 data are an ideal benchmark for studying communication/interaction patterns. In this practical we use a part of the data, from one hour before the incident until one hour after it. The eventlist is stored in an object called PartOfApollo_13 and contains three columns: time, sender, and receiver. Note that we know precisely when these calls were made. Therefore, we fit the REM assuming that the exact event times are known. Also note that the number of actors is 19.
Question 1: Convert the Apollo 13 data eventlist into a sociomatrix.
Hint: as.sociomatrix.eventlist(data, number of actors).
# Write your code here.
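One possible solution is sketched below; the object name apollo13_mat is an illustrative choice, and 19 is the number of actors given in the data description.
# Convert the eventlist into a 19 x 19 sociomatrix of communication counts
apollo13_mat <- as.sociomatrix.eventlist(PartOfApollo_13, 19)
apollo13_mat[1:5, 1:5]  # quick inspection of the first rows and columns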
Question 2: Use graph_from_adjacency_matrix() to convert the Apollo 13 adjacency matrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).
# Write your code here.
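For example, building on the sociomatrix from Question 1 (apollo13_graph is an illustrative name):
# Create a directed graph; each count in the sociomatrix becomes a separate (parallel) edge
apollo13_graph <- graph_from_adjacency_matrix(apollo13_mat)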
Question 3: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.
# Write your code here.
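A possible answer, numbering the 19 actors in the order they appear in the sociomatrix:
# Use the actor numbers 1-19 as vertex names
V(apollo13_graph)$name <- 1:19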
Question 4: Using the igraph package, visualize the Apollo 13 data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle). Due to the size of this network it could be beneficial to add weights to the edges to represent the frequency of occurrence of an edge. You can add weights to the edges by using E(your graph)$weight and assigning a 1 to every edge. Then, you need to use simplify(your graph) to collapse all repeated edges into single edges with weights. When you visualize the graph, you can add an argument edge.width=E(your graph)$weight and multiply it by a number (e.g., 0.05) so that the thickness of the edges represents the weights.
# Write your code here.
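A sketch of one way to do this; apollo13_simple is an illustrative name for the weighted, simplified graph, and 0.05 is just one possible scaling factor.
# Give every edge a weight of 1, then collapse parallel edges;
# simplify() sums the weights of the collapsed edges by default
E(apollo13_graph)$weight <- 1
apollo13_simple <- simplify(apollo13_graph)
# Circular layout with edge thickness proportional to communication frequency
plot(apollo13_simple, layout = layout_in_circle,
     edge.width = E(apollo13_simple)$weight * 0.05)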
Now, try the Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr).
# Write your code here.
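For example, reusing the weighted graph from the previous step:
plot(apollo13_simple, layout = layout_with_fr,
     edge.width = E(apollo13_simple)$weight * 0.05)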
Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk).
# Write your code here.
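For example:
plot(apollo13_simple, layout = layout_with_kk,
     edge.width = E(apollo13_simple)$weight * 0.05)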
Question 5: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).
# Write your code here.
Question 6: Create degree distributions of the Apollo 13 data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").
# Write your code here.
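A sketch for the in-degree distribution (the out- and total-degree plots follow the same pattern); the data frame name and the binwidth are arbitrary choices.
# Compute in-degrees on the original (multi-edge) graph and plot a histogram
apollo13_deg <- data.frame(indegree = igraph::degree(apollo13_graph, mode = "in"))
ggplot(apollo13_deg, aes(x = indegree)) +
  geom_histogram(binwidth = 10) +
  labs(title = "Apollo 13 in-degree distribution", x = "In-degree", y = "Count")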
The degree distributions show that the degrees vary considerably across vertices, with one node having noticeably higher in-, out-, and total degrees than the rest.
Question 7: Calculate the following centrality measures: degree, betweenness, and eigenvector centrality (closeness cannot be calculated due to the presence of isolated nodes). Extract the nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); eigen_centrality(graph, directed = TRUE, ...).
a) highest degree
# Write your code here.
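For example (a sketch; which.max() extracts the node with the highest score):
apollo13_degree <- igraph::degree(apollo13_graph, mode = "total")
which.max(apollo13_degree)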
We can see that node 2 is the most ‘important’ node in this network according to all three degree measures.
b) highest betweenness
# Write your code here.
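Similarly, a sketch for betweenness:
apollo13_btw <- igraph::betweenness(apollo13_graph, directed = TRUE)
which.max(apollo13_btw)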
The node that occurs most often on the shortest path between a pair of other nodes is node 7.
c) highest eigenvector
# Write your code here.
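And a sketch for eigenvector centrality; the $vector element holds the per-node scores:
apollo13_eig <- eigen_centrality(apollo13_graph, directed = TRUE)$vector
which.max(apollo13_eig)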
The node with the most connected neighbors is node 2.
Question 8: Apply a community detection algorithm to the Apollo 13 data. First, use the naive approach. If you need a reminder on community detection, refer to Q11 of Part 1 of the REM practical.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph).
# Write your code here.
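A sketch of one possible approach, building on the simplified weighted graph from Question 4 (the object names are illustrative):
# Collapse the directed graph into a simple undirected one,
# then run fast-greedy modularity optimization
apollo13_undir <- as.undirected(apollo13_simple, mode = "collapse")
apollo13_comm <- cluster_fast_greedy(apollo13_undir)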
Use length() to check how many communities the algorithm has identified.
# Write your code here.
Use sizes() to see how large those communities are.
# Write your code here.
Plot the results of the community detection using plot(communities, graph).
# Plot clusters.
The naive approach identified two main communities; the rest are single isolated nodes. After plotting the results we can see that the largest community is clearly centered around node 7.
Create a dendrogram using dendPlot(communities, mode="phylo").
# Plot a dendrogram
According to the dendrogram, the nodes in the clusters appear to be quite similar except nodes 4 and 7, which seem to differ from the rest of the nodes in their cluster. Looking at the graph plot we can see that it is likely related to the fact that nodes 4 and 7 are highly central in their cluster.
We are again going to use REMs for event histories with exact timing
information.
Let’s consider Apollo 13 mission data. Read the description of the data
at the beginning of this practical and also check the aforementioned
websites for more information. In this case, event time is given in
increments of seconds from the onset of observation.
In this data the actors are as follows:
AFD: Assistant Flight Director from Flight directors (1)
CAPCOM: Capsule Communicator from Flight directors (2)
CONTROL: Control Officer from Flight directors (3)
EECOM: Electrical, Environmental and Consumables Manager from Flight directors (4)
All: Ground control team (without flight directors) (5)
FDO: Flight dynamics officer (FDO or FIDO) (6)
FLIGHT: Flight Director from Flight directors (7)
GNC: The Guidance, Navigation, and Controls Systems Engineer from Flight directors (8)
GUIDO: Guidance Officer from Flight directors (9)
INCO: Integrated Communications Officer from Flight directors (10)
NETWORK: Network of ground stations from Flight directors (11)
TELMU: Telemetry, Electrical, and EVA Mobility Unit Officer from Flight directors (12)
RECOVERY: Recovery Supervisor from Flight directors (13)
PROCEDURES: Organization & Procedures Officer from Flight directors (14)
FAO: Flight activities officer from Flight directors (15)
RETRO: Retrofire Officer from Flight directors (16)
CDR: Commander James A. Lovell Jr. crew (astronauts) (17)
CMP: Command Module Pilot John (Jack) L. Swigert Jr. crew (astronauts) (18)
LMP: Lunar module pilot Fred W. Haise Jr. crew (astronauts) (19)
Question 9: First look at the data, plot the network, and start by fitting a simple model. Next, add the statistics of interest to the model and assess the performance of the new model.
# Write your code here.
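A minimal sketch of how such a model sequence could look with rem.dyad() from the relevent package, assuming exact timing (ordinal = FALSE). The object names and the particular statistics chosen here are illustrative, not a prescribed answer.
# Baseline model: normalized indegree effects on sending and receiving
apollo13_rem1 <- rem.dyad(as.matrix(PartOfApollo_13), n = 19,
                          effects = c("NIDSnd", "NIDRec"),
                          ordinal = FALSE, hessian = TRUE)
summary(apollo13_rem1)
# Extended model: add the turn-receiving participation shift AB -> BA
apollo13_rem2 <- rem.dyad(as.matrix(PartOfApollo_13), n = 19,
                          effects = c("NIDSnd", "NIDRec", "PSAB-BA"),
                          ordinal = FALSE, hessian = TRUE)
summary(apollo13_rem2)
# The information criteria (AIC/BIC) reported by summary() can be used to compare the models.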
The data were extracted from the Academic Twitter API in two steps. In the first step, all users mentioned using the hashtags '#ic2s2' or '#netsciXXXX' (XXXX = 2010, 2011, …, 2022) were extracted. In the second step, all mentions of those users were extracted, up to 800 tweets. Finally, the core of the network was extracted by keeping users with an in-degree and out-degree over K = 150 mentions. As there were some overlaps in time, we modified the data to make it usable for the REM. The source or sender is the person tweeting and the target or receiver is the person mentioned. Note that the time was the date of the tweet, which was converted to days. In this REH data the number of actors is 39.
Question 10: Convert the Twitter data eventlist into a sociomatrix.
Hint: as.sociomatrix.eventlist(data, number of actors).
# Write your code here.
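For example (a sketch; twitter_mat is an illustrative name, and the number of actors is now 39):
# Convert the Twitter eventlist into a 39 x 39 sociomatrix of mention counts
twitter_mat <- as.sociomatrix.eventlist(Twitter_data_rem3, 39)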
Question 11: Use graph_from_adjacency_matrix() to convert the Twitter adjacency matrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).
# Write your code here.
Question 12: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.
# Write your code here.
Question 13: Using the igraph package, visualize the Twitter data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle). Check the hint of Q4 to see how you can add weights to the edges.
# Write your code here.
Now, try the Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr).
# Write your code here.
Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk).
# Write your code here.
Question 14: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).
# Write your code here.
Question 15: Create degree distributions of the Twitter data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").
# Write your code here.
According to the degree distributions, the Twitter network is well connected, with a few nodes being better connected than the rest and one node in particular having nearly 250 total connections.
Question 16: Calculate the following centrality measures: degree, betweenness, closeness, and eigenvector centrality. Extract the nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); closeness(graph, ...); eigen_centrality(graph, directed = TRUE, ...).
a) highest degree
# Write your code here.
Node 4 has the highest in- and total degree, while node 16 has the highest out-degree.
b) highest betweenness
# Write your code here.
Node 20 occurs most often on the shortest path between pairs of other nodes.
c) highest closeness
# Write your code here.
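Closeness was not computed for the Apollo network, so a sketch is given here; twitter_graph is an illustrative name for the graph object built in Question 11.
# Closeness centrality and the node located closest to all others
twitter_close <- igraph::closeness(twitter_graph)
which.max(twitter_close)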
Node 16 is located closest to all other nodes.
d) highest eigenvector
# Write your code here.
Node 4 has the most well-connected neighbors.
Question 17: Apply a community detection algorithm to the Twitter data. First, use the naive approach.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph).
# Write your code here.
Use length() to check how many communities the algorithm has identified.
# Write your code here.
Use sizes() to see how large those communities are.
# Write your code here.
Plot the results of the community detection using plot(communities, graph).
# plot clusters
After visually exploring the results of the community detection we can see that the connections seem relatively uniform and there are a lot of edges between the members of different communities. These results are expected given the degree distribution of this network.
Create a dendrogram using dendPlot(communities, mode="phylo").
# plot a dendrogram
According to the dendrogram, all nodes are quite similar to other nodes in their respective communities. In addition, it appears that the heights of the branches pointing to different clusters are also similar suggesting that there isn’t a very clear distinction between the clusters.
Question 18: First, look at the data, then plot the network. Can you see any trends in the network? Start by fitting a REM and add the statistics of interest sequentially. Can you see any improvement based on the BIC?
Hint: Consider some of these statistics: "PSAB-BA", "ISPSnd", "PSAB-BY", "PSAB-XB", "NIDSnd", "NIDRec", and "NODSnd".
"PSAB-BA" is a turn-receiving effect: the receiver of a communication event immediately returns a communication event to the sender \(AB \rightarrow BA\).
"ISPSnd" is an incoming-shared-partners effect: the more partners from whom two nodes have both received communication events, the higher the likelihood of these nodes communicating in the future.
"PSAB-BY" is a turn-receiving effect in which the participant receiving a communication event is the one initiating the next communication event \(AB \rightarrow BY\).
"PSAB-XB" is a turn-usurping effect in which a communication event from A to B is immediately followed by a communication event from X to B \(AB \rightarrow XB\).
"NIDSnd": the normalized indegree of a node affects its future rate of initiating communication events.
"NIDRec": the normalized indegree of a node affects its future rate of receiving communication events.
"NODSnd": the normalized outdegree of a node affects its future rate of initiating communication events.
# Write your code here.
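A sketch of how the sequential model building could look, assuming exact timing (ordinal = FALSE) and n = 39 actors; the object names, the statistics chosen for each step, and their order are illustrative.
# Model 1: degree-based statistics only
twitter_rem1 <- rem.dyad(as.matrix(Twitter_data_rem3), n = 39,
                         effects = c("NIDSnd", "NIDRec", "NODSnd"),
                         ordinal = FALSE, hessian = TRUE)
summary(twitter_rem1)
# Model 2: add participation-shift and shared-partner statistics
twitter_rem2 <- rem.dyad(as.matrix(Twitter_data_rem3), n = 39,
                         effects = c("NIDSnd", "NIDRec", "NODSnd",
                                     "PSAB-BA", "PSAB-BY", "PSAB-XB", "ISPSnd"),
                         ordinal = FALSE, hessian = TRUE)
summary(twitter_rem2)
# Compare the BIC values reported by summary() to judge whether the added statistics improve the fit.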