## Install the following packages
# install.packages("relevent")
library(relevent)
library(sna)
library(ggplot2)
library(igraph)
library(RColorBrewer)
Welcome to the practical for social network analysis Part 2. In this part you will apply Relational Event Models (REMs) to real relational event history (REH) data: Apollo 13 voice loop data and Twitter data. These data sets are stored in UUsummerschool.rdata.
A reminder about REMs: dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.
As in Part 1, we first need to load UUsummerschool.rdata and see what is inside it.
# Set your own working directory
# setwd()
# Load data using load():
load("UUsummerschool.rdata")
# This R-data file contains the following objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"
## [3] "ClassIntercept" "ClassIsFemale"
## [5] "ClassIsTeacher" "PartOfApollo_13"
## [7] "Twitter_data_rem3" "WTCPoliceCalls"
## [9] "WTCPoliceIsICR"
Let's check the objects that we will work with in this practical.
as.sociomatrix.eventlist
## function (eventlist, n)
## {
## g <- matrix(0, n, n)
## if (NROW(eventlist) > 0) {
## tabmat <- table(eventlist[, -1, drop = FALSE])
## g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
## }
## g
## }
## attr(,"source")
## [1] "function(eventlist,n){"
## [2] " g<-matrix(0,n,n)"
## [3] " if(NROW(eventlist)>0){"
## [4] " tabmat<-table(eventlist[,-1,drop=FALSE])"
## [5] " g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] " }"
## [7] " g"
## [8] "}"
head(PartOfApollo_13)
## time sender receiver
## 1 11849.2 18 2
## 2 11854.2 2 18
## 3 11885.2 18 2
## 4 11890.2 2 18
## 5 12232.2 2 17
## 6 12342.2 17 2
head(Twitter_data_rem3)
## time_day source target
## 1 0.00 8 1
## 2 121.03 28 2
## 3 199.08 28 2
## 4 266.95 4 3
## 5 573.59 22 5
## 6 574.49 25 5
If you need a reminder on what a sociomatrix is, please refer to Part 1 of the REM practical.
During this practical we will analyze part of the Apollo 13 mission. The mission launched as scheduled at 2:13:00 p.m. EST (19:13:00 UTC) on April 11, 1970. On board were James Lovell Jr. (Commander, CDR), John "Jack" Swigert Jr. (Command Module Pilot, CMP), and Fred Haise Jr. (Lunar Module Pilot, LMP). The mission was quite routine and everything went to plan, until at 56:54:53 the astronauts heard a "pretty large bang" and experienced fluctuations in electrical power and control thrusters. This set a series of events in motion. With oxygen levels depleting fast, the astronauts faced the risk of running out of oxygen to breathe. They therefore decided to abort the mission and return to Earth. This changed the communications and interactions during the mission, as both the astronauts and mission control had to solve unexpected and urgent problems in order to bring the crew home alive.
The data come from the Apollo 13 voice loop transcripts, obtained from http://apollo13realtime.org/ and https://history.nasa.gov/afj/ap13fj/07day3-before-the-storm.html; the data include the flight directors' voice loop and the air-ground voice loop. The flight directors (Houston's Mission Control Center) were located in Houston, and the crew (astronauts) were connected to this control center via the Capsule Communicator (CAPCOM). The Apollo 13 data are an ideal benchmark for studying communication/interaction patterns. In this practical we use a part of the data, from one hour before the incident until one hour after it. The eventlist is stored in an object called PartOfApollo_13 and contains three columns: time, sender, and receiver. Note that we know precisely when these calls were made. Therefore, we fit the REM assuming that the exact event times are known. Also note that the number of actors is 19.
Question 1: Convert the Apollo 13 data eventlist into a sociomatrix.
Hint: as.sociomatrix.eventlist(data, number of actors).
# Write your code here.
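One possible solution is sketched below; the object name apollo13_mat is an illustrative choice, and 19 is the number of actors given in the data description.
# Convert the eventlist into a 19 x 19 sociomatrix of communication counts
apollo13_mat <- as.sociomatrix.eventlist(PartOfApollo_13, 19)
apollo13_mat[1:5, 1:5]  # quick inspection of the first rows and columns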
Question 2: Use graph_from_adjacency_matrix() to convert the Apollo 13 adjacency matrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).
# Write your code here.
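For example, building on the sociomatrix from Question 1 (apollo13_graph is an illustrative name):
# Create a directed graph; each count in the sociomatrix becomes a separate (parallel) edge
apollo13_graph <- graph_from_adjacency_matrix(apollo13_mat)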
Question 3: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.
# Write your code here.
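A possible answer, numbering the 19 actors in the order they appear in the sociomatrix:
# Use the actor numbers 1-19 as vertex names
V(apollo13_graph)$name <- 1:19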
Question 4: Using the igraph package, visualize the Apollo 13 data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle). Due to the size of this network it could be beneficial to add weights to the edges to represent the frequency of occurrence of an edge. You can add weights to the edges by using E(your graph)$weight and assigning a 1 to every edge. Then, you need to use simplify(your graph) to collapse all repeated edges into single edges with weights. When you visualize the graph, you can add an argument edge.width=E(your graph)$weight and multiply it by a number (e.g., 0.05) so that the thickness of the edges represents the weights.
# Write your code here.
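A sketch of one way to do this; apollo13_simple is an illustrative name for the weighted, simplified graph, and 0.05 is just one possible scaling factor.
# Give every edge a weight of 1, then collapse parallel edges;
# simplify() sums the weights of the collapsed edges by default
E(apollo13_graph)$weight <- 1
apollo13_simple <- simplify(apollo13_graph)
# Circular layout with edge thickness proportional to communication frequency
plot(apollo13_simple, layout = layout_in_circle,
     edge.width = E(apollo13_simple)$weight * 0.05)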
Now, try the Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr).
# Write your code here.
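For example, reusing the weighted graph from the previous step:
plot(apollo13_simple, layout = layout_with_fr,
     edge.width = E(apollo13_simple)$weight * 0.05)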
Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk).
# Write your code here.
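For example:
plot(apollo13_simple, layout = layout_with_kk,
     edge.width = E(apollo13_simple)$weight * 0.05)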
Question 5: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).
# Write your code here.
Question 6: Create degree distributions of the Apollo 13 data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").
# Write your code here.
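A sketch for the in-degree distribution (the out- and total-degree plots follow the same pattern); the data frame name and the binwidth are arbitrary choices.
# Compute in-degrees on the original (multi-edge) graph and plot a histogram
apollo13_deg <- data.frame(indegree = igraph::degree(apollo13_graph, mode = "in"))
ggplot(apollo13_deg, aes(x = indegree)) +
  geom_histogram(binwidth = 10) +
  labs(title = "Apollo 13 in-degree distribution", x = "In-degree", y = "Count")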
The degree distributions show that the degrees vary considerably across vertices, with one node having noticeably higher in-, out-, and total degrees than the rest.
Question 7: Calculate the following centrality measures: degree, betweenness, and eigenvector centrality (closeness cannot be calculated due to the presence of isolated nodes). Extract the nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); eigen_centrality(graph, directed = TRUE, ...).
a) highest degree
# Write your code here.
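For example (a sketch; which.max() extracts the node with the highest score):
apollo13_degree <- igraph::degree(apollo13_graph, mode = "total")
which.max(apollo13_degree)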
We can see that node 2 is the most ‘important’ node in this network according to all three degree measures.
b) highest betweenness
# Write your code here.
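Similarly, a sketch for betweenness:
apollo13_btw <- igraph::betweenness(apollo13_graph, directed = TRUE)
which.max(apollo13_btw)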
The node that occurs most often on the shortest path between a pair of other nodes is node 7.
c) highest eigenvector
# Write your code here.
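And a sketch for eigenvector centrality; the $vector element holds the per-node scores:
apollo13_eig <- eigen_centrality(apollo13_graph, directed = TRUE)$vector
which.max(apollo13_eig)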
The node with the most connected neighbors is node 2.
Question 8: Apply a community detection algorithm to the Apollo 13 data. First, use the naive approach. If you need a reminder on community detection, refer to Q11 of Part 1 of the REM practical.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph).
# Write your code here.
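A sketch of one possible approach, building on the simplified weighted graph from Question 4 (the object names are illustrative):
# Collapse the directed graph into a simple undirected one,
# then run fast-greedy modularity optimization
apollo13_undir <- as.undirected(apollo13_simple, mode = "collapse")
apollo13_comm <- cluster_fast_greedy(apollo13_undir)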
Use length() to check how many communities the algorithm has identified.
# Write your code here.
Use sizes() to see how large those communities are.
# Write your code here.
Plot the results of the community detection using plot(communities, graph).
# Plot clusters.
The naive approach identified two main communities; the rest are single isolated nodes. After plotting the results we can see that the largest community is clearly centered around node 7.
Create a dendrogram using dendPlot(communities, mode="phylo").
# Plot a dendrogram
According to the dendrogram, the nodes in the clusters appear to be quite similar except nodes 4 and 7, which seem to differ from the rest of the nodes in their cluster. Looking at the graph plot we can see that it is likely related to the fact that nodes 4 and 7 are highly central in their cluster.
We are again going to use REMs for event histories with exact timing
information.
Let’s consider Apollo 13 mission data. Read the description of the data
at the beginning of this practical and also check the aforementioned
websites for more information. In this case, event time is given in
increments of seconds from the onset of observation.
In this data the actors are as follows:
AFD: Assistant Flight Director from Flight directors (1)
CAPCOM: Capsule Communicator from Flight directors (2)
CONTROL: Control Officer from Flight directors (3)
EECOM: Electrical, Environmental and Consumables Manager from Flight directors (4)
All: Ground control team (without flight directors) (5)
FDO: Flight dynamics officer (FDO or FIDO) (6)
FLIGHT: Flight Director from Flight directors (7)
GNC: The Guidance, Navigation, and Controls Systems Engineer from Flight directors (8)
GUIDO: Guidance Officer from Flight directors (9)
INCO: Integrated Communications Officer from Flight directors (10)
NETWORK: Network of ground stations from Flight directors (11)
TELMU: Telemetry, Electrical, and EVA Mobility Unit Officer from Flight directors (12)
RECOVERY: Recovery Supervisor from Flight directors (13)
PROCEDURES: Organization & Procedures Officer from Flight directors (14)
FAO: Flight activities officer from Flight directors (15)
RETRO: Retrofire Officer from Flight directors (16)
CDR: Commander James A. Lovell Jr. crew (astronauts) (17)
CMP: Command Module Pilot John (Jack) L. Swigert Jr. crew (astronauts) (18)
LMP: Lunar module pilot Fred W. Haise Jr. crew (astronauts) (19)
Question 9: First look at the data, plot the network, and start by fitting a simple model. Next, add the statistics of interest to the model and assess the performance of the new model.
# Write your code here.
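A minimal sketch of how such a model sequence could look with rem.dyad() from the relevent package, assuming exact timing (ordinal = FALSE). The object names and the particular statistics chosen here are illustrative, not a prescribed answer.
# Baseline model: normalized indegree effects on sending and receiving
apollo13_rem1 <- rem.dyad(as.matrix(PartOfApollo_13), n = 19,
                          effects = c("NIDSnd", "NIDRec"),
                          ordinal = FALSE, hessian = TRUE)
summary(apollo13_rem1)
# Extended model: add the turn-receiving participation shift AB -> BA
apollo13_rem2 <- rem.dyad(as.matrix(PartOfApollo_13), n = 19,
                          effects = c("NIDSnd", "NIDRec", "PSAB-BA"),
                          ordinal = FALSE, hessian = TRUE)
summary(apollo13_rem2)
# The information criteria (AIC/BIC) reported by summary() can be used to compare the models.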
The data were extracted from the Academic Twitter API in two steps. In the first step, all users mentioned using the hashtags '#ic2s2' or '#netsciXXXX' (XXXX = 2010, 2011, …, 2022) were extracted. In the second step, all mentions of those users were extracted, up to 800 tweets. Finally, the core of the network was extracted by keeping users with an in-degree and out-degree over K = 150 mentions. As there were some overlaps in time, we modified the data to make it usable for the REM. The source or sender is the person tweeting and the target or receiver is the person mentioned. Note that the time was the date of the tweet, which was converted to days. In this REH data the number of actors is 39.
Question 10: Convert the Twitter data eventlist into a sociomatrix.
Hint: as.sociomatrix.eventlist(data, number of actors).
# Write your code here.
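For example (a sketch; twitter_mat is an illustrative name, and the number of actors is now 39):
# Convert the Twitter eventlist into a 39 x 39 sociomatrix of mention counts
twitter_mat <- as.sociomatrix.eventlist(Twitter_data_rem3, 39)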
Question 11: Use graph_from_adjacency_matrix() to convert the Twitter adjacency matrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).
# Write your code here.
Question 12: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.
# Write your code here.
Question 13: Using the igraph package, visualize the Twitter data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle). Check the hint of Q4 to see how you can add weights to the edges.
# Write your code here.
Now, try the Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr).
# Write your code here.
Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk).
# Write your code here.
Question 14: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).
# Write your code here.
Question 15: Create degree distributions of the Twitter data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").
# Write your code here.
According to the degree distributions, the Twitter network is well connected, with a few nodes being better connected than the rest and one node in particular having nearly 250 total connections.
Question 16: Calculate the following centrality measures: degree, betweenness, closeness, and eigenvector centrality. Extract the nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); closeness(graph, ...); eigen_centrality(graph, directed = TRUE, ...).
a) highest degree
# Write your code here.
Node 4 has the highest in- and total degree, while node 16 has the highest out-degree.
b) highest betweenness
# Write your code here.
Node 20 occurs most often on the shortest path between pairs of other nodes.
c) highest closeness
# Write your code here.
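Closeness was not computed for the Apollo network, so a sketch is given here; twitter_graph is an illustrative name for the graph object built in Question 11.
# Closeness centrality and the node located closest to all others
twitter_close <- igraph::closeness(twitter_graph)
which.max(twitter_close)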
Node 16 is located closest to all other nodes.
d) highest eigenvector
# Write your code here.
Node 4 has the most well-connected neighbors.
Question 17: Apply a community detection algorithm to the Twitter data. First, use the naive approach.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph).
# Write your code here.
Use length() to check how many communities the algorithm has identified.
# Write your code here.
Use sizes() to see how large those communities are.
# Write your code here.
Plot the results of the community detection using plot(communities, graph).
# plot clusters
After visually exploring the results of the community detection we can see that the connections seem relatively uniform and there are a lot of edges between the members of different communities. These results are expected given the degree distribution of this network.
Create a dendrogram using dendPlot(communities, mode="phylo").
# plot a dendrogram
According to the dendrogram, all nodes are quite similar to other nodes in their respective communities. In addition, it appears that the heights of the branches pointing to different clusters are also similar suggesting that there isn’t a very clear distinction between the clusters.
Question 18: First, look at the data, then plot the network. Can you see any trends in the network? Start by fitting a REM and add the statistics of interest sequentially. Can you see any improvement based on the BIC?
Hint: Consider some of these statistics: "PSAB-BA", "ISPSnd", "PSAB-BY", "PSAB-XB", "NIDSnd", "NIDRec", and "NODSnd".
"PSAB-BA" is a turn-receiving effect: the receiver of a communication event immediately returns a communication event to the sender \(AB \rightarrow BA\).
"ISPSnd" is an incoming-shared-partners effect: the more partners from whom two nodes have both received communication events, the higher the likelihood of these nodes communicating in the future.
"PSAB-BY" is a turn-receiving effect in which the participant receiving a communication event is the one initiating the next communication event \(AB \rightarrow BY\).
"PSAB-XB" is a turn-usurping effect in which a communication event from A to B is immediately followed by a communication event from X to B \(AB \rightarrow XB\).
"NIDSnd": the normalized indegree of a node affects its future rate of initiating communication events.
"NIDRec": the normalized indegree of a node affects its future rate of receiving communication events.
"NODSnd": the normalized outdegree of a node affects its future rate of initiating communication events.
# Write your code here.
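A sketch of how the sequential model building could look, assuming exact timing (ordinal = FALSE) and n = 39 actors; the object names, the statistics chosen for each step, and their order are illustrative.
# Model 1: degree-based statistics only
twitter_rem1 <- rem.dyad(as.matrix(Twitter_data_rem3), n = 39,
                         effects = c("NIDSnd", "NIDRec", "NODSnd"),
                         ordinal = FALSE, hessian = TRUE)
summary(twitter_rem1)
# Model 2: add participation-shift and shared-partner statistics
twitter_rem2 <- rem.dyad(as.matrix(Twitter_data_rem3), n = 39,
                         effects = c("NIDSnd", "NIDRec", "NODSnd",
                                     "PSAB-BA", "PSAB-BY", "PSAB-XB", "ISPSnd"),
                         ordinal = FALSE, hessian = TRUE)
summary(twitter_rem2)
# Compare the BIC values reported by summary() to judge whether the added statistics improve the fit.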