## Install the following package
# install.packages ("relevent")
library (relevent)
library(sna)
library(ggplot2)
library(scales)
library(igraph)
library(RColorBrewer)

Welcome to part 1 of the practical for the social network analysis. In this practical we will get hands-on experience with the materials in the lectures about applying Relational Event Models (REMs) on real relational event history (REH) data: Class data and World Trade Center (WTC) by doing a bit of programming in R. These data sets are stored in UUsummerschool.rdata. Later in this practical we explain these data sets briefly.

1. Relational event model (REM)

As it was mentioned in the lecture, dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.

First, we need to load UUsummerschool.rdata and see what is inside of it.

# Set your own working directory
# setwd()

# Load data using load():
load("UUsummerschool.rdata") 
# This R-data contains 11 objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"                   
## [3] "ClassIntercept"           "ClassIsFemale"           
## [5] "ClassIsTeacher"           "PartOfApollo_13"         
## [7] "Twitter_data_rem3"        "WTCPoliceCalls"          
## [9] "WTCPoliceIsICR"

What is stored in UUsummerschool.rdata?
A function as.sociomatrix.eventlist that outputs a sociomatrix, and the rest are data sets such as PartOfApollo_13, Class, Twitter_data_rem3 and WTCPoliceCalls.

Probably you would ask: “What is a sociomatrix?” Before you let the computer do the social network analysis, it must have some basic data to calculate, like the connection between whom and whom. We should be aware that any one of those connections involves only two individuals. That is, it involves two nodes. This property might already inspire you to think the relationship as a mathematical expression, for instance a matrix of two dimensions. If we put the elements in the same sequence both at the row and the column, and we use 1 and 0 to represent whether there is a connection between the two elements or not, then we will have a matrix as we expect. The matrix is for the social network and we can give it an exclusive name, sociomatrix.

Secondly, let’s check the objects that we will work with in this practical.

as.sociomatrix.eventlist
## function (eventlist, n) 
## {
##     g <- matrix(0, n, n)
##     if (NROW(eventlist) > 0) {
##         tabmat <- table(eventlist[, -1, drop = FALSE])
##         g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
##     }
##     g
## }
## attr(,"source")
## [1] "function(eventlist,n){"                                                             
## [2] "  g<-matrix(0,n,n)"                                                                 
## [3] "  if(NROW(eventlist)>0){"                                                           
## [4] "    tabmat<-table(eventlist[,-1,drop=FALSE])"                                       
## [5] "    g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] "  }"                                                                                
## [7] "  g"                                                                                
## [8] "}"
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32
head(Class)
##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1

2. The relevent package and rem.dyad function

The relevent package (Butts, 2015) in R provides implementation of relational event framework, with specialized functionality for easily fitting dyadic relational event models. Within the relevent package, the rem.dyad() function is the primary workhorse for modeling dyadic data. As you can see the related documentation, there are a bunch of arguments that can be used for different purposes. Here, in this practical, we only use a few of them (please see the pdf file rem.dyad.args). For more explanation please check the help file of rem.dyad as follows.

## You can access the help file in three different ways:
# 1)
??rem.dyad
# 2)
help(rem.dyad)
# 3)
?relevent::rem.dyad

There is also rem() for modeling the dyadic data. The main difference is that rem() needs the computed statistics in advance but rem.dyad() computes the statistics internally. However, if we have the value of statistics in advance rem() tends to be faster.

As applying the regression model, we should first decide which variables/statistics should be in our model. This can be done in several ways: experts knowledge, prior information and statistics/variables selection e.g., using BIC,etc. Note that not all effects may lead to identified models in all cases. It is up to the user to ensure that the postulated model makes sense.

3. WTC data & visualization, centrality measures, and community detection

This data contains information about the 9/11 terrorist attacks at the WTC in New York City in 2001. That should be notified that the radio communication was essential in coordinating activities during the crisis. Butts et al.(2007) coded radio communication events between officers responding to 9/11 from transcripts of communications recorded during the event. It is important to note that the WTC radio data was coded from transcripts that lacked detailed timing information; we do not therefore know precisely when these calls were made. But we know the order of events. Since the case of ordinal timing is somewhat simpler than that of exact timing, we consider the WTC data first in this practical. We will illustrate ordinal time REMs using the 481 communication events from 37 named communicants in this data set.

The eventlist is stored in an object called WTCPoliceCalls.
Question 1: Check the data using head and tailfunction. How many events does the data contain? How many actors? Without testing anything, can you see any trend? If so, which statistics can capture that trend?

Hint: Use table() to generate frequency tables for source and recipient attributes. Use ggplot(data, aes(attribute))+geom_bar()+labs(y="", x="") to visualize the data.

# Check the data.
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32
table(WTCPoliceCalls$source)
## 
##   1   2   3   4   5   6   8   9  10  11  12  15  16  17  18  19  20  21  22  23 
##   5   1   5   4   2   1   5   1   2   7   3  18 109   1   9   5   2  17  22   5 
##  24  25  26  27  28  29  30  31  32  33  34  36  37 
##  10   9   3   6   4   6   8   6 171   3   5  22   4
table(WTCPoliceCalls$recipient)
## 
##   1   2   3   4   5   8  10  11  12  14  15  16  17  18  19  20  21  22  23  24 
##   4   1   4   5   2   1   1   5   2   1  19 103   2   9   4   2  13  21   5   8 
##  25  26  27  28  29  30  31  32  33  34  35  36  37 
##   6   5   1   2   4   7   5 201   2   5   2  26   3
ggplot(WTCPoliceCalls, aes(source))+geom_bar()+labs(y="Frequncy", x="Source")

ggplot(WTCPoliceCalls, aes(recipient))+geom_bar()+labs(y="Frequncy", x="Recipient")

You can see the reciprocation and recency pattern in the first few events: responding officer 16 called officer 32 in the first event, officer 32 then called 16 back in the second, which might be characterized as a local reciprocity effect or \(AB \rightarrow BA\) participation shift. This was followed by 32 being the target of the next four calls, which is probably due to either a role of 32 such as a coordinator or perhaps the presence of a recency mechanism.

Question 2: Use the included sna function as.sociomatrix.eventlist() to convert the eventlist into a valued sociomatrix. For the definition of sociomatrix see previous section.
Hint: as.sociomatrix.eventlist(data, number of actors).

WTCPoliceNet <- as.sociomatrix.eventlist(WTCPoliceCalls ,37)

Question 3: Once you have the sociomatrix object, plot the aggregated network using gplot().
Hint: gplot(created sociomatrix, the arguments such as the size of nodes, width of edges, colors of nodes, etc.)
gplot creates a node-and-edge visualization of a network. It can accept an adjacency matrix (N × N), an array of adjacency matrices (m × N × N), network objects, and other forms of network data (check sna library to see what else is possible).
gplot has many arguments that you can tweak your network model visualization, for example:

  • vertex.col: set the node color.
  • edge.col: color for edge border and fill.
  • vertex.sides: number of sides for node polygon. ex) triangle = 3, square = 4 (default = 8).
  • label.bg: color for the background of node.
    Check the help file of gplot to find other features.
gplot(WTCPoliceNet, edge.lwd=WTCPoliceNet^0.75, arrowhead.cex=
        log(as.edgelist.sna(WTCPoliceNet)[,3])+.25,
        vertex.col= ifelse(WTCPoliceIsICR ,"black","gray"),
        vertex.cex=1.25, vertex.sides=ifelse(WTCPoliceIsICR ,4,100))  
Police Communication Network

Police Communication Network

The plot above is the time-aggregated WTC Police communication network.

  • The three black square nodes represent actors who fill institutional coordinator roles and gray circle nodes represent all other communicants (one of the square nodes is in the center).
  • A directed edge is drawn between two actors, i and j, if actor i ever called actor j on the radio. The edges and arrowheads are scaled in proportion to the number of calls over time. This is clearly a hub-dominated network with two actors sitting on the majority of paths between all other actors.
  • The actor with the plurality of communication ties is an institutional coordinator (the square node at the center of the figure), heterogeneity in sending and receiving communication ties is evident, with several high-degree non-coordinators and two low-degree institutional coordinators, in the network. This source of heterogeneity is a good starting place from which to build our model.

Question 4: igraph is the most commonly used R package for network analysis. It provides tools to import, transform, and visualize graphs. The documentation for igraph can be found here: https://igraph.org/r/doc/. igraph functions operate on igraph objects, therefore we need to transform our sociomatrix (also called adjacency matrix) into an igraph object; Use graph_from_adjacency_matrix() to convert the WTC Police Calls sociomatrix into a graph object.
Hint: igraph object is a special class of graphs used by igraph. igraph graphs store information about vertices, edges, and attributes in a format of an edge list. You can read more about igraph package using ?igraph.

wtc_graph_fromAM <- graph_from_adjacency_matrix(WTCPoliceNet)

Question 5: Adding relevant data (attributes) to a graph is called ‘decorating’ a graph. In igraph it can be done by using $ operator. Assign corresponding numbers to nodes as vertex names. This will make it easier to index nodes while calculating different statistics.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.

V(wtc_graph_fromAM)$name <- seq(1,37)

Question 6: Use WTCPoliceIsICR vector to assign vertex attribute that shows whether an actor occupies an institutionalized coordinator role (ICR).
Hint: V(graph)$name_of_attribute.

V(wtc_graph_fromAM)$IsICR <- WTCPoliceIsICR

Question 7: Using igraph package visualize the WTC Police Calls network using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: Use plot(graph, layout=layout_in_circle, vertex.color=...) to plot the graph. You can see different layout options using ?layout_().

# Circular layout
igraph_options(vertex.size=12, edge.arrow.size=0.5)
plot(wtc_graph_fromAM, layout=layout_in_circle, 
     vertex.color=ifelse(V(wtc_graph_fromAM)$IsICR, "red", "pink"))
Circular Layout of the Police Communication Network

Circular Layout of the Police Communication Network

Circular layout is the simplest layout. The vertices are arranged around the circumference of a circle (typically ordered by degree) and edges are then drawn across the circle.

Now, try a Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr, vertex.color=...).

# Fruchterman and Reingold method
plot(wtc_graph_fromAM, layout=layout_with_fr, 
     vertex.color=ifelse(V(wtc_graph_fromAM)$IsICR, "red", "pink"))
Fruchterman and Reingold Layout of the Police Communication Network

Fruchterman and Reingold Layout of the Police Communication Network

Fruchterman and Reingold method is a spring-embedder method of graph drawing wherein the nodes are analogous to balls and edges to springs between them. The goal is to minimize the energy of the system through moving the nodes and changing the forces between them.

Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk, vertex.color=...).

# Kamada and Kawai method
plot(wtc_graph_fromAM, layout=layout_with_kk, 
     vertex.color=ifelse(V(wtc_graph_fromAM)$IsICR, "red", "pink"))
Kamada and Kawai Layout of the Police Communication Network

Kamada and Kawai Layout of the Police Communication Network

Kamada and Kawai method is an energy-placement method which simulates a spring between each vertex. The length of the spring between the vertices is determined by the shortest path between them. The idea is to move the nodes in such a way as to minimize the global energy of the layout.

Question 8: What are the size and order of this network? Network’s size refers to the total number of it’s edges. And order refers to the number of vertices in the network.
Hint: gsize(graph), gorder(graph).

gsize(wtc_graph_fromAM)
## [1] 481
gorder(wtc_graph_fromAM)
## [1] 37

Question 9: Degree of a vertex is a number of edges incident upon that vertex. A degree distribution is a collection of degree frequencies over the whole network. The degree distribution shows how connected the graph is and whether some nodes have a notably distinct degree compared to others. Using ggplot2 create degree distributions of the WTC Police Calls data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").

# Since we are working with directed graph we can create indegree, outdegree, and total distributions:

# calculating degrees of the network 
G.indegrees <- igraph::degree(wtc_graph_fromAM, mode = 'in')
G.outdegrees <- igraph::degree(wtc_graph_fromAM, mode = 'out')
G.total_degree <- igraph::degree(wtc_graph_fromAM, mode = 'total')

# creating a dataframe to be able to work with ggplot
G.degree.histogram <- data.frame(G.indegrees, G.outdegrees, G.total_degree)

# ploting indegree distribution
ggplot(G.degree.histogram, aes(x = G.indegrees)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  scale_y_continuous(breaks=breaks_pretty()) +
  labs(title="Indegree Distribution", y="Frequncy", x="Vertex Degree")

# ploting outdegree distribution
ggplot(G.degree.histogram, aes(x = G.outdegrees)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  scale_y_continuous(breaks=breaks_pretty()) +
  labs(title="Outdegree Distribution", y="Frequncy", x="Vertex Degree")

# ploting total degree distribution
ggplot(G.degree.histogram, aes(x = G.total_degree)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  scale_y_continuous(breaks=breaks_pretty()) +
  labs(title="Total Degree Distribution", y="Frequncy", x="Vertex Degree")

# you can also use hist() function to visualize the distributions.

#hist(degree(wtc_graph_fromAM, mode = 'in'), col='lightblue', xlab='Vertex Degree',
     #ylab='Frequency', main='')


#hist(degree(wtc_graph_fromAM, mode = 'out'), col='blue', xlab='Vertex Degree', 
     #ylab='Frequency', main='')

After examining the degree distribution plots, we can see that the majority of the nodes have quite low in-, out-, and total degrees. However, there are two highly connected nodes.

Question 10: Centrality measures help identify ‘important’ nodes in a network. Degree centrality measures how ‘important’ a node is through measuring the number of edges incident upon that node. Betweeness measures how often a certain node appears on a shortest path between pairs of other nodes. Closeness measures how close a certain vertex is to all other vertices. Eigenvector centrality captures the ‘importance’ of a node through it’s neighbors, therefore the more connected the vertex’s neighbors are the more ‘important’ is the vertex itself. Calculate centrality measures: degree, betweeness, and eigenvector (closeness can’t be calculated for this graph due to the presence of isolated vertices). Extract nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE...,); eigen_centrality(graph, directed = TRUE, ...).

a) highest degree

in_degree <- igraph::degree(wtc_graph_fromAM, mode='in')
out_degree <- igraph::degree(wtc_graph_fromAM, mode='out')
total_degree <- igraph::degree(wtc_graph_fromAM, mode='total')

in_degree[in_degree == max(in_degree)]
##  32 
## 201
out_degree[out_degree == max(out_degree)]
##  32 
## 171
total_degree[total_degree == max(total_degree)]
##  32 
## 372

We can see that in this network, node 32 is the most central node with regard to its in-, out-, and total degree.

b) highest betweenness

between_wtc <- betweenness(wtc_graph_fromAM, directed = TRUE, normalized = TRUE)
between_wtc[between_wtc == max(between_wtc)]
##        32 
## 0.5936907

Node 32 also appears on the shortest path between other nodes the most.

c) highest eigenvector

eigen_wtc <- eigen_centrality(wtc_graph_fromAM, directed = TRUE)
eigen_values <- eigen_wtc[1]
eigen_values <- unlist(eigen_values)
eigen_values[eigen_values == max(eigen_values)]
## vector.32 
##         1

Node 32 also has the most well-connected neighbors. These results show us that node 32 is the most ‘important’ node in this network with regards to degree, betweeness, and eigenvector centralities.

Question 11: Community detection identifies subsets (clusters) of vertices that demonstrate ‘cohesiveness’ with respect to their relationship patterns. Clusters correspond to sets of nodes with more edges within the cluster than the rest of the graph. One of the ways of approaching community detection in directed graphs, called Naive approach, is to convert the directed network into an undirected one and assume edge symmetry. Apply community detection algorithm on WTCPoliceCalls data.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph). note that this function cluster_fast_greedy applies the fast and greedy algorithm for community detection. It is a hierarchical clustering algorithm used to find densely connected subgraphs (communities) in the graph.

# Converting digraph into undirected graph without multyedges:
undirected.wtc <- as.undirected(
  wtc_graph_fromAM,
  mode = c("collapse"))

# use fast and greedy algorithm for hierarchical clustering 
wuc <- cluster_fast_greedy(undirected.wtc)

Once you’ve applied the algorithm, you can look how many communities the algorithm identified using length().

length(wuc)
## [1] 9

Additionally, you can see the number of members of each community using sizes().

sizes(wuc)
## Community sizes
##  1  2  3  4  5  6  7  8  9 
##  9 14  3  3  2  2  2  1  1

igraph has a convenient way of plotting results of community detection. You can use plot(communities, graph) to visually represent the results.

# plot clusters
plot(wuc, undirected.wtc)
Communities in the Police Communication Network

Communities in the Police Communication Network

After plotting the results of community detection, we can see that there are two major communities centered around node 32 and node 16. In addition, all isolated nodes have been assigned to their own clusters.

Hierarchical clustering methods produce a hierarchy of nested partitions of the graph. We can represent this hierarchy in the form of a dendrogram. Dendrogram is a tree diagram that represents the similarity among groups of objects. The heights of the branches signify how similar the objects are to each other. igraph has a function dendPlot(communities, mode='phylo') specifically for this.

# plot a dendrogram
dendPlot(wuc, mode='phylo')
## Warning: `dendPlot()` was deprecated in igraph 2.0.0.
## ℹ Please use `plot_dendrogram()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Dendrogram of the Police Communication Network

Dendrogram of the Police Communication Network

Different communities are color-coded in the dendrogram which makes it clear how the nodes are partitioned into different clusters. Looking at this dendrogram we can see how similar the nodes are to the other members of their communities. It appears that the nodes in the same cluster are quite similar in this network.

4. Fitting relational event models

4-1. A simple covariate model

The question is whether the propensity of individuals to send and receive calls depends on whether they occupy institutionalized coordinator roles (ICR) or not. WTCPoliceIsICR is a vector that shows the ICR.

One of the arguments in rem.dyad() function is ordinal = TRUE which, by default, treats the data as ordinal. Because WTCPoliceCalls data does not contain exact timing but contains the order of events we do not need to change anything. However, later in the practical we will consider data with exact time stamps which will require us to change the value of this argument.

Question 12: Use rem.dyad() by passing appropriate arguments.
Hint: rem.dyad(data..., n = number of actors, effects = c("CovInt"), covar = list(CovInt = WTCPoliceIsICR), hessian = TRUE). Note that CovInt is an effect of the exogenous covariate(s) on the propensity of actors to initiate and receive communication events. CovInt requires exogenous covariate(s) to be passed as covariate arguments. Please check the description of arguments (?rem.dyad).

wtcfit1 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt"), 
                   covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics

Once you have fit the model, you can summarize the model fit using summary().

Question 13: Check the model summary. Then see how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)). predicted.match provides a two-column matrix of true and false corresponding to the sender and receiver columns in the main data frame. It shows whether the model could predict the sender and receiver truly. apply(your model$predicted.match,1,all) applies function all over every row of this matrix. The function all checks whether all elements of a logical vector are TRUE (whether sender and receiver were correctly predicted). mean() calculates the average value of the apply() function output. TRUE is treated as 1 and FALSE is treated as 0. Therefore, the result of this function falls between 0 and 1 - the higher the number, the better the predictions of our model.
Hint: summary(your model name).

summary(wtcfit1)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate  Std.Err Z value  Pr(>|z|)    
## CovInt.1 2.104464 0.069817  30.142 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6193.998 on 480 degrees of freedom
##  Chi-square: 727.0499 on 1 degrees of freedom, asymptotic p-value 0 
## AIC: 6195.998 AICC: 6196.007 BIC: 6200.174
mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0

The output gives us the covariate effect, some uncertainty, and goodness-of-fit information. The format is like the output for a regression model summary, but note that the coefficients should be interpreted considering the relational event framework.

Next, we evaluate whether it is worth treating these effects separately with respect to ICR status. To do so, we enter the ICR covariate as a sender and receiver covariate, respectively.

Question 14: Fit another model adding the covariates and check the model summary as you did above. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: It is the same as what we did for Q12. But here we should consider the effect for the sender and for the receiver separately. The structure is the same but we need to select CovSnd, and CovRec. CovRec is an actor-level covariate effect for receiving interactions and CovSnd is an actor-level covariate effect for initiating interactions with others. Both effects require exogenous covariate(s) to be passed as covariate arguments. Check the description of arguments to see what you need to specify for effects.

wtcfit2<-rem.dyad(WTCPoliceCalls,n=37,effects=c("CovSnd","CovRec"),
                  covar=list(CovSnd=WTCPoliceIsICR ,CovRec= WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit2) # check the model output
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate  Std.Err Z value  Pr(>|z|)    
## CovSnd.1 1.979172 0.095745  20.671 < 2.2e-16 ***
## CovRec.1 2.225715 0.092862  23.968 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6190.175 on 479 degrees of freedom
##  Chi-square: 730.8731 on 2 degrees of freedom, asymptotic p-value 0 
## AIC: 6194.175 AICC: 6194.2 BIC: 6202.527
mean(apply(wtcfit2$predicted.match,1,all))-mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0

The output of 0 means that our second model had 0 better predicted senders and receivers compared to the first model.

Question 15: Now, evaluate which model is preferred by BIC (lower is better) and explain what the BIC tells us.
Hint: You can extract BIC value from each model by using $ sign (e.g., yourmodel$BIC).

wtcfit1$BIC - wtcfit2$BIC
## [1] -2.352663

4-2. Adding endogenous variables

Add the reciprocity term: \(AB \rightarrow BA\). For the definition of reciprocity, see the slides. In particular, the AB-BA shift, which captures the tendency for B to call A, given that A has just called B, is likely at play in radio communication.

Question 16: Fit another model adding the reciprocity term and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all))and compare it with previous model prediction.

Hint: Add “PSAB-BA” as another statistic to the best model. Check the description of arguments to see what you need to specify for effects.

wtcfit3 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit3)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## CovInt.1  1.60405 0.11500  13.949 < 2.2e-16 ***
## PSAB-BA   7.32695 0.10552  69.436 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2619.115 on 479 degrees of freedom
##  Chi-square: 4301.933 on 2 degrees of freedom, asymptotic p-value 0 
## AIC: 2623.115 AICC: 2623.14 BIC: 2631.467
mean(apply(wtcfit3$predicted.match,1,all))-mean(apply(wtcfit2$predicted.match,1,all))
## [1] 0.6839917

Question 17: Again, evaluate which model is preferred by BIC.

# Compare the BIC of this model with the first model.
wtcfit1$BIC - wtcfit3$BIC
## [1] 3568.707

Note that the lower BIC the better the model. It appears that there is a very strong reciprocity effect. Doesn’t it?

We may expect that the current participants in a communication are likely to initiate the next call and that one’s most recent communications may be the most likely to be returned.
These processes can be captured with the participation shifts for dyadic turn receiving/continuing and recency effects, respectively.

Question 18: Fit another model adding the p-shift effects to the previous model and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: Add “PSAB-BY” and “PSAB-AY” to the best model. PSAB-BY is a turn receiving effect - a participant receiving a communication event is the one initiating the next communication effect \(AB \rightarrow BY\). “PSAB-AY” is a turn continuing effect, when a participant that initiated a conversation event initiates the next conversation event \(AB \rightarrow AY\). For more information check the description of arguments to see what you need to specify for effects.

## Adding p-shift effects to model 3
wtcfit4<-rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit4)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## CovInt.1  1.54283 0.11818 13.0549 < 2.2e-16 ***
## PSAB-BA   7.49955 0.11418 65.6831 < 2.2e-16 ***
## PSAB-BY   1.25941 0.25131  5.0115 5.401e-07 ***
## PSAB-AY   0.87216 0.30611  2.8491  0.004384 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2595.135 on 477 degrees of freedom
##  Chi-square: 4325.913 on 4 degrees of freedom, asymptotic p-value 0 
## AIC: 2603.135 AICC: 2603.219 BIC: 2619.839
mean(apply(wtcfit4$predicted.match,1,all))-mean(apply(wtcfit3$predicted.match,1,all))
## [1] 0

Question 19: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (third model).
wtcfit3$BIC - wtcfit4$BIC
## [1] 11.62806

Question 20: Lastly, fit a model adding the recency effects to the previous model and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: RRecSnd is a recency effect - the recency of conversation event \(A \rightarrow B\) affects the future rate of conversation event \(B \rightarrow A\). RSndSnd is also a recency effect - the recency of conversation event \(A \rightarrow B\) affects the future rate of conversation event \(A \rightarrow B\). Are they the statistics that we need for this purpose?

## Adding recency effects to model 4
wtcfit5<-rem.dyad(WTCPoliceCalls,n=37,
                  effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY","RRecSnd","RSndSnd"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit5)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## RRecSnd   2.38496 0.27447  8.6892 < 2.2e-16 ***
## RSndSnd   1.34622 0.22307  6.0349 1.590e-09 ***
## CovInt.1  1.07058 0.14244  7.5160 5.640e-14 ***
## PSAB-BA   4.88714 0.15293 31.9569 < 2.2e-16 ***
## PSAB-BY   1.67938 0.26116  6.4304 1.273e-10 ***
## PSAB-AY   1.39016 0.31057  4.4762 7.600e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2307.413 on 475 degrees of freedom
##  Chi-square: 4613.635 on 6 degrees of freedom, asymptotic p-value 0 
## AIC: 2319.413 AICC: 2319.591 BIC: 2344.469
mean(apply(wtcfit5$predicted.match,1,all))-mean(apply(wtcfit4$predicted.match,1,all))
## [1] 0

Question 21: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (fourth model).
wtcfit4$BIC - wtcfit5$BIC
## [1] 275.3701

The results indicate that turn-receiving, turn-continuing, and recency effects are all at play in the relational event process.

Question 22: Can you think of adding more statistics?
Please add a few more and see how the BIC changes. To this end, check the definition of statistics precisely.

# Try it out yourself.

Question 23: As there are a lot of statistics that we can use, can you think of some ways for statistics selection?

# Try it out yourself.

5. Assessing model adequacy

Model adequacy is a crucial consideration: even though the model fit is sufficiently good, you need to think of whether this model is good enough for our purposes. For example, the ability of the relational event model to predict the next event in the sequence, given those that have come before.
Note that we do not discuss this further in this practical but interested students can check Butts 2008.

6. Classroom data & visualization, centrality measures, and community detection

The classroom dataset includes timed interactions between students and instructors within a high school classroom (McFarland, 2001; Bender-deMoll and McFarland, 2006). Sender and receiver events (n=691) were recorded between 20 actors (18 students and 2 teachers) along with the time of the events in increments of minutes. Note that the data employed here were modified slightly to increase the amount of time occurring between very closely recorded events, ensuring no simultaneity of events as assumed by the relational event framework. Two actor-level covariates are also at hand in the dataset used here: whether the actor was a teacher and whether the actor was female.

Let’s start by examining the raw temporal data and the time-aggregated network.
In this case, event time is given in increments of minutes from the onset of observation.

Question 24: Check the data again using head(). Can you see any trend?
Hint: Use table() to generate frequency tables for sender and receiver attributes. Use ggplot(data, aes(attribute))+geom_bar()+labs(y="", x="") to visualize the data.

## The edgelist is stored in an object called `Class`.
head(Class)
##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1
table(Class$FromId)
## 
##   1   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  20 
##  29  27  63  25  23 164  19  18  32  19  39  20  81  20  26  32  29  25
table(Class$ToId)
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
## 38 10 37 55 37 32 32 29 28 47 29 49 30 46 30 36 42 40 10 34
ggplot(Class, aes(FromId))+geom_bar()+labs(y="Frequncy", x="Sender")

ggplot(Class, aes(ToId))+geom_bar()+labs(y="Frequncy", x="Receiver")

Question 25: Convert the Class eventlist into a valued sociomatrix using as.sociomatrix.eventlist() as you did above.  
Hint: as.sociomatrix.eventlist(data, number of actors).

ClassNet <- as.sociomatrix.eventlist(Class ,20)

Question 26: Plot the network using gplot().
Here we color the female nodes black, the male nodes gray, represent teachers as square-shaped nodes and students as triangle-shaped nodes. Edges between nodes are likewise scaled proportional to the number of communication events transpiring between actors. 

Hint: Please check the hint of Q3 and consult the help file of gplot.

gplot(ClassNet ,vertex.col=ifelse(ClassIsFemale ,"black","gray"),
     vertex.sides=3+ClassIsTeacher ,vertex.cex=2, edge.lwd= ClassNet ^0.75)
Time-Aggregated Classroom Communications

Time-Aggregated Classroom Communications

Question 27: Use graph_from_adjacency_matrix() to convert the Classroom sociomatrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).

# transforming class sociomatrix into igraph object
class_graph <- graph_from_adjacency_matrix(ClassNet)

Question 28: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.

V(class_graph)$name <- seq(1,20)

Question 29: Use ClassIsFemale and ClassIsTeacher vectors to assign vertex attributes.
Hint: V(graph)$name_of_attribute.

V(class_graph)$IsFemale <- ClassIsFemale 
V(class_graph)$IsTeacher <- ClassIsTeacher

Question 30: Using igraph package visualize the Classroom data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle, vertex.color=...).

# Circular layout
igraph_options(vertex.size=12, edge.arrow.size=0.5)
plot(class_graph, layout=layout_in_circle,
     vertex.shape=ifelse(V(class_graph)$IsTeacher, "rectangle", "circle"),
     vertex.color=ifelse(V(class_graph)$IsFemale, "pink", "lightblue"))
Circular Layout of the Classroom Network

Circular Layout of the Classroom Network

Now, try a Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr, vertex.color=...).

# Fruchterman and Reingold method
plot(class_graph, layout=layout_with_fr, 
     vertex.shape=ifelse(V(class_graph)$IsTeacher, "rectangle", "circle"),
     vertex.color=ifelse(V(class_graph)$IsFemale, "pink", "lightblue"))
Fruchterman and Reingold Layout of the Classroom Network

Fruchterman and Reingold Layout of the Classroom Network

Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk, vertex.color=...).

# Kamada and Kawai method
plot(class_graph, layout=layout_with_kk, 
     vertex.shape=ifelse(V(class_graph)$IsTeacher, "rectangle", "circle"),
     vertex.color=ifelse(V(class_graph)$IsFemale, "pink", "lightblue"))
Kamada and Kawai Layout of the Classroom Network

Kamada and Kawai Layout of the Classroom Network

Question 31: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).

gsize(class_graph)
## [1] 691
gorder(class_graph)
## [1] 20

Question 32: Create degree distributions of the Classroom data.
Hint: ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").

# Since we are working with directed graph we can create indegree and outdegree distributions:

# calculate indegree and outdegree
G.indegrees <- igraph::degree(class_graph, mode = 'in')
G.outdegrees <- igraph::degree(class_graph, mode = 'out')
G.total_degrees <- igraph::degree(class_graph, mode = 'total')

# creating a dataframe for working with ggplot
G.degree.histogram <- data.frame(G.indegrees, G.outdegrees, G.total_degrees)

# ploting indegree distribution
ggplot(G.degree.histogram, aes(x = G.indegrees)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  labs(title="Indegree Distribution", x="Vertex Degree", y="Frequency") 

# ploting outdegree distribution
ggplot(G.degree.histogram, aes(x = G.outdegrees)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  labs(title="Outdegree Distribution", x="Vertex Degree", y="Frequency") 

# ploting total degree distribution
ggplot(G.degree.histogram, aes(x = G.total_degrees)) +
  geom_histogram(binwidth = 3,
                   colour= "black", fill = "white") +
  labs(title="Total Degree Distribution", x="Vertex Degree", y="Frequency")

#hist(degree(class_graph, mode = 'in'), col='lightblue', xlab='Vertex Degree',
     #ylab='Frequency', main='')

#hist(degree(class_graph, mode = 'out'), col='blue', xlab='Vertex Degree', 
     #ylab='Frequency', main='')

According to the degree distributions, Classroom network is a well-connected network with a few nodes having noticeably higher degree compared to the rest.

Question 33: Calculate centrality measures: degree, betweeness, closeness and eigenvector. Extract nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); closeness(graph, ...); eigen_centrality(graph, directed = TRUE, ...).

a) highest degree

in_degree_class <- igraph::degree(class_graph, mode='in')
out_degree_class <- igraph::degree(class_graph, mode='out')
total_degree_class <- igraph::degree(class_graph, mode='total')

in_degree_class[in_degree_class == max(in_degree_class)]
##  4 
## 55
out_degree_class[out_degree_class == max(out_degree_class)]
##   7 
## 164
total_degree_class[total_degree_class == max(total_degree_class)]
##   7 
## 196

Node with highest number of incoming edges is node 4. And the node with the highest out-, and total degrees is node 7.

b) highest betweenness

between_class <- betweenness(class_graph, directed = TRUE, normalized = TRUE)
between_class[between_class == max(between_class)]
##         7 
## 0.3656151

The node that occurs most often on the shortest path between other pairs of nodes is node 7.

c) highest closeness

closeness_class <- closeness(class_graph, normalized = TRUE)
closeness_class[closeness_class == max(closeness_class, na.rm = TRUE) & !is.na(closeness_class)]
##  4  7 14 
##  1  1  1

Nodes that are located the clossest to other nodes are 4, 7, and 14.

d) highest eigenvector

eigen_class <- eigen_centrality(class_graph, directed = TRUE)
eigen_values <- eigen_class[1]
eigen_values <- unlist(eigen_values)
eigen_values[eigen_values == max(eigen_values)]
## vector.4 
##        1

The node with the most connected neighbours is node 4.

Question 34: Apply community detection algorithm on the Classroom data. First, use naive approach.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual"); cluster_fast_greedy(graph).

# Converting digraph into undirected graph without multyedges:
undirected.class <- as.undirected(
  class_graph,
  mode = c("collapse"))

# use fast and greedy algorithm for hierarchical clustering 
cuc <- cluster_fast_greedy(undirected.class)

Use length() to check how many communities the algorithm has identified.

length(cuc)
## [1] 5

Use sizes() to see how large those communities are.

sizes(cuc)
## Community sizes
## 1 2 3 4 5 
## 5 5 3 4 3

Plot the results of the community detection using plot(communities, graph).

# plot clusters
plot(cuc, undirected.class)
Communities in the Classroom Network

Communities in the Classroom Network

Interestingly, the community sizes do not vary a lot. In addition, we can see that there is quite a lot of edges between the communities as well as within them which is not surprising given how well connected this network is.

Create a dendrogram using dendPlot(communities, mode='phylo').

# plot a dendrogram
dendPlot(cuc, mode='phylo')
Dendrogram of the Classroom Network

Dendrogram of the Classroom Network

According to the dendrogram, even though node 19 is assigned to a cluster, it is quite different from the nodes in it’s cluster. Similarly, node 2 is quite different from the other members of its cluster. This is also seen from the graph representation above, those two nodes are less connected with the other members of their clusters.

7. Exact Time Event Histories

We are going to use REMs for event histories with exact timing information.

7-1. Modeling with covariates

Question 35: Fit a simple intercept model, containing only a vector of 1s as an actor-level sending effect. And check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all))
Hint: This vector is saved as ClassIntercept, which we can pass to the respective covariate arguments in rem.dyad(). Also note that we do not want to discard timing information (ordinal=FALSE).

classfit1<-rem.dyad(Class,n=20,effects=c("CovSnd"),
                      covar= list(CovSnd=ClassIntercept), ordinal=FALSE,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit1)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err Z value  Pr(>|z|)    
## CovSnd.1 -3.332287  0.038042 -87.596 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5987.221 on 691 degrees of freedom
##  Chi-square: -7.730705e-11 on 0 degrees of freedom, asymptotic p-value 1 
## AIC: 5989.221 AICC: 5989.227 BIC: 5993.759
mean(apply(classfit1$predicted.match,1,all))
## [1] 0

The model does not fit any better than the null, because it is equivalent to the null model (as indicated by the absence of difference between the null and residual deviance).

Question 36: Now fit a more interesting covariate model that considers gender(male/female) and role(teacher/student) for senders and receivers. And evaluate whether there is any improvement over the intercept-only model by BIC. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.

classfit2<-rem.dyad(Class,n=20,effects=c("CovSnd","CovRec"),
                    covar=list(CovSnd=cbind(ClassIntercept ,ClassIsTeacher ,
                    ClassIsFemale), CovRec=cbind(ClassIsTeacher ,ClassIsFemale))
                    ,ordinal=FALSE ,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit2)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value  Pr(>|z|)    
## CovSnd.1 -3.834055  0.078836 -48.6333 < 2.2e-16 ***
## CovSnd.2  1.672786  0.091672  18.2475 < 2.2e-16 ***
## CovSnd.3  0.123985  0.094926   1.3061  0.191510    
## CovRec.1  0.373994  0.126999   2.9449  0.003231 ** 
## CovRec.2  0.165380  0.080892   2.0445  0.040909 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5652.318 on 687 degrees of freedom
##  Chi-square: 334.9033 on 4 degrees of freedom, asymptotic p-value 0 
## AIC: 5662.318 AICC: 5662.405 BIC: 5685.008
# Compare the BIC of this model with the previous model.
classfit1$BIC - classfit2$BIC
## [1] 308.7508
mean(apply(classfit2$predicted.match,1,all))-mean(apply(classfit1$predicted.match,1,all))
## [1] 0.01013025

A good improvement over the null model. But note that gender does not appear to be predictive of sending communication. A better model may be one without that specific term included.

Question 37: Fit another model excluding the non-significant term and again compare to the previous model by BIC and prediction.
Hint: Based on the above outputs, which statistics should be included/excluded?

classfit3<-rem.dyad(Class,n=20,effects=c("CovSnd","CovRec"), 
                    covar=list(CovSnd=cbind(ClassIntercept ,ClassIsTeacher),
                               CovRec=cbind(ClassIsTeacher,ClassIsFemale)),
                                          ordinal=FALSE,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit3)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value  Pr(>|z|)    
## CovSnd.1 -3.775222  0.063622 -59.3380 < 2.2e-16 ***
## CovSnd.2  1.615759  0.079933  20.2139 < 2.2e-16 ***
## CovRec.1  0.371765  0.127019   2.9269  0.003424 ** 
## CovRec.2  0.161158  0.080815   1.9942  0.046135 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5654.016 on 688 degrees of freedom
##  Chi-square: 333.2049 on 3 degrees of freedom, asymptotic p-value 0 
## AIC: 5662.016 AICC: 5662.074 BIC: 5680.169
classfit2$BIC - classfit3$BIC
## [1] 4.839715
mean(apply(classfit3$predicted.match,1,all))-mean(apply(classfit2$predicted.match,1,all))
## [1] 0

Question 38: Is there any improvement?

7-2. Modeling endogenous social dynamics

For the classroom conversations, we can for example think of recency effects, turn-taking, sequential address, and turn-usurping. We can enter these terms into the model using their appropriate effect names (please look at the list and check the definition of statistics).

We keep the covariates from the best covariate model (i.e., model 3 from the previous section).
Question 39: Fit a model adding just a recency effect to the previous model and again check the improvement by BIC and prediction.

#First, just recency effects + model 3:
classfit4<-rem.dyad(Class,n=20,effects=c("CovSnd","CovRec","RRecSnd","RSndSnd"),
                    covar=list(CovSnd=cbind(ClassIntercept ,ClassIsTeacher), 
                               CovRec=cbind(ClassIsTeacher ,ClassIsFemale )),
                                           ordinal=FALSE ,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
# Compare the BIC of this model with the previous model (model3).
classfit3$BIC - classfit4$BIC
## [1] 1118.294
mean(apply(classfit4$predicted.match,1,all))-mean(apply(classfit3$predicted.match,1,all))
## [1] 0.02170767

Question 40: Fit another model adding the conversational norms to the previous model (model 4) and check the improvement by BIC and prediction.

## Next conversational norms + model 4
## "PSAB-BA","PSAB-AY","PSAB-BY"

classfit5<-rem.dyad(Class,n=20,effects=c("CovSnd","CovRec","RRecSnd","RSndSnd", "PSAB-BA","PSAB-AY","PSAB-BY"),
covar= list(CovSnd=cbind(ClassIntercept,ClassIsTeacher), 
CovRec= cbind(ClassIsTeacher,ClassIsFemale)),ordinal=FALSE,hessian= TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit5)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value  Pr(>|z|)    
## RRecSnd   2.429210  0.155366  15.6354 < 2.2e-16 ***
## RSndSnd  -0.986733  0.144667  -6.8207 9.059e-12 ***
## CovSnd.1 -5.003427  0.090609 -55.2201 < 2.2e-16 ***
## CovSnd.2  1.253891  0.085160  14.7239 < 2.2e-16 ***
## CovRec.1 -0.722686  0.141950  -5.0911 3.559e-07 ***
## CovRec.2  0.047933  0.081325   0.5894    0.5556    
## PSAB-BA   4.622141  0.137601  33.5910 < 2.2e-16 ***
## PSAB-BY   1.677580  0.164931  10.1714 < 2.2e-16 ***
## PSAB-AY   2.869967  0.103113  27.8332 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 2803.315 on 683 degrees of freedom
##  Chi-square: 3183.906 on 8 degrees of freedom, asymptotic p-value 0 
## AIC: 2821.315 AICC: 2821.58 BIC: 2862.158
# Compare the BIC of this model with the previous model (model4).
classfit4$BIC - classfit5$BIC
## [1] 1699.716
mean(apply(classfit5$predicted.match,1,all))-mean(apply(classfit4$predicted.match,1,all))
## [1] 0.2995658

This is the end of Part 1 of REM practical. In Part 2 we will consider two other data sets Apollo 13 and Twitter data. You will fit simple REM models to these data and choose which statistics to include.