Practical - REM (Part 1)

Welcome to part 1 of the practical for the social network analysis. In this practical we will get hands-on experience with the materials in the lectures about applying Relational Event Models (REMs) on real relational event history (REH) data: Class data and World Trade Center (WTC) by doing a bit of programming in R. These data sets are stored in UUsummerschool.rdata. Later in this practical we explain these data sets briefly.

0. Preparation

R can be obtained via the https://www.r-project.org/ webpage. To download R, you need to select your preferred CRAN (Comprehensive R Archive Network) mirror (https://cran.r-project.org/mirrors.html). On the Mirrors webpage, you will find listings of countries that have identical versions of R and should select a location geographically close to your computer’s location. R can be downloaded for Linux, Windows, and Mac OS. The pages are regularly updated and you need to check with releases are supported for your platform. R as a base package can perform many statistical analyses but most importantly, R’s functionality can be expanded by downloading specific packages. After installing R (https://www.r-project.org/), it is quite useful to also install R Studio (https://www.rstudio.com/), which provides a convenient interface to R. Once both are installed, opening up R Studio will give a window that is split into 4 panes:

Console/Terminal: this pane is the main graphical interface for the user and this is where the commands are typed in.
Editor: this pane shows the active datasets that you are working on.
Environment/History/Connections: this pane shows the R datasets and allows you to import data from text (e.g. csv. file), Excel, SPSS, SAS and Stata. The History tab allows you see the list of your previous commands.
Files/plots/packages/help: this pane and its tabs can open files, view the most current plot (also previous plots), install and load packages, or use the general R help function. Under the Tools drop down tap at the top of the R Studio screen, you can select which packages to install for the analyses required. Alternatively the packages can be installed using the Packages tab or they can be directly installed using a typed command (install.packages()). R is a command line driven programme and you can enter commands at the prompt (> by default) and each command is executed one at a time.

For the current practical, you will need to install a few packages as follows:

## Install the following package
# install.packages ("relevent")
library (relevent)
library(sna)
library(ggplot2)
library(scales)
library(igraph)
library(RColorBrewer)

1. Relational event model (REM)

As it was mentioned in the lecture, dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.

First, we need to load UUsummerschool.rdata and see what is inside of it.

# Set your own working directory
# setwd()

# Load data using load():
load("UUsummerschool.rdata") 
# This R-data contains 11 objects:
ls()

## [1] "as.sociomatrix.eventlist" "Class"                   
## [3] "ClassIntercept"           "ClassIsFemale"           
## [5] "ClassIsTeacher"           "PartOfApollo_13"         
## [7] "Twitter_data_rem3"        "WTCPoliceCalls"          
## [9] "WTCPoliceIsICR"

What is stored in UUsummerschool.rdata?
A function as.sociomatrix.eventlist that outputs a sociomatrix, and the rest are data sets such as PartOfApollo_13, Class, Twitter_data_rem3 and WTCPoliceCalls.

Probably you would ask: “What is a sociomatrix?” Before you let the computer do the social network analysis, it must have some basic data to calculate, like the connection between whom and whom. We should be aware that any one of those connections involves only two individuals. That is, it involves two nodes. This property might already inspire you to think the relationship as a mathematical expression, for instance a matrix of two dimensions. If we put the elements in the same sequence both at the row and the column, and we use 1 and 0 to represent whether there is a connection between the two elements or not, then we will have a matrix as we expect. The matrix is for the social network and we can give it an exclusive name, sociomatrix.

Secondly, let’s check the objects that we will work with in this practical.

as.sociomatrix.eventlist

## function (eventlist, n) 
## {
##     g <- matrix(0, n, n)
##     if (NROW(eventlist) > 0) {
##         tabmat <- table(eventlist[, -1, drop = FALSE])
##         g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
##     }
##     g
## }
## attr(,"source")
## [1] "function(eventlist,n){"                                                             
## [2] "  g<-matrix(0,n,n)"                                                                 
## [3] "  if(NROW(eventlist)>0){"                                                           
## [4] "    tabmat<-table(eventlist[,-1,drop=FALSE])"                                       
## [5] "    g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] "  }"                                                                                
## [7] "  g"                                                                                
## [8] "}"

head(WTCPoliceCalls)

##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32

head(Class)

##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1

2. The `relevent` package and `rem.dyad` function

The relevent package (Butts, 2015) in R provides implementation of relational event framework, with specialized functionality for easily fitting dyadic relational event models. Within the relevent package, the rem.dyad() function is the primary workhorse for modeling dyadic data. As you can see the related documentation, there are a bunch of arguments that can be used for different purposes. Here, in this practical, we only use a few of them (please see the pdf file rem.dyad.args). For more explanation please check the help file of rem.dyad as follows.

## You can access the help file in three different ways:
# 1)
??rem.dyad
# 2)
help(rem.dyad)
# 3)
?relevent::rem.dyad

There is also rem() for modeling the dyadic data. The main difference is that rem() needs the computed statistics in advance but rem.dyad() computes the statistics internally. However, if we have the value of statistics in advance rem() tends to be faster.

As applying the regression model, we should first decide which variables/statistics should be in our model. This can be done in several ways: experts knowledge, prior information and statistics/variables selection e.g., using BIC,etc. Note that not all effects may lead to identified models in all cases. It is up to the user to ensure that the postulated model makes sense.

3. WTC data & visualization, centrality measures, and community detection

This data contains information about the 9/11 terrorist attacks at the WTC in New York City in 2001. That should be notified that the radio communication was essential in coordinating activities during the crisis. Butts et al.(2007) coded radio communication events between officers responding to 9/11 from transcripts of communications recorded during the event. It is important to note that the WTC radio data was coded from transcripts that lacked detailed timing information; we do not therefore know precisely when these calls were made. But we know the order of events. Since the case of ordinal timing is somewhat simpler than that of exact timing, we consider the WTC data first in this practical. We will illustrate ordinal time REMs using the 481 communication events from 37 named communicants in this data set.

The eventlist is stored in an object called WTCPoliceCalls.
Question 1: Check the data using head and tailfunction. How many events does the data contain? How many actors? Without testing anything, can you see any trend? If so, which statistics can capture that trend?

Hint: Use table() to generate frequency tables for source and recipient attributes. Use ggplot(data, aes(attribute))+geom_bar()+labs(y="", x="") to visualize the data.

# Check the data.

You can see the reciprocation and recency pattern in the first few events: responding officer 16 called officer 32 in the first event, officer 32 then called 16 back in the second, which might be characterized as a local reciprocity effect or $AB \rightarrow BA$ participation shift. This was followed by 32 being the target of the next four calls, which is probably due to either a role of 32 such as a coordinator or perhaps the presence of a recency mechanism.

Question 2: Use the included sna function as.sociomatrix.eventlist() to convert the eventlist into a valued sociomatrix. For the definition of sociomatrix see previous section.
Hint: as.sociomatrix.eventlist(data, number of actors).

# Write your code here.

Question 3: Once you have the sociomatrix object, plot the aggregated network using gplot().
Hint: gplot(created sociomatrix, the arguments such as the size of nodes, width of edges, colors of nodes, etc.)
gplot creates a node-and-edge visualization of a network. It can accept an adjacency matrix (N × N), an array of adjacency matrices (m × N × N), network objects, and other forms of network data (check sna library to see what else is possible).
gplot has many arguments that you can tweak your network model visualization, for example:

vertex.col: set the node color.
edge.col: color for edge border and fill.
vertex.sides: number of sides for node polygon. ex) triangle = 3, square = 4 (default = 8).
label.bg: color for the background of node.
Check the help file of gplot to find other features.

# Write your code here.

The plot above is the time-aggregated WTC Police communication network.

The three black square nodes represent actors who fill institutional coordinator roles and gray circle nodes represent all other communicants (one of the square nodes is in the center).
A directed edge is drawn between two actors, i and j, if actor i ever called actor j on the radio. The edges and arrowheads are scaled in proportion to the number of calls over time. This is clearly a hub-dominated network with two actors sitting on the majority of paths between all other actors.
The actor with the plurality of communication ties is an institutional coordinator (the square node at the center of the figure), heterogeneity in sending and receiving communication ties is evident, with several high-degree non-coordinators and two low-degree institutional coordinators, in the network. This source of heterogeneity is a good starting place from which to build our model.

Question 4: igraph is the most commonly used R package for network analysis. It provides tools to import, transform, and visualize graphs. The documentation for igraph can be found here: https://igraph.org/r/doc/. igraph functions operate on igraph objects, therefore we need to transform our sociomatrix (also called adjacency matrix) into an igraph object; Use graph_from_adjacency_matrix() to convert the WTC Police Calls sociomatrix into a graph object.
Hint: igraph object is a special class of graphs used by igraph. igraph graphs store information about vertices, edges, and attributes in a format of an edge list. You can read more about igraph package using ?igraph.

# Write your code here.

Question 5: Adding relevant data (attributes) to a graph is called ‘decorating’ a graph. In igraph it can be done by using $ operator. Assign corresponding numbers to nodes as vertex names. This will make it easier to index nodes while calculating different statistics.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.

# Write your code here.

Question 6: Use WTCPoliceIsICR vector to assign vertex attribute that shows whether an actor occupies an institutionalized coordinator role (ICR).
Hint: V(graph)$name_of_attribute.

# Write your code here.

Question 7: Using igraph package visualize the WTC Police Calls network using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: Use plot(graph, layout=layout_in_circle, vertex.color=...) to plot the graph. You can see different layout options using ?layout_().

# Write your code here.

Circular layout is the simplest layout. The vertices are arranged around the circumference of a circle (typically ordered by degree) and edges are then drawn across the circle.

Now, try a Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr, vertex.color=...).

# Write your code here.

Fruchterman and Reingold method is a spring-embedder method of graph drawing wherein the nodes are analogous to balls and edges to springs between them. The goal is to minimize the energy of the system through moving the nodes and changing the forces between them.

Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk, vertex.color=...).

# Write your code here.

Kamada and Kawai method is an energy-placement method which simulates a spring between each vertex. The length of the spring between the vertices is determined by the shortest path between them. The idea is to move the nodes in such a way as to minimize the global energy of the layout.

Question 8: What are the size and order of this network? Network’s size refers to the total number of it’s edges. And order refers to the number of vertices in the network.
Hint: gsize(graph), gorder(graph).

# Write your code here.

Question 9: Degree of a vertex is a number of edges incident upon that vertex. A degree distribution is a collection of degree frequencies over the whole network. The degree distribution shows how connected the graph is and whether some nodes have a notably distinct degree compared to others. Using ggplot2 create degree distributions of the WTC Police Calls data.
Hint: degree(graph, mode=c("in", "out", "total")); ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").

# Write your code here.

After examining the degree distribution plots, we can see that the majority of the nodes have quite low in-, out-, and total degrees. However, there are two highly connected nodes.

Question 10: Centrality measures help identify ‘important’ nodes in a network. Degree centrality measures how ‘important’ a node is through measuring the number of edges incident upon that node. Betweeness measures how often a certain node appears on a shortest path between pairs of other nodes. Closeness measures how close a certain vertex is to all other vertices. Eigenvector centrality captures the ‘importance’ of a node through it’s neighbors, therefore the more connected the vertex’s neighbors are the more ‘important’ is the vertex itself. Calculate centrality measures: degree, betweeness, and eigenvector (closeness can’t be calculated for this graph due to the presence of isolated vertices). Extract nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE...,); eigen_centrality(graph, directed = TRUE, ...).

a) highest degree

# Write your code here.

We can see that in this network, node 32 is the most central node with regard to its in-, out-, and total degree.

b) highest betweenness

# Write your code here.

Node 32 also appears on the shortest path between other nodes the most.

c) highest eigenvector

# Write your code here.

Node 32 also has the most well-connected neighbors. These results show us that node 32 is the most ‘important’ node in this network with regards to degree, betweeness, and eigenvector centralities.

Question 11: Community detection identifies subsets (clusters) of vertices that demonstrate ‘cohesiveness’ with respect to their relationship patterns. Clusters correspond to sets of nodes with more edges within the cluster than the rest of the graph. One of the ways of approaching community detection in directed graphs, called Naive approach, is to convert the directed network into an undirected one and assume edge symmetry. Apply community detection algorithm on WTCPoliceCalls data.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual")); cluster_fast_greedy(graph).

# Write your code here.

Once you’ve applied the algorithm, you can look how many communities the algorithm identified using length().

# Write your code here.

Additionally, you can see the number of members of each community using sizes().

# Write your code here.

igraph has a convenient way of plotting results of community detection. You can use plot(communities, graph) to visually represent the results.

# Write your code here.

After plotting the results of community detection, we can see that there are two major communities centered around node 32 and node 16. In addition, all isolated nodes have been assigned to their own clusters.

Hierarchical clustering methods produce a hierarchy of nested partitions of the graph. We can represent this hierarchy in the form of a dendrogram. Dendrogram is a tree diagram that represents the similarity among groups of objects. The heights of the branches signify how similar the objects are to each other. igraph has a function dendPlot(communities, mode='phylo') specifically for this.

# Write your code here.

Different communities are color-coded in the dendrogram which makes it clear how the nodes are partitioned into different clusters. Looking at this dendrogram we can see how similar the nodes are to the other members of their communities. It appears that the nodes in the same cluster are quite similar in this network.

4. Fitting relational event models

4-1. A simple covariate model

The question is whether the propensity of individuals to send and receive calls depends on whether they occupy institutionalized coordinator roles (ICR) or not. WTCPoliceIsICR is a vector that shows the ICR.

One of the arguments in rem.dyad() function is ordinal = TRUE which, by default, treats the data as ordinal. Because WTCPoliceCalls data does not contain exact timing but contains the order of events we do not need to change anything. However, later in the practical we will consider data with exact time stamps which will require us to change the value of this argument.

Question 12: Use rem.dyad() by passing appropriate arguments.
Hint: rem.dyad(data..., n = number of actors, effects = c("CovInt"), covar = list(CovInt = WTCPoliceIsICR), hessian = ...). Note that CovInt is an effect of the exogenous covariate(s) on the propensity of actors to initiate and receive communication events. CovInt requires exogenous covariate(s) to be passed as covariate arguments. Please check the description of arguments (?rem.dyad).

# Write your code here.

Once you have fit the model, you can summarize the model fit using summary().

Question 13: Check the model summary. Then see how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)). predicted.match provides a two-column matrix of true and false corresponding to the sender and receiver columns in the main data frame. It shows whether the model could predict the sender and receiver truly. apply(your model$predicted.match,1,all) applies function all over every row of this matrix. The function all checks whether all elements of a logical vector are TRUE (whether sender and receiver were correctly predicted). mean() calculates the average value of the apply() function output. TRUE is treated as 1 and FALSE is treated as 0. Therefore, the result of this function falls between 0 and 1 - the higher the number, the better the predictions of our model.
Hint: summary(your model name).

# Write your code here.

The output gives us the covariate effect, some uncertainty, and goodness-of-fit information. The format is like the output for a regression model summary, but note that the coefficients should be interpreted considering the relational event framework.

Next, we evaluate whether it is worth treating these effects separately with respect to ICR status. To do so, we enter the ICR covariate as a sender and receiver covariate, respectively.

Question 14: Fit another model adding the covariates and check the model summary as you did above. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: It is the same as what we did for Q12. But here we should consider the effect for the sender and for the receiver separately. The structure is the same but we need to select CovSnd, and CovRec. CovRec is an actor-level covariate effect for receiving interactions and CovSnd is an actor-level covariate effect for initiating interactions with others. Both effects require exogenous covariate(s) to be passed as covariate arguments. Check the description of arguments to see what you need to specify for effects.

# Write your code here.

The output of 0 means that our second model had 0 better predicted senders and receivers compared to the first model.

Question 15: Now, evaluate which model is preferred by BIC (lower is better) and explain what the BIC tells us.
Hint: You can extract BIC value from each model by using $ sign (e.g., yourmodel$BIC).

# Write your code here.

4-2. Adding endogenous variables

Add the reciprocity term: $AB \rightarrow BA$. For the definition of reciprocity, see the slides. In particular, the AB-BA shift, which captures the tendency for B to call A, given that A has just called B, is likely at play in radio communication.

Question 16: Fit another model adding the reciprocity term and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all))and compare it with previous model prediction.

Hint: Add “PSAB-BA” as another statistic to the best model. Check the description of arguments to see what you need to specify for effects.

# Write your code here.

Question 17: Again, evaluate which model is preferred by BIC.

# Compare the BIC of this model with the first model.

Note that the lower BIC the better the model. It appears that there is a very strong reciprocity effect. Doesn’t it?

We may expect that the current participants in a communication are likely to initiate the next call and that one’s most recent communications may be the most likely to be returned.
These processes can be captured with the participation shifts for dyadic turn receiving/continuing and recency effects, respectively.

Question 18: Fit another model adding the p-shift effects to the previous model and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: Add “PSAB-BY” and “PSAB-AY” to the best model. PSAB-BY is a turn receiving effect - a participant receiving a communication event is the one initiating the next communication effect $AB \rightarrow BY$. “PSAB-AY” is a turn continuing effect, when a participant that initiated a conversation event initiates the next conversation event $AB \rightarrow AY$. For more information check the description of arguments to see what you need to specify for effects.

# Write your code here.

Question 19: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (third model).

Question 20: Lastly, fit a model adding the recency effects to the previous model and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: RRecSnd is a recency effect - the recency of conversation event $A \rightarrow B$ affects the future rate of conversation event $B \rightarrow A$. RSndSnd is also a recency effect - the recency of conversation event $A \rightarrow B$ affects the future rate of conversation event $A \rightarrow B$. Are they the statistics that we need for this purpose?

# Write your code here.

Question 21: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (fourth model).

The results indicate that turn-receiving, turn-continuing, and recency effects are all at play in the relational event process.

Question 22: Can you think of adding more statistics?
Please add a few more and see how the BIC changes. To this end, check the definition of statistics precisely.

# Try it out yourself.

Question 23: As there are a lot of statistics that we can use, can you think of some ways for statistics selection?

# Try it out yourself.

5. Assessing model adequacy

Model adequacy is a crucial consideration: even though the model fit is sufficiently good, you need to think of whether this model is good enough for our purposes. For example, the ability of the relational event model to predict the next event in the sequence, given those that have come before.
Note that we do not discuss this further in this practical but interested students can check Butts 2008.

6. Classroom data & visualization, centrality measures, and community detection

The classroom dataset includes timed interactions between students and instructors within a high school classroom (McFarland, 2001; Bender-deMoll and McFarland, 2006). Sender and receiver events (n=691) were recorded between 20 actors (18 students and 2 teachers) along with the time of the events in increments of minutes. Note that the data employed here were modified slightly to increase the amount of time occurring between very closely recorded events, ensuring no simultaneity of events as assumed by the relational event framework. Two actor-level covariates are also at hand in the dataset used here: whether the actor was a teacher and whether the actor was female.

Let’s start by examining the raw temporal data and the time-aggregated network.
In this case, event time is given in increments of minutes from the onset of observation.

Question 24: Check the data again using head(). Can you see any trend?
Hint: Use table() to generate frequency tables for sender and receiver attributes. Use ggplot(data, aes(attribute))+geom_bar()+labs(y="", x="") to visualize the data.

# Check the data.

Question 25: Convert the Class eventlist into a valued sociomatrix using as.sociomatrix.eventlist() as you did above.
Hint: as.sociomatrix.eventlist(data, number of actors).

# Write your code here.

Question 26: Plot the network using gplot().
Here we color the female nodes black, the male nodes gray, represent teachers as square-shaped nodes and students as triangle-shaped nodes. Edges between nodes are likewise scaled proportional to the number of communication events transpiring between actors.

Hint: Please check the hint of Q3 and consult the help file of gplot.

# Write your code here.

Question 27: Use graph_from_adjacency_matrix() to convert the Classroom sociomatrix into a graph object.
Hint: graph_from_adjacency_matrix(sociomatrix).

# Write your code here.

Question 28: Assign corresponding numbers as vertex names.
Hint: You can set vertex attributes by using V(graph)$name_of_attribute.

# Write your code here.

Question 29: Use ClassIsFemale and ClassIsTeacher vectors to assign vertex attributes.
Hint: V(graph)$name_of_attribute.

# Write your code here.

Question 30: Using igraph package visualize the Classroom data using different layouts to see which layout is the most beneficial. First, use a circular layout.
Hint: plot(graph, layout=layout_in_circle, vertex.color=...).

# Write your code here.

Now, try a Fruchterman and Reingold method.
Hint: plot(graph, layout=layout_with_fr, vertex.color=...).

# Write your code here.

Also try the Kamada and Kawai method.
Hint: plot(graph, layout=layout_with_kk, vertex.color=...).

# Write your code here.

Question 31: What are the size and order of this network?
Hint: gsize(graph), gorder(graph).

# Write your code here.

Question 32: Create degree distributions of the Classroom data.
Hint: ggplot(data, aes()) + geom_histogram() + labs(title="", x="", y="").

# Write your code here.

According to the degree distributions, Classroom network is a well-connected network with a few nodes having noticeably higher degree compared to the rest.

Question 33: Calculate centrality measures: degree, betweeness, closeness and eigenvector. Extract nodes that score the highest on each centrality measure.
Hint: degree(graph, ..., mode=c("all", "out", "in", "total")); betweenness(graph, directed = TRUE, ...); closeness(graph, ...); eigen_centrality(graph, directed = TRUE, ...).

a) highest degree

# Write your code here.

Node with highest number of incoming edges is node 4. And the node with the highest out-, and total degrees is node 7.

b) highest betweenness

# Write your code here.

The node that occurs most often on the shortest path between other pairs of nodes is node 7.

c) highest closeness

# Write your code here.

Nodes that are located the clossest to other nodes are 4, 7, and 14.

d) highest eigenvector

# Write your code here.

The node with the most connected neighbours is node 4.

Question 34: Apply community detection algorithm on the Classroom data. First, use naive approach.
Hint: as.undirected(graph, mode = c("collapse", "each", "mutual"); cluster_fast_greedy(graph).

# Write your code here.

Use length() to check how many communities the algorithm has identified.

# Write your code here.

Use sizes() to see how large those communities are.

# Write your code here.

Plot the results of the community detection using plot(communities, graph).

# Write your code here.

Interestingly, the community sizes do not vary a lot. In addition, we can see that there is quite a lot of edges between the communities as well as within them which is not surprising given how well connected this network is.

Create a dendrogram using dendPlot(communities, mode='phylo').

# Write your code here.

According to the dendrogram, even though node 19 is assigned to a cluster, it is quite different from the nodes in it’s cluster. Similarly, node 2 is quite different from the other members of its cluster. This is also seen from the graph representation above, those two nodes are less connected with the other members of their clusters.

7. Exact Time Event Histories

We are going to use REMs for event histories with exact timing information.

7-1. Modeling with covariates

Question 35: Fit a simple intercept model, containing only a vector of 1s as an actor-level sending effect. And check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all))
Hint: This vector is saved as ClassIntercept, which we can pass to the respective covariate arguments in rem.dyad(). Also note that we do not want to discard timing information (ordinal=FALSE).

# Write your code here.

The model does not fit any better than the null, because it is equivalent to the null model (as indicated by the absence of difference between the null and residual deviance).

Question 36: Now fit a more interesting covariate model that considers gender(male/female) and role(teacher/student) for senders and receivers. And evaluate whether there is any improvement over the intercept-only model by BIC. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.

# Write your code here.

# Compare the BIC of this model with the previous model.

A good improvement over the null model. But note that gender does not appear to be predictive of sending communication. A better model may be one without that specific term included.

Question 37: Fit another model excluding the non-significant term and again compare to the previous model by BIC and prediction.
Hint: Based on the above outputs, which statistics should be included/excluded?

# Write your code here.

Question 38: Is there any improvement?

7-2. Modeling endogenous social dynamics

For the classroom conversations, we can for example think of recency effects, turn-taking, sequential address, and turn-usurping. We can enter these terms into the model using their appropriate effect names (please look at the list and check the definition of statistics).

We keep the covariates from the best covariate model (i.e., model 3 from the previous section).
Question 39: Fit a model adding just a recency effect to the previous model and again check the improvement by BIC and prediction.

# Add recency effects + model 3:

# Compare the BIC of this model with the previous model (model3).

Question 40: Fit another model adding the conversational norms to the previous model (model 4) and check the improvement by BIC and prediction.

# Add conversational norms + model 4

# Compare the BIC of this model with the previous model (model4).

This is the end of Part 1 of REM practical. In Part 2 we will consider two other data sets Apollo 13 and Twitter data. You will fit simple REM models to these data and choose which statistics to include.

Practical - REM (Part 1)

Mahdi Shafiee Kamalabad

July 15 2024

0. Preparation

1. Relational event model (REM)

2. The `relevent` package and `rem.dyad` function

3. WTC data & visualization, centrality measures, and community detection

4. Fitting relational event models

4-1. A simple covariate model

4-2. Adding endogenous variables

5. Assessing model adequacy

6. Classroom data & visualization, centrality measures, and community detection

7. Exact Time Event Histories

7-1. Modeling with covariates

Practical - REM (Part 1)

Mahdi Shafiee Kamalabad

July 15 2024

0. Preparation

1. Relational event model (REM)

2. The relevent package and rem.dyad function

3. WTC data & visualization, centrality measures, and community detection

4. Fitting relational event models

4-1. A simple covariate model

4-2. Adding endogenous variables

5. Assessing model adequacy

6. Classroom data & visualization, centrality measures, and community detection

7. Exact Time Event Histories

7-1. Modeling with covariates

7-2. Modeling endogenous social dynamics

2. The `relevent` package and `rem.dyad` function