## Install the following package
# install.packages("relevent")
library(relevent)
library(sna)

Welcome to the practical for social network analysis. In this practical we get hands-on experience with the material from the lectures by applying Relational Event Models (REMs) to real relational event history (REH) data: the Apollo 13 voice loop data, Twitter data, Class data and World Trade Center (WTC) data, with a bit of programming in R. These data sets are stored in UUsummerschool.rdata and are described briefly later in this practical.

1. Relational event model (REM)

As it was mentioned in the lecture, dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.

First, we need to load UUsummerschool.rdata and see what is inside it.

# Set your own working directory
# setwd()

# Load data using load():
load("UUsummerschool.rdata") 
# This R-data file contains 9 objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"                   
## [3] "ClassIntercept"           "ClassIsFemale"           
## [5] "ClassIsTeacher"           "PartOfApollo_13"         
## [7] "Twitter_data_rem3"        "WTCPoliceCalls"          
## [9] "WTCPoliceIsICR"

What is stored in UUsummerschool.rdata?
A function as.sociomatrix.eventlist() that outputs a sociomatrix, and the rest are data sets such as PartOfApollo_13, Class, Twitter_data_rem3 and WTCPoliceCalls.

Probably you will ask: “What is a sociomatrix?” Before the computer can do any social network analysis, it needs some basic data to work with, namely who is connected to whom. Each of these connections involves exactly two individuals, that is, two nodes. This property suggests representing the relationships mathematically, for instance as a two-dimensional matrix. If we order the actors identically along the rows and the columns, and use 1 and 0 to indicate whether or not there is a connection from the row actor to the column actor, we obtain the matrix we need. Because this matrix describes a social network, it has its own name: the sociomatrix.
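As a minimal illustration (with three made-up actors), a sociomatrix can be built directly in base R; entry [i, j] records whether (or how often) actor i directed a tie to actor j:

# Toy sociomatrix for three hypothetical actors
toy_sociomatrix <- matrix(0, nrow = 3, ncol = 3)
toy_sociomatrix[1, 2] <- 1   # actor 1 -> actor 2
toy_sociomatrix[2, 1] <- 1   # actor 2 -> actor 1
toy_sociomatrix[3, 1] <- 1   # actor 3 -> actor 1
toy_sociomatrix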

Secondly, let’s check all the objects.

as.sociomatrix.eventlist
## function (eventlist, n) 
## {
##     g <- matrix(0, n, n)
##     if (NROW(eventlist) > 0) {
##         tabmat <- table(eventlist[, -1, drop = FALSE])
##         g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
##     }
##     g
## }
## attr(,"source")
## [1] "function(eventlist,n){"                                                             
## [2] "  g<-matrix(0,n,n)"                                                                 
## [3] "  if(NROW(eventlist)>0){"                                                           
## [4] "    tabmat<-table(eventlist[,-1,drop=FALSE])"                                       
## [5] "    g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] "  }"                                                                                
## [7] "  g"                                                                                
## [8] "}"
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32
head(PartOfApollo_13)
##      time sender receiver
## 1 11849.2     18        2
## 2 11854.2      2       18
## 3 11885.2     18        2
## 4 11890.2      2       18
## 5 12232.2      2       17
## 6 12342.2     17        2
head(Class)
##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1
head(Twitter_data_rem3)
##   time_day source target
## 1     0.00      8      1
## 2   121.03     28      2
## 3   199.08     28      2
## 4   266.95      4      3
## 5   573.59     22      5
## 6   574.49     25      5

2. The relevent package and rem.dyad function

The relevent package (Butts, 2015) in R provides an implementation of the relational event framework, with specialized functionality for easily fitting dyadic relational event models. Within the relevent package, the rem.dyad() function is the primary workhorse for modeling dyadic data. As you can see in the related documentation, there are many arguments that can be used for different purposes. In this practical we only use a few of them (please see the pdf file rem.dyad.args). For more explanation, check the help file of rem.dyad as follows.

## You can access the help file in three different ways:
# 1)
??rem.dyad
# 2)
help(rem.dyad)
# 3)
?relevent::rem.dyad

There is also rem() for modeling dyadic data. The main difference is that rem() needs the statistics to be computed in advance, whereas rem.dyad() computes the statistics internally. However, if the statistics are already available, rem() tends to be faster.

As when applying a regression model, we should first decide which variables/statistics to include in the model. This can be done in several ways: expert knowledge, prior information, or statistic/variable selection, e.g., using the BIC. Note that not all effects may lead to identified models in all cases; it is up to the user to ensure that the postulated model makes sense.

Below you will see a short description of the data sets that we use in this practical.

3. Apollo 13 data

During this practical we will analyze part of the Apollo 13 mission. The mission launched as scheduled at 2:13:00 EST (19:13:00 UTC) on April 11, 1970. On board were James Lovell Jr. (Commander, CDR), John “Jack” Swigert Jr. (Command Module Pilot, CMP), and Fred Haise Jr. (Lunar Module Pilot, LMP). The mission was quite routine and everything went to plan, until at 56:54:53 the astronauts heard a “pretty large bang” and experienced fluctuations in electrical power and control thrusters. This set a series of events in motion. With oxygen levels depleting fast, the astronauts risked running out of oxygen to breathe, so they decided to abort the mission and return to Earth. This indeed changed their communications and interactions during the mission, as both the astronauts and mission control had to solve unexpected and urgent problems in order to bring the crew home alive.

The data come from the Apollo 13 voice loop transcripts, obtained from http://apollo13realtime.org/ and https://history.nasa.gov/afj/ap13fj/07day3-before-the-storm.html; the data include the flight directors’ voice loop and the air-ground voice loop. The flight directors (Houston’s Mission Control Center) were located in Houston and the crew (astronauts) were connected to this control center via the Capsule Communicator (CAPCOM). The Apollo 13 data is an ideal benchmark data set for studying communication/interaction patterns. In this practical we use part of the data, from one hour before the incident until one hour after it. The eventlist is stored in an object called PartOfApollo_13 and contains three columns: time, sender and receiver. Note that we know precisely when these calls were made; therefore, we apply a REM that treats the exact event times as known. Also note that the number of actors is 19.

4. Twitter data

The data were extracted from the Academic Twitter API in two steps. In the first step, all users mentioned using the hashtags ‘#ic2s2’ or ‘#netsciXXXX’ (XXXX = 2010, 2011, …, 2022) were extracted. In the second step, all mentions of those users were extracted, up to 800 tweets. Finally, the core of the network was extracted by keeping users with an in-degree and out-degree above K = 150 mentions. As some event times overlapped, we modified the data to make them usable for the REM. The source or sender is the person tweeting and the target or receiver is the person mentioned. Note that the time was the date of the tweet, converted to days. In this REH data the number of actors is 39.

5. Classroom data

The classroom dataset includes timed interactions between students and instructors within a high school classroom (McFarland, 2001; Bender-deMoll and McFarland, 2006). Sender and receiver events (n=691) were recorded between 20 actors (18 students and 2 teachers) along with the time of the events in increments of minutes. Note that the data employed here were modified slightly to increase the amount of time occurring between very closely recorded events, ensuring no simultaneity of events as assumed by the relational event framework. Two actor-level covariates are also at hand in the dataset used here: whether the actor was a teacher and whether the actor was female.

6. WTC data & visualization

This data set contains information about the 9/11 terrorist attacks on the WTC in New York City in 2001. Note that radio communication was essential in coordinating activities during the crisis. Butts et al. (2007) coded radio communication events between officers responding to 9/11 from transcripts of communications recorded during the event. It is important to note that the WTC radio data were coded from transcripts that lacked detailed timing information; we therefore do not know precisely when these calls were made, but we do know the order of events. Since the case of ordinal timing is somewhat simpler than that of exact timing, we consider the WTC data first in this practical. We will illustrate ordinal time REMs using the 481 communication events among 37 named communicants in this data set.

The eventlist is stored in an object called WTCPoliceCalls.
Question 1: Check the data using the head() and tail() functions. How many events does the data contain? How many actors? Without testing anything, can you see any trend? If so, which statistics could capture that trend?

# Check the data.
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32

You can see reciprocation and recency patterns in the first few events: responding officer 16 called officer 32 in the first event, and officer 32 then called 16 back in the second, which might be characterized as a local reciprocity effect or an \(AB \rightarrow BA\) participation shift. This was followed by 32 being the target of the next four calls, which is probably due either to a coordinating role of officer 32 or to a recency mechanism.
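As a quick descriptive check of this pattern (a sketch, assuming WTCPoliceCalls is an ordered data frame with the source and recipient columns shown by head()), you can compute the fraction of events that exactly reciprocate the immediately preceding call:

# Fraction of events whose sender/receiver swap those of the previous event
n_ev <- nrow(WTCPoliceCalls)
mean(WTCPoliceCalls$source[-1] == WTCPoliceCalls$recipient[-n_ev] &
     WTCPoliceCalls$recipient[-1] == WTCPoliceCalls$source[-n_ev])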

Question 2: Use the included function as.sociomatrix.eventlist() to convert the eventlist into a valued sociomatrix. For the definition of a sociomatrix, see the previous section.
Hint: as.sociomatrix.eventlist(data, number of actors).

WTCPoliceNet <- as.sociomatrix.eventlist(WTCPoliceCalls, 37)

Question 3: Once you have the sociomatrix object, plot the aggregated network using gplot().
Hint: gplot(created sociomatrix, the arguments such as the size of nodes, width of edges, colors of nodes, etc.)
gplot creates a node-and-edge visualization of a network. It can accept an adjacency matrix (N × N), an array of adjacency matrices (m × N × N), network objects, and other forms of network data (check the sna library to see what else is possible).
gplot has many arguments with which you can tweak your network visualization, for example:

  • vertex.col: sets the node color.
  • edge.col: color for edge border and fill.
  • vertex.sides: number of sides of the node polygon, e.g., triangle = 3, square = 4 (default = 8).
  • label.bg: color for the background of the node labels.

Check the help file of gplot to find other features.
gplot(WTCPoliceNet, edge.lwd = WTCPoliceNet^0.75,
      arrowhead.cex = log(as.edgelist.sna(WTCPoliceNet)[, 3]) + .25,
      vertex.col = ifelse(WTCPoliceIsICR, "black", "gray"),
      vertex.cex = 1.25, vertex.sides = ifelse(WTCPoliceIsICR, 4, 100))
Police Communication Network

The plot above is the time-aggregated WTC Police communication network.

  • The three black square nodes represent actors who fill institutional coordinator roles and gray circle nodes represent all other communicants (one of the square nodes is in the center).
  • A directed edge is drawn between two actors, i and j, if actor i ever called actor j on the radio. The edges and arrowheads are scaled in proportion to the number of calls over time. This is clearly a hub-dominated network with two actors sitting on the majority of paths between all other actors.
  • The actor with the plurality of communication ties is an institutional coordinator (the square node at the center of the figure). Heterogeneity in sending and receiving communication ties is evident, with several high-degree non-coordinators and two low-degree institutional coordinators in the network. This source of heterogeneity is a good starting place from which to build our model; a quick degree check is sketched below.
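To quantify this heterogeneity before any modeling, you can compute in- and out-degrees from the aggregated sociomatrix with degree() from sna (a quick sketch; the valued matrix is interpreted as call counts):

# In- and out-degree of each actor in the time-aggregated network
indeg  <- degree(WTCPoliceNet, cmode = "indegree")
outdeg <- degree(WTCPoliceNet, cmode = "outdegree")
# Which actors receive the most calls, and are the top receivers ICRs?
head(sort(indeg, decreasing = TRUE))
WTCPoliceIsICR[order(indeg, decreasing = TRUE)[1:3]]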

7. Fitting relational event models

7-1. A simple covariate model

The question is whether the propensity of individuals to send and receive calls depends on whether they occupy institutionalized coordinator roles (ICRs). WTCPoliceIsICR is an indicator vector showing which actors occupy such a role.

Question 4: Use rem.dyad() by passing appropriate arguments.
Hint: rem.dyad(data..., n = number of actors, effects = c("CovInt"), covar = list(CovInt = WTCPoliceIsICR), hessian = ...).
Please Check the description of arguments (?rem.dyad).

wtcfit1 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt"), 
                   covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics

Once you have fit the model, you can summarize the model fit using summary().
Question 5: Check the model summary. Then see how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)). predicted.match is a two-column matrix of TRUE/FALSE values corresponding to the sender and receiver columns in the main data frame; it indicates whether the model correctly predicted the sender and the receiver of each event.
Hint: summary(your model name).

summary(wtcfit1)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate  Std.Err Z value  Pr(>|z|)    
## CovInt.1 2.104464 0.069817  30.142 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6193.998 on 480 degrees of freedom
##  Chi-square: 727.0499 on 1 degrees of freedom, asymptotic p-value 0 
## AIC: 6195.998 AICC: 6196.007 BIC: 6200.174
mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0

The output gives us the covariate effect, some uncertainty, and goodness-of-fit information. The format is like the output of a regression model summary, but note that the coefficients should be interpreted within the relational event framework. In particular, the ICR role coefficient is the log of the hazard multiplier for an event involving an ICR versus a non-ICR event (the multiplier is \(e^{\lambda_1}\)). This effect is cumulative: an event in which one actor in an ICR calls another actor in an ICR gets twice the log increment (a multiplier of \(e^{2\lambda_1}\)).

Question 6: Use exp() to check the values of \(e^{\lambda_1}\) and \(e^{2\lambda_1}\).
Hint: You can extract the coefficient values using the $ sign (e.g., yourmodel$coef).

## Relative hazard for a non-ICR/ICR 
exp(wtcfit1$coef)
## CovInt.1 
## 8.202709
## Relative hazard for an ICR/ICR event (cumulative effect)
exp(2*wtcfit1$coef)
## CovInt.1 
## 67.28443

Next, we evaluate whether it is worth treating these effects separately with respect to ICR status.
To do so, we enter the ICR covariate as a sender and receiver covariate, respectively.

Question 7: Fit another model adding these covariates and check the model summary as you did above. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)), and compare it with the previous model's prediction.
Hint: It is the same as what we did for Q4, but here we consider the effect for the sender and for the receiver separately. The structure is the same, but we need to select "CovSnd" and "CovRec". Check the description of the arguments to see what you need to specify for effects.

wtcfit2 <- rem.dyad(WTCPoliceCalls, n = 37, effects = c("CovSnd", "CovRec"),
                    covar = list(CovSnd = WTCPoliceIsICR, CovRec = WTCPoliceIsICR), hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit2) # check the model output
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate  Std.Err Z value  Pr(>|z|)    
## CovSnd.1 1.979175 0.095745  20.671 < 2.2e-16 ***
## CovRec.1 2.225720 0.092862  23.968 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6190.175 on 479 degrees of freedom
##  Chi-square: 730.8731 on 2 degrees of freedom, asymptotic p-value 0 
## AIC: 6194.175 AICC: 6194.2 BIC: 6202.527
mean(apply(wtcfit2$predicted.match,1,all))-mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0

Question 8: Now, evaluate which model is preferred by BIC (lower is better) and explain what the BIC tells us.
Hint: You can extract BIC value from each model by using $ sign (e.g., yourmodel$BIC).

wtcfit1$BIC - wtcfit2$BIC
## [1] -2.352663

7-2. Adding endogenous variables

Add the reciprocity term: \(AB \rightarrow BA\). For the definition of reciprocity, see the slides. In particular, the AB-BA shift, which captures the tendency for B to call A, given that A has just called B, is likely at play in radio communication.

Question 9: Fit another model adding the reciprocity term and check the model summary. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)), and compare it with the previous model's prediction.

Hint: Add "PSAB-BA" as another statistic to the best model so far. Check the description of the arguments to see what you need to specify for effects.

wtcfit3 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit3)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## CovInt.1  1.60405 0.11500  13.949 < 2.2e-16 ***
## PSAB-BA   7.32695 0.10552  69.436 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2619.115 on 479 degrees of freedom
##  Chi-square: 4301.933 on 2 degrees of freedom, asymptotic p-value 0 
## AIC: 2623.115 AICC: 2623.14 BIC: 2631.467
mean(apply(wtcfit3$predicted.match,1,all))-mean(apply(wtcfit2$predicted.match,1,all))
## [1] 0.6839917

Question 10: Again, evaluate which model is preferred by BIC.

# Compare the BIC of this model with the first model.
wtcfit1$BIC - wtcfit3$BIC
## [1] 3568.707

Note that the lower the BIC, the better the model. It appears that there is a very strong reciprocity effect.
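To put the size of this effect on the hazard scale, you can exponentiate its coefficient, just as in Question 6 (the exact value depends on your fit):

# Hazard multiplier implied by the AB->BA participation shift in model 3
exp(wtcfit3$coef["PSAB-BA"])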

We may expect that the current participants in a communication are likely to initiate the next call and that one’s most recent communications may be the most likely to be returned.
These processes can be captured with the participation shifts for dyadic turn receiving/continuing and recency effects, respectively.

Question 11: Fit another model adding the p-shift effects to the previous model and check the model summary. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)), and compare it with the previous model's prediction.
Hint: Add "PSAB-BY" and "PSAB-AY" to the best model. For more information, check the description of the arguments to see what you need to specify for effects.

## Adding p-shift effects to model 3
wtcfit4<-rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit4)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## CovInt.1  1.54283 0.11818 13.0549 < 2.2e-16 ***
## PSAB-BA   7.49956 0.11418 65.6830 < 2.2e-16 ***
## PSAB-BY   1.25942 0.25131  5.0115 5.401e-07 ***
## PSAB-AY   0.87217 0.30611  2.8492  0.004384 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2595.135 on 477 degrees of freedom
##  Chi-square: 4325.913 on 4 degrees of freedom, asymptotic p-value 0 
## AIC: 2603.135 AICC: 2603.219 BIC: 2619.839
mean(apply(wtcfit4$predicted.match,1,all))-mean(apply(wtcfit3$predicted.match,1,all))
## [1] 0

Question 12: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (third model).
wtcfit3$BIC - wtcfit4$BIC
## [1] 11.62806

Question 13: Lastly, fit a model adding the recency effects to the previous model and check the model summary. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)), and compare it with the previous model's prediction.
Hint: Read the definitions of "RRecSnd" and "RSndSnd". Are they the statistics we need for this purpose?

## Adding recency effects to model 4
wtcfit5<-rem.dyad(WTCPoliceCalls,n=37,
                  effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY","RRecSnd","RSndSnd"),
                  covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit5)
## Relational Event Model (Ordinal Likelihood)
## 
##          Estimate Std.Err Z value  Pr(>|z|)    
## RRecSnd   2.38496 0.27447  8.6892 < 2.2e-16 ***
## RSndSnd   1.34622 0.22307  6.0349 1.590e-09 ***
## CovInt.1  1.07058 0.14244  7.5160 5.640e-14 ***
## PSAB-BA   4.88714 0.15293 31.9569 < 2.2e-16 ***
## PSAB-BY   1.67938 0.26116  6.4304 1.273e-10 ***
## PSAB-AY   1.39016 0.31057  4.4762 7.599e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2307.413 on 475 degrees of freedom
##  Chi-square: 4613.635 on 6 degrees of freedom, asymptotic p-value 0 
## AIC: 2319.413 AICC: 2319.591 BIC: 2344.469
mean(apply(wtcfit5$predicted.match,1,all))-mean(apply(wtcfit4$predicted.match,1,all))
## [1] 0

Question 14: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (fourth model).
wtcfit4$BIC - wtcfit5$BIC
## [1] 275.3701

The results indicate that turn-receiving, turn-continuing, and recency effects are all at play in the relational event process.
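As before, exponentiating the coefficients puts these effects on the hazard (multiplier) scale, which can help with interpretation:

# Hazard multipliers for all effects in model 5
exp(wtcfit5$coef)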

Question 15: Can you think of adding more statistics?
Please add a few more and see how the BIC changes. To this end, check the definitions of the statistics carefully.

# Try it out yourself.
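For instance, one possible extension (purely illustrative; it may or may not improve the fit) is to add the normalized indegree of the receiver, "NIDRec", to model 5 and compare BICs:

# Example: adding normalized indegree of the receiver to model 5
wtcfit6 <- rem.dyad(WTCPoliceCalls, n = 37,
                    effects = c("CovInt", "PSAB-BA", "PSAB-BY", "PSAB-AY",
                                "RRecSnd", "RSndSnd", "NIDRec"),
                    covar = list(CovInt = WTCPoliceIsICR), hessian = TRUE)
wtcfit5$BIC - wtcfit6$BIC  # positive values favor the new model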

Question 16: As there are a lot of statistics that we can use, can you think of some ways for statistics selection?

# Try it out yourself.
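One simple strategy is a greedy forward search on the BIC: start from the current best model, add one candidate statistic at a time, and keep it only if the BIC decreases. The sketch below is only an illustration (the candidate set is arbitrary, and fitting many models can be slow):

# Greedy forward selection on BIC (illustrative sketch)
candidates   <- c("NIDSnd", "NIDRec", "NODSnd")
base_effects <- c("CovInt", "PSAB-BA", "PSAB-BY", "PSAB-AY", "RRecSnd", "RSndSnd")
best <- wtcfit5
for (eff in candidates) {
  fit <- rem.dyad(WTCPoliceCalls, n = 37,
                  effects = c(base_effects, eff),
                  covar = list(CovInt = WTCPoliceIsICR), hessian = TRUE)
  if (fit$BIC < best$BIC) {   # keep the effect only if it lowers the BIC
    best <- fit
    base_effects <- c(base_effects, eff)
  }
}
best$BIC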

8. Assessing model adequacy

Model adequacy is a crucial consideration: even if the model fits sufficiently well, you need to ask whether the model is good enough for your purposes, for example the ability of the relational event model to predict the next event in the sequence, given those that have come before.
Note that we do not discuss this further in this practical, but interested students can check Butts (2008).
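As a quick first check, you can use quantities that rem.dyad() already returns, such as the fraction of events for which the model exactly predicts the sender-receiver pair (or at least one of the two):

# Exact and partial prediction rates for the final WTC model
mean(apply(wtcfit5$predicted.match, 1, all))   # both sender and receiver correct
mean(apply(wtcfit5$predicted.match, 1, any))   # at least one of the two correct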

9. Exact Time Event Histories

We are going to use REMs for event histories with exact timing information.
Let’s start by examining the raw temporal data and the time-aggregated network.
In this case, event time is given in increments of minutes from the onset of observation.

The edgelist is stored in an object called Class.
Question 17: Check the data again using head(). Can you see any trend?

## The edgelist is stored in an object called `Class`.
head(Class)
##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1

9-1. Time-aggregated network visualization

Question 18: First, convert the eventlist into a valued sociomatrix using as.sociomatrix.eventlist(), as you did above.
Hint: as.sociomatrix.eventlist(data, number of actors).

ClassNet <- as.sociomatrix.eventlist(Class, 20)

Question 19: Plot the network using gplot().
Here we color the female nodes black and the male nodes gray, and represent teachers as square-shaped nodes and students as triangle-shaped nodes. Edges between nodes are likewise scaled proportionally to the number of communication events transpiring between the actors.

Hint: Please check the hint of Q3 and consult the help file of gplot.

gplot(ClassNet, vertex.col = ifelse(ClassIsFemale, "black", "gray"),
      vertex.sides = 3 + ClassIsTeacher, vertex.cex = 2, edge.lwd = ClassNet^0.75)
Time-Aggregated Classroom Communications

9-2. Modeling with covariates

Question 20: Fit a simple intercept model, containing only a vector of 1s as an actor-level sending effect, and check the model summary. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)).
Hint: This vector is saved as ClassIntercept, which we can pass to the respective covariate argument in rem.dyad(). Also note that we do not want to discard the timing information (ordinal=FALSE).

classfit1<-rem.dyad(Class,n=20,effects=c("CovSnd"),
                      covar= list(CovSnd=ClassIntercept), ordinal=FALSE,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit1)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err Z value  Pr(>|z|)    
## CovSnd.1 -3.332287  0.038042 -87.596 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5987.221 on 691 degrees of freedom
##  Chi-square: -5.638867e-11 on 0 degrees of freedom, asymptotic p-value 1 
## AIC: 5989.221 AICC: 5989.227 BIC: 5993.759
mean(apply(classfit1$predicted.match,1,all))
## [1] 0

The model does not fit any better than the null, because it is equivalent to the null model (as indicated by the absence of difference between the null and residual deviance).
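You can verify this directly from the fitted object. The component names in the commented line are an assumption (they may differ across relevent versions), so inspect names(classfit1) first:

# Inspect the stored components of the fit
names(classfit1)
# classfit1$null.deviance - classfit1$residual.deviance   # expected to be ~0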

Question 21: Now fit a more interesting covariate model that considers gender (male/female) and role (teacher/student) for senders and receivers, and evaluate whether there is any improvement over the intercept-only model by BIC. See how well the events were predicted by your model, using mean(apply(yourmodel$predicted.match, 1, all)), and compare it with the previous model's prediction.

classfit2 <- rem.dyad(Class, n = 20, effects = c("CovSnd", "CovRec"),
                      covar = list(CovSnd = cbind(ClassIntercept, ClassIsTeacher, ClassIsFemale),
                                   CovRec = cbind(ClassIsTeacher, ClassIsFemale)),
                      ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit2)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value Pr(>|z|)    
## CovSnd.1 -3.834265  0.078843 -48.6317  < 2e-16 ***
## CovSnd.2  1.672575  0.091680  18.2436  < 2e-16 ***
## CovSnd.3  0.123916  0.094932   1.3053  0.19179    
## CovRec.1  0.373729  0.127030   2.9421  0.00326 ** 
## CovRec.2  0.165761  0.080896   2.0491  0.04046 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5652.318 on 687 degrees of freedom
##  Chi-square: 334.9034 on 4 degrees of freedom, asymptotic p-value 0 
## AIC: 5662.318 AICC: 5662.405 BIC: 5685.008
# Compare the BIC of this model with the previous model.
classfit1$BIC - classfit2$BIC
## [1] 308.7508
mean(apply(classfit2$predicted.match,1,all))-mean(apply(classfit1$predicted.match,1,all))
## [1] 0.01013025

This is a good improvement over the null model. Note, however, that gender does not appear to be predictive of sending communication; a better model may be one without that specific term included.

Question 22: Fit another model excluding the non-significant term and again compare to the previous model by BIC and prediction.
Hint: Based on the above outputs, which statistics should be included/excluded?

classfit3 <- rem.dyad(Class, n = 20, effects = c("CovSnd", "CovRec"),
                      covar = list(CovSnd = cbind(ClassIntercept, ClassIsTeacher),
                                   CovRec = cbind(ClassIsTeacher, ClassIsFemale)),
                      ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit3)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value  Pr(>|z|)    
## CovSnd.1 -3.775224  0.063622 -59.3379 < 2.2e-16 ***
## CovSnd.2  1.615762  0.079933  20.2139 < 2.2e-16 ***
## CovRec.1  0.371758  0.127019   2.9268  0.003425 ** 
## CovRec.2  0.161157  0.080815   1.9941  0.046137 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5654.016 on 688 degrees of freedom
##  Chi-square: 333.2049 on 3 degrees of freedom, asymptotic p-value 0 
## AIC: 5662.016 AICC: 5662.074 BIC: 5680.169
classfit2$BIC - classfit3$BIC
## [1] 4.839661
mean(apply(classfit3$predicted.match,1,all))-mean(apply(classfit2$predicted.match,1,all))
## [1] 0

Question 23: Is there any improvement?

9-3. Modeling endogenous social dynamics

For the classroom conversations, we can for example think of recency effects, turn-taking, sequential address, and turn-usurping. We can enter these terms into the model using their appropriate effect names (please look at the list and check the definition of statistics).

We keep the covariates from the best covariate model (i.e., model 3 from the previous section).
Question 24: Fit a model adding just a recency effect to the previous model and again check the improvement by BIC and prediction.

#First, just recency effects + model 3:
classfit4 <- rem.dyad(Class, n = 20, effects = c("CovSnd", "CovRec", "RRecSnd", "RSndSnd"),
                      covar = list(CovSnd = cbind(ClassIntercept, ClassIsTeacher),
                                   CovRec = cbind(ClassIsTeacher, ClassIsFemale)),
                      ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
# Compare the BIC of this model with the previous model (model3).
classfit3$BIC - classfit4$BIC
## [1] 1118.294
mean(apply(classfit4$predicted.match,1,all))-mean(apply(classfit3$predicted.match,1,all))
## [1] 0.02170767

Question 25: Fit another model adding the conversational norms to the previous model (model 4) and check the improvement by BIC and prediction.

## Next conversational norms + model 4
## "PSAB-BA","PSAB-AY","PSAB-BY"

classfit5 <- rem.dyad(Class, n = 20,
                      effects = c("CovSnd", "CovRec", "RRecSnd", "RSndSnd",
                                  "PSAB-BA", "PSAB-AY", "PSAB-BY"),
                      covar = list(CovSnd = cbind(ClassIntercept, ClassIsTeacher),
                                   CovRec = cbind(ClassIsTeacher, ClassIsFemale)),
                      ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit5)
## Relational Event Model (Temporal Likelihood)
## 
##           Estimate   Std.Err  Z value  Pr(>|z|)    
## RRecSnd   2.429212  0.155366  15.6354 < 2.2e-16 ***
## RSndSnd  -0.986744  0.144667  -6.8208 9.055e-12 ***
## CovSnd.1 -5.003424  0.090609 -55.2201 < 2.2e-16 ***
## CovSnd.2  1.253889  0.085160  14.7239 < 2.2e-16 ***
## CovRec.1 -0.722687  0.141950  -5.0911 3.559e-07 ***
## CovRec.2  0.047931  0.081325   0.5894    0.5556    
## PSAB-BA   4.622143  0.137601  33.5910 < 2.2e-16 ***
## PSAB-BY   1.677549  0.164933  10.1711 < 2.2e-16 ***
## PSAB-AY   2.869962  0.103113  27.8331 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 2803.315 on 683 degrees of freedom
##  Chi-square: 3183.906 on 8 degrees of freedom, asymptotic p-value 0 
## AIC: 2821.315 AICC: 2821.58 BIC: 2862.158
# Compare the BIC of this model with the previous model (model4).
classfit4$BIC - classfit5$BIC
## [1] 1699.716
mean(apply(classfit5$predicted.match,1,all))-mean(apply(classfit4$predicted.match,1,all))
## [1] 0.2995658

10. Project 1: Application of REM on Apollo 13 data

We are again going to use REMs for event histories with exact timing information.
Let’s consider Apollo 13 mission data. Read the description of the data at the beginning of this practical and also check the aforementioned websites for more information. In this case, event time is given in increments of seconds from the onset of observation.

In this data the actors are as follows:

  • AFD: Assistant Flight Director from Flight directors (1)

  • CAPCOM: Capsule Communicator from Flight directors (2)

  • CONTROL: Control Officer from Flight directors (3)

  • EECOM: Electrical, Environmental and Consumables Manager from Flight directors (4)

  • All: Ground control team (without flight directors) (5)

  • FDO : Flight dynamics officer (FDO or FIDO) (6)

  • FLIGHT: Flight Director from Flight directors (7)

  • GNC: The Guidance, Navigation, and Controls Systems Engineer from Flight directors (8)

  • GUIDO: Guidance Officer from Flight directors (9)

  • INCO: Integrated Communications Officer from Flight directors (10)

  • NETWORK: Network of ground stations from Flight directors (11)

  • TELMU: Telemetry, Electrical, and EVA Mobility Unit Officer from Flight directors (12)

  • RECOVERY: Recovery Supervisor from Flight directors (13)

  • PROCEDURES: Organization & Procedures Officer from Flight directors (14)

  • FAO: Flight activities officer from Flight directors (15)

  • RETRO: Retrofire Officer from Flight directors (16)

  • CDR: Commander James A. Lovell Jr. crew (astronauts) (17)

  • CMP: Command Module Pilot John (Jack) L. Swigert Jr. crew (astronauts) (18)

  • LMP: Lunar module pilot Fred W. Haise Jr. crew (astronauts) (19)

Question 26: First look at the data, plot the network and start with fitting a simple model. Next, add the statistics of interest to the model and see the performance of the new model.

Apollo_mission1<-rem.dyad(PartOfApollo_13,n=19,effects=c("PSAB-BA"), ordinal=TRUE,hessian= TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission1)
## Relational Event Model (Ordinal Likelihood)
## 
##         Estimate  Std.Err Z value  Pr(>|z|)    
## PSAB-BA 6.453583 0.033668  191.69 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 20850.3 on 3881 degrees of freedom
##  Chi-square: 24451.17 on 1 degrees of freedom, asymptotic p-value 0 
## AIC: 20852.3 AICC: 20852.3 BIC: 20858.56
Apollo_mission2<-rem.dyad(PartOfApollo_13,n=19,effects=c("PSAB-BA","RRecSnd"), ordinal=TRUE,hessian= TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission2)
## Relational Event Model (Ordinal Likelihood)
## 
##         Estimate  Std.Err Z value  Pr(>|z|)    
## RRecSnd 5.065585 0.067967  74.530 < 2.2e-16 ***
## PSAB-BA 2.953172 0.038625  76.458 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 14706.31 on 3880 degrees of freedom
##  Chi-square: 30595.16 on 2 degrees of freedom, asymptotic p-value 0 
## AIC: 14710.31 AICC: 14710.32 BIC: 14722.84
Apollo_mission2$BIC- Apollo_mission1$BIC
## [1] -6135.723
Apollo_mission3<-rem.dyad(PartOfApollo_13,n=19,effects=c("PSAB-BA","RRecSnd","ITPSnd"), ordinal=TRUE,hessian= TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission3)
## Relational Event Model (Ordinal Likelihood)
## 
##            Estimate     Std.Err Z value Pr(>|z|)    
## RRecSnd  5.07271311  0.06838339 74.1805   <2e-16 ***
## ITPSnd  -0.00038060  0.00043885 -0.8673   0.3858    
## PSAB-BA  2.95169869  0.03863341 76.4027   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 14705.64 on 3879 degrees of freedom
##  Chi-square: 30595.83 on 3 degrees of freedom, asymptotic p-value 0 
## AIC: 14711.64 AICC: 14711.65 BIC: 14730.43
Apollo_mission3$BIC- Apollo_mission2$BIC
## [1] 7.592294
ApolloNet <- as.sociomatrix.eventlist(PartOfApollo_13, 19)

gplot(ApolloNet, edge.lwd = ApolloNet^0.75,
      arrowhead.cex = log(as.edgelist.sna(ApolloNet)[, 3]) + .25,
      vertex.col = "gray", vertex.cex = 1.25)
Apollo 13 Network

11. Project 2: Application of REM on Twitter data

Question 27: First, look at the data, then plot the network. Can you see any trends in the network? Start by fitting a REM and add the statistics of interest sequentially. Can you see any improvement based on the BIC?

Hint: Consider some of these statistics: "PSAB-BA", "ISPSnd", "PSAB-XB", "NIDSnd", "NIDRec", and "NODSnd".

head(Twitter_data_rem3)
##   time_day source target
## 1     0.00      8      1
## 2   121.03     28      2
## 3   199.08     28      2
## 4   266.95      4      3
## 5   573.59     22      5
## 6   574.49     25      5
twitter_NET <- as.sociomatrix.eventlist(Twitter_data_rem3, 39)

gplot(twitter_NET, edge.lwd = twitter_NET^0.75,
      arrowhead.cex = log(as.edgelist.sna(twitter_NET)[, 3]) + .25,
      vertex.col = "black", vertex.cex = 1.25)
Twitter Data Network

twitter1<- rem.dyad(Twitter_data_rem3,n=39, effects = c("PSAB-BA", "PSAB-BY"), ordinal = FALSE,hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(twitter1)
## Relational Event Model (Temporal Likelihood)
## 
##         Estimate  Std.Err Z value  Pr(>|z|)    
## PSAB-BA -6.76516  0.57582 -11.749 < 2.2e-16 ***
## PSAB-BY -8.28376  0.20387 -40.633 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 25024.13 on 1405 degrees of freedom
## Residual deviance: 7425381 on 1404 degrees of freedom
##  Chi-square: -7400357 on 1 degrees of freedom, asymptotic p-value 1 
## AIC: 7425385 AICC: 7425385 BIC: 7425396
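A possible next step (purely illustrative; whether it helps is for you to check) is to add degree statistics from the hint and compare BICs:

# Example follow-up model with degree effects added
twitter2 <- rem.dyad(Twitter_data_rem3, n = 39,
                     effects = c("PSAB-BA", "PSAB-BY", "NIDSnd", "NODSnd"),
                     ordinal = FALSE, hessian = TRUE)
twitter1$BIC - twitter2$BIC  # positive values favor the new model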