## Install the following packages (if needed) and load them
# install.packages("relevent")
# install.packages("sna")
library(relevent)
library(sna)
Welcome to the practical for social network analysis. In this practical we will get hands-on experience with the material from the lectures by applying Relational Event Models (REMs) to real relational event history (REH) data (the Apollo 13 voice loop data, Twitter data, Class data and World Trade Center (WTC) police radio data) and doing a bit of programming in R. These data sets are stored in `UUsummerschool.rdata`; we briefly describe them later in this practical.
As mentioned in the lecture, dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which together form a social network.
First, we need to load `UUsummerschool.rdata` and see what is inside it.
# Set your own working directory
# setwd()
# Load data using load():
load("UUsummerschool.rdata")
# This R-data file contains the following objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"
## [3] "ClassIntercept" "ClassIsFemale"
## [5] "ClassIsTeacher" "PartOfApollo_13"
## [7] "Twitter_data_rem3" "WTCPoliceCalls"
## [9] "WTCPoliceIsICR"
What is stored in `UUsummerschool.rdata`? A function `as.sociomatrix.eventlist` that outputs a sociomatrix, and the rest are data sets such as `PartOfApollo_13`, `Class`, `Twitter_data_rem3` and `WTCPoliceCalls`.
Probably you would ask: "What is a sociomatrix?" Before you let the computer do the social network analysis, it must have some basic data to work with, namely the connections between whom and whom. Note that any one of those connections involves only two individuals, that is, two nodes. This property might already inspire you to think of the relationship as a mathematical object, for instance a two-dimensional matrix. If we order the actors identically along the rows and the columns, and we use 1 and 0 to indicate whether or not there is a connection between a pair of actors, then we obtain the matrix we expect. Because this matrix describes the social network, we give it an exclusive name: the sociomatrix.
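As a small illustration (with made-up data, not from the practical's data sets), the supplied `as.sociomatrix.eventlist()` function turns an event list into such a (valued) sociomatrix:
# Toy example (hypothetical data): three actors, four events
toy_events <- data.frame(number = 1:4,
                         source = c(1, 2, 1, 3),
                         recipient = c(2, 1, 2, 1))
as.sociomatrix.eventlist(toy_events, 3)
# Entry [i, j] counts how often actor i directed an event at actor j.
Note that the entries are counts rather than just 0/1, which is why the result is called a valued sociomatrix.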
Secondly, let’s check all the objects.
as.sociomatrix.eventlist
## function (eventlist, n)
## {
## g <- matrix(0, n, n)
## if (NROW(eventlist) > 0) {
## tabmat <- table(eventlist[, -1, drop = FALSE])
## g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
## }
## g
## }
## attr(,"source")
## [1] "function(eventlist,n){"
## [2] " g<-matrix(0,n,n)"
## [3] " if(NROW(eventlist)>0){"
## [4] " tabmat<-table(eventlist[,-1,drop=FALSE])"
## [5] " g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] " }"
## [7] " g"
## [8] "}"
head(WTCPoliceCalls)
## number source recipient
## 1 1 16 32
## 2 2 32 16
## 3 3 16 32
## 4 4 16 32
## 5 5 11 32
## 6 6 11 32
head(PartOfApollo_13)
## time sender receiver
## 1 11849.2 18 2
## 2 11854.2 2 18
## 3 11885.2 18 2
## 4 11890.2 2 18
## 5 12232.2 2 17
## 6 12342.2 17 2
head(Class)
## StartTime FromId ToId
## 1 0.135 14 12
## 2 0.270 12 14
## 3 0.405 18 12
## 4 0.540 12 18
## 5 0.675 1 12
## 6 0.810 12 1
head(Twitter_data_rem3)
## time_day source target
## 1 0.00 8 1
## 2 121.03 28 2
## 3 199.08 28 2
## 4 266.95 4 3
## 5 573.59 22 5
## 6 574.49 25 5
## The `relevent` package and the `rem.dyad()` function

The `relevent` package (Butts, 2015) in R provides an implementation of the relational event framework, with specialized functionality for easily fitting dyadic relational event models. Within the `relevent` package, the `rem.dyad()` function is the primary workhorse for modeling dyadic data. As you can see in the related documentation, there are a number of arguments that can be used for different purposes. In this practical we only use a few of them (please see the pdf file `rem.dyad.args`). For more explanation, please check the help file of `rem.dyad` as follows.
## You can access the help file in three different ways:
# 1)
??rem.dyad
# 2)
help(rem.dyad)
# 3)
?relevent::rem.dyad
There is also `rem()` for modeling dyadic data. The main difference is that `rem()` needs the statistics to be computed in advance, whereas `rem.dyad()` computes the statistics internally. However, if we already have the values of the statistics, `rem()` tends to be faster.
As with a regression model, we should first decide which variables/statistics to include in our model. This can be done in several ways: expert knowledge, prior information, or statistic/variable selection (e.g., using the BIC). Note that not all effects lead to identified models in all cases; it is up to the user to ensure that the postulated model makes sense.
Below you will see a short description of the data sets that we use in this practical.
During this practical we will analyze part of the Apollo 13 mission. The mission launched as scheduled at 2:13:00 EST (19:13:00 UTC) on April 11, 1970. On board were James Lovell Jr. (Commander, CDR), John "Jack" Swigert Jr. (Command Module Pilot, CMP), and Fred Haise Jr. (Lunar Module Pilot, LMP). The mission was quite routine and everything went to plan, until at 56:54:53 the astronauts heard a "pretty large bang" and experienced fluctuations in electrical power and control thrusters. This set a series of events in motion. With oxygen levels depleting fast, the astronauts faced the risk of running out of oxygen to breathe, so they decided to abort the mission and return to Earth. This changed their communications and interactions during the mission, as both the astronauts and mission control had to solve unexpected and urgent problems in order to bring the crew home alive.
The data come from the Apollo 13 voice loop transcripts, obtained from http://apollo13realtime.org/ and https://history.nasa.gov/afj/ap13fj/07day3-before-the-storm.html; the data include the flight directors' voice loop and the air-ground voice loop. The flight directors (Houston's Mission Control Center) were located in Houston, and the crew (astronauts) were connected to this control center via the Capsule Communicator (CAPCOM). The Apollo 13 data are an ideal benchmark for studying communication/interaction patterns. In this practical we use a part of the data, from one hour before the incident until one hour after it. The eventlist is stored in an object called `PartOfApollo_13` and contains three columns: time, sender and receiver. Note that we know precisely when these calls were made; therefore, we apply a REM that treats the exact event times as known. Also note that the number of actors is 19.
The data were extracted from the Academic Twitter API in two steps. In the first step, all users mentioned using the hashtags '#ic2s2' or '#netsciXXXX' (XXXX = 2010, 2011, ..., 2022) were extracted. In the second step, all mentions of those users were extracted, up to 800 tweets. Finally, the core of the network was extracted by keeping users with an in-degree and out-degree of more than K = 150 mentions. As there were some overlaps in time, we modified the data in order to make them usable for the REM. The source or sender is the person tweeting, and the target or receiver is the person mentioned. Note that the time was the date of the tweet, which was converted to days. In this REH data set the number of actors is 39.
The classroom dataset includes timed interactions between students and instructors within a high school classroom (McFarland, 2001; Bender-deMoll and McFarland, 2006). Sender and receiver events (n=691) were recorded between 20 actors (18 students and 2 teachers) along with the time of the events in increments of minutes. Note that the data employed here were modified slightly to increase the amount of time occurring between very closely recorded events, ensuring no simultaneity of events as assumed by the relational event framework. Two actor-level covariates are also at hand in the dataset used here: whether the actor was a teacher and whether the actor was female.
This data set contains information about the 9/11 terrorist attacks at the WTC in New York City in 2001. Note that radio communication was essential in coordinating activities during the crisis. Butts et al. (2007) coded radio communication events between officers responding to 9/11 from transcripts of communications recorded during the event. It is important to note that the WTC radio data were coded from transcripts that lacked detailed timing information; we therefore do not know precisely when these calls were made, but we do know the order of events. Since the case of ordinal timing is somewhat simpler than that of exact timing, we consider the WTC data first in this practical. We will illustrate ordinal time REMs using the 481 communication events among 37 named communicants in this data set.
The eventlist is stored in an object called `WTCPoliceCalls`.

Question 1: Check the data using the `head` and `tail` functions. How many events does the data contain? How many actors? Without testing anything, can you see any trend? If so, which statistics can capture that trend?
# Check the data.
head(WTCPoliceCalls)
## number source recipient
## 1 1 16 32
## 2 2 32 16
## 3 3 16 32
## 4 4 16 32
## 5 5 11 32
## 6 6 11 32
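Question 1 also asks for `tail()` and for the numbers of events and actors; a quick way to check these (a sketch, using the column names shown above) is:
tail(WTCPoliceCalls)   # the last few events
nrow(WTCPoliceCalls)   # number of events
length(unique(c(WTCPoliceCalls$source, WTCPoliceCalls$recipient)))   # number of actors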
You can see the reciprocation and recency patterns in the first few events: responding officer 16 called officer 32 in the first event, and officer 32 then called 16 back in the second, which might be characterized as a local reciprocity effect or an \(AB \rightarrow BA\) participation shift. This was followed by 32 being the target of the next four calls, which is probably due either to a coordinating role played by 32 or to the presence of a recency mechanism.
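As a purely descriptive check (a sketch, not part of the original practical), you can compute the fraction of events that directly reciprocate the immediately preceding call:
n_ev <- nrow(WTCPoliceCalls)
recip <- WTCPoliceCalls$source[-1] == WTCPoliceCalls$recipient[-n_ev] &
  WTCPoliceCalls$recipient[-1] == WTCPoliceCalls$source[-n_ev]
mean(recip)   # fraction of events that reverse the previous event (AB -> BA)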
Question 2: Use the supplied function `as.sociomatrix.eventlist()` to convert the eventlist into a valued sociomatrix. For the definition of a sociomatrix, see the previous section.

Hint: `as.sociomatrix.eventlist(data, number of actors)`.
WTCPoliceNet <- as.sociomatrix.eventlist(WTCPoliceCalls, 37)
Question 3: Once you have the sociomatrix object, plot the aggregated network using `gplot()`.

Hint: `gplot(created sociomatrix, arguments such as the size of nodes, width of edges, colors of nodes, etc.)`.

`gplot` creates a node-and-edge visualization of a network. It can accept an adjacency matrix (N × N), an array of adjacency matrices (m × N × N), network objects, and other forms of network data (check the `sna` library to see what else is possible). `gplot` has many arguments with which you can tweak your network visualization, for example:
- `vertex.col`: sets the node color.
- `edge.col`: color for edge border and fill.
- `vertex.sides`: number of sides for the node polygon, e.g., triangle = 3, square = 4 (default = 8).
- `label.bg`: color for the background of the node label.

Check the help file of `gplot` to find other features.

gplot(WTCPoliceNet, edge.lwd = WTCPoliceNet^0.75,
      arrowhead.cex = log(as.edgelist.sna(WTCPoliceNet)[, 3]) + 0.25,
      vertex.col = ifelse(WTCPoliceIsICR, "black", "gray"),
      vertex.cex = 1.25, vertex.sides = ifelse(WTCPoliceIsICR, 4, 100))
The plot above shows the time-aggregated WTC police communication network. The question is whether the propensity of individuals to send and receive calls depends on whether they occupy institutionalized coordinator roles (ICRs) or not. `WTCPoliceIsICR` is a vector indicating which actors occupy an ICR.
Question 4: Use `rem.dyad()` by passing appropriate arguments.

Hint: `rem.dyad(data..., n = number of actors, effects = c("CovInt"), covar = list(CovInt = WTCPoliceIsICR), hessian = ...)`.

Please check the description of the arguments (`?rem.dyad`).
wtcfit1 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt"),
covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
Once you have fit the model, you can summarize the model fit using `summary()`.

**Question 5: Check the model summary. Then see how well your model predicted the events, using `mean(apply(your model$predicted.match, 1, all))`. `predicted.match` is a two-column matrix of TRUE/FALSE values corresponding to the sender and receiver columns in the main data frame; it indicates whether the model correctly predicted the sender and the receiver of each event.**

Hint: `summary(your model name)`.
summary(wtcfit1)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovInt.1 2.104464 0.069817 30.142 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6193.998 on 480 degrees of freedom
## Chi-square: 727.0499 on 1 degrees of freedom, asymptotic p-value 0
## AIC: 6195.998 AICC: 6196.007 BIC: 6200.174
mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0
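A value of 0 means that this covariate-only model never predicted both the sender and the receiver of an event exactly. If you want to see the structure of `predicted.match` yourself, an optional quick look is:
head(wtcfit1$predicted.match)   # TRUE/FALSE per event for sender and receiver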
The output gives us the covariate effect, some uncertainty information, and goodness-of-fit information. The format resembles the output of a regression model summary, but note that the coefficients should be interpreted within the relational event framework. In particular, the ICR coefficient \(\lambda_1\) is the logged multiplier for the hazard of an event involving an ICR versus a non-ICR event (the multiplier itself is \(e^{\lambda_1}\)). This effect is cumulative: an event in which one actor in an ICR calls another actor in an ICR gets twice the log increment, i.e., a multiplier of \(e^{2\lambda_1}\).
Question 6: Use `exp()` to check the values of \(e^{\lambda_1}\) and \(e^{2\lambda_1}\).

Hint: You can extract the coefficient value by using the `$` sign (e.g., `yourmodel$coef`).
## Relative hazard for a non-ICR/ICR
exp(wtcfit1$coef)
## CovInt.1
## 8.202709
## Relative hazard for an ICR/ICR event
exp(2*wtcfit1$coef)
## CovInt.1
## 67.28443
Next, we evaluate whether it is worth treating these effects
separately with respect to ICR status.
To do so, we enter the ICR covariate as a sender and receiver covariate,
respectively.
**Question 7: Fit another model, entering the covariates separately, and check the model summary as you did above. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))` and compare it with the previous model's prediction.**

Hint: It is the same as what we did for Q4, but here we should consider the effect for the sender and for the receiver separately. The structure is the same, but we need to select "CovSnd" and "CovRec". Check the description of the arguments to see what you need to specify for `effects`.
wtcfit2 <- rem.dyad(WTCPoliceCalls, n = 37, effects = c("CovSnd", "CovRec"),
                    covar = list(CovSnd = WTCPoliceIsICR, CovRec = WTCPoliceIsICR),
                    hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit2) # check the model output
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovSnd.1 1.979175 0.095745 20.671 < 2.2e-16 ***
## CovRec.1 2.225720 0.092862 23.968 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 6190.175 on 479 degrees of freedom
## Chi-square: 730.8731 on 2 degrees of freedom, asymptotic p-value 0
## AIC: 6194.175 AICC: 6194.2 BIC: 6202.527
mean(apply(wtcfit2$predicted.match,1,all))-mean(apply(wtcfit1$predicted.match,1,all))
## [1] 0
Question 8: Now, evaluate which model is preferred by the BIC (lower is better) and explain what the BIC tells us.

Hint: You can extract the BIC value from each model by using the `$` sign (e.g., `yourmodel$BIC`).
wtcfit1$BIC - wtcfit2$BIC
## [1] -2.352663
Add the reciprocity term: \(AB \rightarrow BA\). For the definition of reciprocity, see the slides. In particular, the AB-BA shift, which captures the tendency for B to call A, given that A has just called B, is likely at play in radio communication.
Question 9: Fit another model adding the reciprocity term and check the model summary. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))` and compare it with the previous models' predictions.

Hint: Add "PSAB-BA" as another statistic to the best model so far. Check the description of the arguments to see what you need to specify for `effects`.
wtcfit3 <- rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA"),
covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit3)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovInt.1 1.60405 0.11500 13.949 < 2.2e-16 ***
## PSAB-BA 7.32695 0.10552 69.436 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2619.115 on 479 degrees of freedom
## Chi-square: 4301.933 on 2 degrees of freedom, asymptotic p-value 0
## AIC: 2623.115 AICC: 2623.14 BIC: 2631.467
mean(apply(wtcfit3$predicted.match,1,all))-mean(apply(wtcfit2$predicted.match,1,all))
## [1] 0.6839917
Question 10: Again, evaluate which model is preferred by BIC.
# Compare the BIC of this model with the first model.
wtcfit1$BIC - wtcfit3$BIC
## [1] 3568.707
Note that the lower the BIC, the better the model. It appears that there is a very strong reciprocity effect.
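To get a feel for the size of these effects on the hazard scale, you can exponentiate the coefficients of model 3, just as in Question 6 (an optional quick check):
exp(wtcfit3$coef)   # hazard multipliers for the ICR effect and the AB->BA shift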
We may expect that the current participants in a communication are
likely to initiate the next call and that one’s most recent
communications may be the most likely to be returned.
These processes can be captured with the participation shifts for dyadic
turn receiving/continuing and recency effects, respectively.
**Question 11: Fit another model adding the p-shift effects to the previous model and check the model summary. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))` and compare it with the previous model's prediction.**

Hint: Add "PSAB-BY" and "PSAB-AY" to the best model so far. For more information, check the description of the arguments to see what you need to specify for `effects`.
## Adding p-shift effects to model 3
wtcfit4<-rem.dyad(WTCPoliceCalls,n=37,effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY"),
covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit4)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovInt.1 1.54283 0.11818 13.0549 < 2.2e-16 ***
## PSAB-BA 7.49956 0.11418 65.6830 < 2.2e-16 ***
## PSAB-BY 1.25942 0.25131 5.0115 5.401e-07 ***
## PSAB-AY 0.87217 0.30611 2.8492 0.004384 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2595.135 on 477 degrees of freedom
## Chi-square: 4325.913 on 4 degrees of freedom, asymptotic p-value 0
## AIC: 2603.135 AICC: 2603.219 BIC: 2619.839
mean(apply(wtcfit4$predicted.match,1,all))-mean(apply(wtcfit3$predicted.match,1,all))
## [1] 0
Question 12: Evaluate which model is preferred.
# Compare the BIC of this model with the previous model (third model).
wtcfit3$BIC - wtcfit4$BIC
## [1] 11.62806
Question 13: Lastly, fit a model adding the recency effects to the previous model and check the model summary. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))` and compare it with the previous model's prediction.

Hint: Read the definitions of "RRecSnd" and "RSndSnd". Are they the statistics that we need for this purpose?
## Adding recency effects to model 4
wtcfit5<-rem.dyad(WTCPoliceCalls,n=37,
effects=c("CovInt","PSAB-BA","PSAB-BY","PSAB-AY","RRecSnd","RSndSnd"),
covar=list(CovInt=WTCPoliceIsICR),hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(wtcfit5)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## RRecSnd 2.38496 0.27447 8.6892 < 2.2e-16 ***
## RSndSnd 1.34622 0.22307 6.0349 1.590e-09 ***
## CovInt.1 1.07058 0.14244 7.5160 5.640e-14 ***
## PSAB-BA 4.88714 0.15293 31.9569 < 2.2e-16 ***
## PSAB-BY 1.67938 0.26116 6.4304 1.273e-10 ***
## PSAB-AY 1.39016 0.31057 4.4762 7.599e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 6921.048 on 481 degrees of freedom
## Residual deviance: 2307.413 on 475 degrees of freedom
## Chi-square: 4613.635 on 6 degrees of freedom, asymptotic p-value 0
## AIC: 2319.413 AICC: 2319.591 BIC: 2344.469
mean(apply(wtcfit5$predicted.match,1,all))-mean(apply(wtcfit4$predicted.match,1,all))
## [1] 0
Question 14: Evaluate which model is preferred.
# Compare the BIC of this model with the previous model (fourth model).
wtcfit4$BIC - wtcfit5$BIC
## [1] 275.3701
The results indicate that turn-receiving, turn-continuing, and
recency effects are all at play in the relational event process.
Question 15: Can you think of adding more statistics? Please add a few more and see how the BIC changes. To this end, check the definitions of the statistics carefully.
# Try it out yourself.
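For example, one possible (purely illustrative) extension adds outbound and inbound two-path effects to the last model; the chosen statistics are an example, not the intended answer:
# Illustrative extension of model 5; the choice of effects is an assumption
wtcfit6 <- rem.dyad(WTCPoliceCalls, n = 37,
                    effects = c("CovInt", "PSAB-BA", "PSAB-BY", "PSAB-AY",
                                "RRecSnd", "RSndSnd", "OTPSnd", "ITPSnd"),
                    covar = list(CovInt = WTCPoliceIsICR), hessian = TRUE)
summary(wtcfit6)
wtcfit5$BIC - wtcfit6$BIC   # a positive difference would favour the extended model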
Question 16: As there are a lot of statistics that we can use, can you think of some ways to select which statistics to include?
# Try it out yourself.
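One simple (if computationally naive) strategy is forward selection by the BIC: start from a baseline model and, one at a time, add the candidate statistic that lowers the BIC the most. Below is a minimal sketch, assuming the objects from the previous questions are in the workspace; the candidate set is arbitrary and only meant as an illustration:
# Forward-selection sketch by BIC (candidate set is an assumption)
candidates <- c("RRecSnd", "RSndSnd", "OTPSnd", "ITPSnd", "NIDSnd", "NIDRec")
base_effects <- c("CovInt", "PSAB-BA")
bics <- sapply(candidates, function(eff) {
  fit <- rem.dyad(WTCPoliceCalls, n = 37,
                  effects = c(base_effects, eff),
                  covar = list(CovInt = WTCPoliceIsICR), hessian = TRUE)
  fit$BIC
})
sort(bics)   # lower BIC indicates a more promising candidate to add next
Of course, substantive knowledge about the setting should guide which candidates are considered in the first place.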
Model adequacy is a crucial consideration: even if the model fits well according to the criteria above, you need to ask whether it is good enough for your purposes, for example in terms of the ability of the relational event model to predict the next event in the sequence, given those that have come before. Note that we do not discuss this further in this practical, but interested students can check Butts (2008).
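As a simple illustration of such a check, using only quantities already computed in this practical, you can compare the fraction of events whose sender and receiver were both predicted exactly across the fitted models:
# Fraction of exactly predicted events (sender and receiver) per fitted model
sapply(list(wtcfit1 = wtcfit1, wtcfit3 = wtcfit3, wtcfit5 = wtcfit5),
       function(m) mean(apply(m$predicted.match, 1, all)))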
We are going to use REMs for event histories with exact timing information. Let's start by examining the raw temporal data and the time-aggregated network. In this case, event time is given in increments of minutes from the onset of observation. The edgelist is stored in an object called `Class`.
Question 17: Check the data again using `head()`. Can you see any trend?
## The edgelist is stored in an object called `Class`.
head(Class)
## StartTime FromId ToId
## 1 0.135 14 12
## 2 0.270 12 14
## 3 0.405 18 12
## 4 0.540 12 18
## 5 0.675 1 12
## 6 0.810 12 1
Question 18: First, convert the eventlist into a valued sociomatrix using `as.sociomatrix.eventlist()`, as you did above.

Hint: `as.sociomatrix.eventlist(data, number of actors)`.
ClassNet <- as.sociomatrix.eventlist(Class, 20)
Question 19: Plot the network using `gplot()`.

Here we color the female nodes black and the male nodes gray, and represent teachers as square-shaped nodes and students as triangle-shaped nodes. Edges between nodes are likewise scaled proportionally to the number of communication events transpiring between actors.

Hint: Please check the hint for Q3 and consult the help file of `gplot`.
gplot(ClassNet, vertex.col = ifelse(ClassIsFemale, "black", "gray"),
      vertex.sides = 3 + ClassIsTeacher, vertex.cex = 2, edge.lwd = ClassNet^0.75)
**Question 20: Fit a simple intercept model, containing only a vector of 1s as an actor-level sending effect, and check the model summary. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))`.**

Hint: This vector is saved as `ClassIntercept`, which we can pass to the respective covariate argument in `rem.dyad()`. Also note that we do not want to discard the timing information (`ordinal = FALSE`).
classfit1<-rem.dyad(Class,n=20,effects=c("CovSnd"),
covar= list(CovSnd=ClassIntercept), ordinal=FALSE,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit1)
## Relational Event Model (Temporal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovSnd.1 -3.332287 0.038042 -87.596 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5987.221 on 691 degrees of freedom
## Chi-square: -5.638867e-11 on 0 degrees of freedom, asymptotic p-value 1
## AIC: 5989.221 AICC: 5989.227 BIC: 5993.759
mean(apply(classfit1$predicted.match,1,all))
## [1] 0
The model does not fit any better than the null, because it is equivalent to the null model (as indicated by the absence of difference between the null and residual deviance).
Question 21: Now fit a more interesting covariate model that considers gender (male/female) and role (teacher/student) for senders and receivers. Evaluate whether there is any improvement over the intercept-only model by the BIC. See how well your model predicted the events using `mean(apply(your model$predicted.match, 1, all))` and compare it with the previous model's prediction.
classfit2 <- rem.dyad(Class, n = 20, effects = c("CovSnd", "CovRec"),
                      covar = list(CovSnd = cbind(ClassIntercept, ClassIsTeacher, ClassIsFemale),
                                   CovRec = cbind(ClassIsTeacher, ClassIsFemale)),
                      ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit2)
## Relational Event Model (Temporal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovSnd.1 -3.834265 0.078843 -48.6317 < 2e-16 ***
## CovSnd.2 1.672575 0.091680 18.2436 < 2e-16 ***
## CovSnd.3 0.123916 0.094932 1.3053 0.19179
## CovRec.1 0.373729 0.127030 2.9421 0.00326 **
## CovRec.2 0.165761 0.080896 2.0491 0.04046 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5652.318 on 687 degrees of freedom
## Chi-square: 334.9034 on 4 degrees of freedom, asymptotic p-value 0
## AIC: 5662.318 AICC: 5662.405 BIC: 5685.008
# Compare the BIC of this model with the previous model.
classfit1$BIC - classfit2$BIC
## [1] 308.7508
mean(apply(classfit2$predicted.match,1,all))-mean(apply(classfit1$predicted.match,1,all))
## [1] 0.01013025
A good improvement over the null model. But note that gender does not appear to be predictive of sending communication. A better model may be one without that specific term included.
Question 22: Fit another model excluding the non-significant
term and again compare to the previous model by BIC and
prediction.
Hint: Based on the above outputs, which statistics should be
included/excluded?
classfit3<-rem.dyad(Class,n=20,effects=c("CovSnd","CovRec"),
covar=list(CovSnd=cbind(ClassIntercept ,ClassIsTeacher),
CovRec=cbind(ClassIsTeacher,ClassIsFemale)),
ordinal=FALSE,hessian=TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(classfit3)
## Relational Event Model (Temporal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## CovSnd.1 -3.775224 0.063622 -59.3379 < 2.2e-16 ***
## CovSnd.2 1.615762 0.079933 20.2139 < 2.2e-16 ***
## CovRec.1 0.371758 0.127019 2.9268 0.003425 **
## CovRec.2 0.161157 0.080815 1.9941 0.046137 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 5987.221 on 691 degrees of freedom
## Residual deviance: 5654.016 on 688 degrees of freedom
## Chi-square: 333.2049 on 3 degrees of freedom, asymptotic p-value 0
## AIC: 5662.016 AICC: 5662.074 BIC: 5680.169
classfit2$BIC - classfit3$BIC
## [1] 4.839661
mean(apply(classfit3$predicted.match,1,all))-mean(apply(classfit2$predicted.match,1,all))
## [1] 0
Question 23: Is there any improvement?
We are again going to use REMs for event histories with exact timing
information.
Let’s consider Apollo 13 mission data. Read the description of the data
at the beginning of this practical and also check the aforementioned
websites for more information. In this case, event time is given in
increments of seconds from the onset of observation.
In this data the actors are as follows:
AFD: Assistant Flight Director from Flight directors (1)
CAPCOM: Capsule Communicator from Flight directors (2)
CONTROL: Control Officer from Flight directors (3)
EECOM: Electrical, Environmental and Consumables Manager from Flight directors (4)
All: Ground control team (without flight directors) (5)
FDO : Flight dynamics officer (FDO or FIDO) (6)
FLIGHT: Flight Director from Flight directors (7)
GNC: The Guidance, Navigation, and Controls Systems Engineer from Flight directors (8)
GUIDO: Guidance Officer from Flight directors (9)
INCO: Integrated Communications Officer from Flight directors (10)
NETWORK: Network of ground stations from Flight directors (11)
TELMU: Telemetry, Electrical, and EVA Mobility Unit Officer from Flight directors (12)
RECOVERY: Recovery Supervisor from Flight directors (13)
PROCEDURES: Organization & Procedures Officer from Flight directors (14)
FAO: Flight activities officer from Flight directors (15)
RETRO: Retrofire Officer from Flight directors (16)
CDR: Commander James A. Lovell Jr. crew (astronauts) (17)
CMP: Command Module Pilot John (Jack) L. Swigert Jr. crew (astronauts) (18)
LMP: Lunar module pilot Fred W. Haise Jr. crew (astronauts) (19)
Question 26: First look at the data, plot the network and start with fitting a simple model. Next, add the statistics of interest to the model and see the performance of the new model.
Apollo_mission1 <- rem.dyad(PartOfApollo_13, n = 19, effects = c("PSAB-BA"),
                            ordinal = TRUE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission1)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## PSAB-BA 6.453583 0.033668 191.69 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 20850.3 on 3881 degrees of freedom
## Chi-square: 24451.17 on 1 degrees of freedom, asymptotic p-value 0
## AIC: 20852.3 AICC: 20852.3 BIC: 20858.56
Apollo_mission2 <- rem.dyad(PartOfApollo_13, n = 19, effects = c("PSAB-BA", "RRecSnd"),
                            ordinal = TRUE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission2)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## RRecSnd 5.065585 0.067967 74.530 < 2.2e-16 ***
## PSAB-BA 2.953172 0.038625 76.458 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 14706.31 on 3880 degrees of freedom
## Chi-square: 30595.16 on 2 degrees of freedom, asymptotic p-value 0
## AIC: 14710.31 AICC: 14710.32 BIC: 14722.84
Apollo_mission2$BIC- Apollo_mission1$BIC
## [1] -6135.723
Apollo_mission3 <- rem.dyad(PartOfApollo_13, n = 19,
                            effects = c("PSAB-BA", "RRecSnd", "ITPSnd"),
                            ordinal = TRUE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(Apollo_mission3)
## Relational Event Model (Ordinal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## RRecSnd 5.07271311 0.06838339 74.1805 <2e-16 ***
## ITPSnd -0.00038060 0.00043885 -0.8673 0.3858
## PSAB-BA 2.95169869 0.03863341 76.4027 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 45301.47 on 3882 degrees of freedom
## Residual deviance: 14705.64 on 3879 degrees of freedom
## Chi-square: 30595.83 on 3 degrees of freedom, asymptotic p-value 0
## AIC: 14711.64 AICC: 14711.65 BIC: 14730.43
Apollo_mission3$BIC- Apollo_mission2$BIC
## [1] 7.592294
ApolloNet <- as.sociomatrix.eventlist(PartOfApollo_13, 19)
gplot(ApolloNet, edge.lwd = ApolloNet^0.75,
      arrowhead.cex = log(as.edgelist.sna(ApolloNet)[, 3]) + 0.25,
      vertex.col = "gray", vertex.cex = 1.25)
Question 27: First, look at the data, then plot the network. Can you see any trends in the network? Start with fitting a REM model and add the statistics of interest sequentially. Can you see any improvement based on the BIC?
Hint: Consider some of these statistics: "PSAB-BA", "ISPSnd", "PSAB-XB", "NIDSnd", "NIDRec", and "NODSnd".
head(Twitter_data_rem3)
## time_day source target
## 1 0.00 8 1
## 2 121.03 28 2
## 3 199.08 28 2
## 4 266.95 4 3
## 5 573.59 22 5
## 6 574.49 25 5
twitter_NET <- as.sociomatrix.eventlist(Twitter_data_rem3, 39)
gplot(twitter_NET, edge.lwd = twitter_NET^0.75,
      arrowhead.cex = log(as.edgelist.sna(twitter_NET)[, 3]) + 0.25,
      vertex.col = "black", vertex.cex = 1.25)
twitter1 <- rem.dyad(Twitter_data_rem3, n = 39, effects = c("PSAB-BA", "PSAB-BY"),
                     ordinal = FALSE, hessian = TRUE)
## Prepping edgelist.
## Checking/prepping covariates.
## Computing preliminary statistics
## Fitting model
## Obtaining goodness-of-fit statistics
summary(twitter1)
## Relational Event Model (Temporal Likelihood)
##
## Estimate Std.Err Z value Pr(>|z|)
## PSAB-BA -6.76516 0.57582 -11.749 < 2.2e-16 ***
## PSAB-BY -8.28376 0.20387 -40.633 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Null deviance: 25024.13 on 1405 degrees of freedom
## Residual deviance: 7425381 on 1404 degrees of freedom
## Chi-square: -7400357 on 1 degrees of freedom, asymptotic p-value 1
## AIC: 7425385 AICC: 7425385 BIC: 7425396
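As a possible next step for Question 27 (a sketch only; the added statistics are taken from the hint above and are not meant as a final model), one could extend the model and compare by the BIC:
# Sketch: add degree-based statistics from the hint and compare by BIC
twitter2 <- rem.dyad(Twitter_data_rem3, n = 39,
                     effects = c("PSAB-BA", "PSAB-BY", "NIDSnd", "NIDRec", "NODSnd"),
                     ordinal = FALSE, hessian = TRUE)
summary(twitter2)
twitter1$BIC - twitter2$BIC   # a positive difference would favour the extended model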