Scholars’ Mine
Masters Theses
Student Theses and Dissertations
Fall 2018
Classification of EEG signals of user states in gaming using
machine learning
Chandana Mallapragada
Follow this and additional works at: https://scholarsmine.mst.edu/masters_theses
Part of the Databases and Information Systems Commons, and the Technology and Innovation
Commons
Department:
Recommended Citation
Mallapragada, Chandana, “Classification of EEG signals of user states in gaming using machine learning”
(2018). Masters Theses. 7831.
https://scholarsmine.mst.edu/masters_theses/7831
This thesis is brought to you by Scholars’ Mine, a service of the Missouri S&T Library and Learning Resources. This
work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the
permission of the copyright holder. For more information, please contact scholarsmine@mst.edu.
CLASSIFICATION OF EEG SIGNALS OF USER STATES IN GAMING USING
MACHINE LEARNING
by
CHANDANA MALLAPRAGADA
A THESIS
Presented to the Faculty of the Graduate School of the
MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE IN INFORMATION SCIENCE & TECHNOLOGY
2018
Approved by
Dr. Fiona Fui-Hoon Nah, Advisor
Dr. Keng Siau
Dr. Richard Hall
Dr. Langtao Chen
ABSTRACT
In this research, brain activity of user states was analyzed using machine learning
algorithms. When a user interacts with a computer-based system including playing
computer games like Tetris, he or she may experience user states such as boredom, flow,
and anxiety. The purpose of this research is to apply machine learning models to
Electroencephalogram (EEG) signals of three user states – boredom, flow and anxiety –
to identify and classify the EEG correlates for these user states. We focus on three
research questions: (i) How well do machine learning models like support vector
machine, random forests, multinomial logistic regression, and k-nearest neighbor classify
the three user states – Boredom, Flow, and Anxiety? (ii) Can we distinguish the flow
state from other user states using machine learning models? (iii) What are the essential
components of EEG signals for classifying the three user states? To extract the critical
components of EEG signals, a feature selection method known as minimum redundancy
and maximum relevance method was implemented. An average accuracy of 85 % is
achieved for classifying the three user states by using the support vector machine
classifier.
Keywords: Neural Correlates, Flow, Electroencephalogram, Machine Learning, Support
Vector Machine, Random Forests, Multinomial Logistic Regression, k-Nearest
Neighbor, Minimum Redundancy and Maximum Relevance
ACKNOWLEDGMENTS
First and foremost, I gratefully acknowledge the generosity of Dr. Fiona Nah for
providing me the opportunity to work under her as a thesis student. It was her constant
mentorship that made me succeed academically and helped me build strong professional
relationships with my professors. Her positive influence and constant support are the
reasons that inspired me to learn and explore the data science domain and complete my
research work. Also, I wish to convey my gratitude to Dr. Langtao Chen, for his
patience, constant support, and valuable feedback on my research. I was fortunate
enough to work under Dr. Nah and Dr. Chen, who immensely helped in guiding my
research in the right direction with their knowledge, without which this thesis would not
be possible. Also, I was able to present my research work at the 2017 Midwest
Association for Information Systems conference, a great platform for a graduate student
like me to broaden my perspective on research, which happened only with the support
of Dr. Nah and Dr. Chen.
I am also grateful to Dr. Keng Siau and Dr. Richard Hall, my committee
members, for their encouragement, insightful comments, and questions.
Finally, I thank my fellow thesis student, Tejaswini Yelamanchili, for assisting
me throughout my research work. I also appreciate the consistent morale and emotional
support of my family and friends.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
SECTION
1. INTRODUCTION
2. LITERATURE REVIEW
  2.1. USER STATES
  2.2. ELECTROENCEPHALOGRAM (EEG)
  2.3. RELATED WORK
3. RESEARCH METHODOLOGY
  3.1. EXPERIMENTAL DESIGN
  3.2. RESEARCH PROCEDURE
  3.3. MEASUREMENT
  3.4. CLASSIFICATION USING MACHINE LEARNING
    3.4.1. Support Vector Machine
    3.4.2. Random Forests
    3.4.3. k-Nearest Neighbors
    3.4.4. Statistics for Evaluating Models
4. DATA ANALYSIS AND RESULTS
  4.1. DATA PRE-PROCESSING
  4.2. DATA ANALYSIS
  4.3. RESULTS
5. DISCUSSION OF RESULTS
6. LIMITATIONS AND FUTURE RESEARCH
7. CONCLUSION
BIBLIOGRAPHY
VITA
LIST OF ILLUSTRATIONS
Figure
3.1. 64-Channel Cognionics EEG Headset
4.1. Overview of Data Analysis Process
4.2. Model Accuracies for Important EEG Components using MRMR-Method
4.3. Top 30 EEG Channels using MRMR-Method
5.1. Most Important Brain Regions from MRMR-Method
LIST OF TABLES
Table
2.1. Research on Application of Machine Learning to Classify EEG Signals
3.1. List of Electrodes in EEG Headset and Positions in the Human Scalp
4.1. Brainwaves with Wavelengths
4.2. Model Performance for Every Band Combination
4.3. Comparison of Models
4.4. Confusion Matrix for Flow vs Non-Flow
4.5. Top 30 EEG Channels using MRMR (Ranked by Variable Importance)
1. INTRODUCTION
User experience (UX) is a research area in Human-Computer Interaction (HCI)
that provides a comprehensive view of a user’s interaction with an application, product
or system (Tondello, 2016). Today, games are a focal point of user experience research
in human-computer interaction (Nacke, 2017). Gaming is an engaging and accessible
form of entertainment (Hartmann and Klimmt, 2006). The evaluation of user
experience in gaming includes a variety of states such as flow, engagement,
involvement, fun, immersion, and presence. When there is a balance between a user’s
skill and the difficulty level of a game, an optimal experience known as the flow state
arises (Csikszentmihalyi, 1990). In contrast, too much challenge can lead to anxiety,
and too low a challenge can result in boredom (Chanel et al., 2008). This research
focuses on three user states – Flow, Boredom, and Anxiety – by examining their neural
correlates using electroencephalography (EEG). EEG records the electrical activity in the
brain that arises from electrical impulses that facilitate communication between
brain cells (Muller et al., 2015).
The primary objective of this research is to classify EEG signals into flow,
boredom, and anxiety states by applying machine learning. Machine learning, a subset of
artificial intelligence, is the implementation of quantitative techniques to learn from
existing data to make predictions (Naqa and Murphy, 2015). It involves a process of
creating, testing, and validating models to obtain reliable outcomes and trends in the data.
Among the various kinds of machine learning models available, we are interested
in four supervised machine learning models – support vector machine (SVM), random
forests (RF), multinomial logistic regression (mlogit), and k-nearest neighbor (k-NN).
The following are the statistics used to evaluate the machine learning models and
compare their results – accuracy, kappa, and area under the receiver operating
characteristic curve (AUC). Further, we identified the essential components of EEG
signals for the user state classification task with the help of a feature selection method
called minimum redundancy and maximum relevance (MRMR). The aim of this research
is to identify machine learning models that perform well in classifying user states into
flow, boredom, and anxiety.
Given the importance of applying machine learning techniques to determine user
states (i.e., flow, boredom, and anxiety) in the HCI context, we put forth our research
questions as follows:
Research Question 1: How well do machine learning models like SVM, RF,
mlogit, and k-NN classify the three user states – Boredom, Flow, and Anxiety?
Research Question 2: Can we distinguish the flow state from other user states
using machine learning models?
Research Question 3: What are the essential components of EEG signals for
classifying the three user states?
This thesis is organized as follows. Section 2 provides a review of the literature.
Section 3 covers the research methodology. Section 4 details the process of data
analysis and the results obtained. Section 5 discusses the results. Section 6 highlights
the limitations and future research, and Section 7 concludes the thesis.
2. LITERATURE REVIEW
2.1. USER STATES
The study of interaction between human and computer has gained attention,
particularly in the field of gaming. Traditionally, modeling of players’ engagement in
gaming was qualitative and mostly based on psychology (Plotnikov et al., 2012).
Among these traditional ways, two major lines were identified: 1) Malone and Lepper
(1987) determined players’ engagement based on three intrinsic qualitative factors:
challenge, fantasy and curiosity, and 2) Csikszentmihalyi (1990) assessed players’
enjoyment in gaming by incorporating flow in computer games. Three key user states
were identified by Csikszentmihalyi, and they are boredom, flow, and anxiety
(Yelamanchili et al., 2017). Among the above-mentioned user states, flow is the focal
point in human-computer interaction research that provides an optimal experience
where an individual is totally absorbed in a task and is unaware of his/her surroundings
or passing of time (Csikszentmihalyi, 1990; Yelamanchili et al., 2017).
In Csikszentmihalyi’s ‘Flow theory’, the flow state is conceptualized into nine
components: challenging activity that requires skills, merging of action and awareness,
well-defined goals, direct and instantaneous feedback, focus on the task at hand, loss of
self-consciousness, sense of control, distorted sense of time, and intrinsic interest
(Csikszentmihalyi, 1990). Flow state emerges when there is a balance between the skill
of an individual and the challenge posed by the task (Csikszentmihalyi, 1990; Lee et al.,
2015; Nah et al., 2010). Boredom is a user state that arises when the skill level of a user
is higher than the challenge level of the given task (Csikszentmihalyi, 1975, 1990).
Anxiety occurs when the skill level of a user is much lower than the challenge level of
the task. This research focuses on classifying these three user states in gaming.
2.2. ELECTROENCEPHALOGRAM (EEG)
To measure user states, a range of technologies have been developed that record
brain activity. Some of the tools are functional magnetic resonance imaging (fMRI),
electroencephalography (EEG), magnetoencephalography (MEG), near infrared
spectroscopy (NIRS), and electrocorticography (ECoG) (Brunner et al., 2011). Among
these brain-computer interface (BCI) technologies, we used EEG in our research to record the brain
activity of users. We selected EEG for its high temporal resolution and
non-invasive nature (Berta et al., 2013). The EEG recordings consist
of delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz) and gamma (30-32
Hz) spectral band frequencies. Each spectral band represents a set of cognitive activity
occurring in the brain while performing a task. For example, alpha and theta bands are
helpful to study users’ attention and sense of immersion. Since the beta band is wide, it
can be further divided into three sub-bands, namely, low-beta (12-15 Hz), mid-beta (15-
20 Hz), and high-beta (20-30 Hz). The beta band represents self-awareness, mental
activity and reasoning (Berta et al., 2013). The neural correlates of different user states
can be observed based on the density variations of the spectral bands discussed above
(Li et al., 2014). In our research work, theta, alpha, beta and sub-bands of beta were
considered to classify the user states while gaming.
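As a rough illustration of how activity in these bands can be quantified, the sketch below sums the power of a single-channel signal within the theta, alpha, and beta sub-bands using a plain discrete Fourier transform. The band edges follow the text above; the sampling rate, the synthetic test signal, and the names `BANDS` and `band_powers` are assumptions made for illustration and do not describe the processing pipeline used in this thesis.

```python
import cmath
import math

# Band edges in Hz as listed in the text (lower bound inclusive, upper exclusive).
BANDS = {
    "theta": (4, 8),
    "alpha": (8, 12),
    "low_beta": (12, 15),
    "mid_beta": (15, 20),
    "high_beta": (20, 30),
}

def band_powers(signal, fs):
    """Return the summed DFT power per band for a single-channel signal.

    fs is the sampling rate in Hz (an assumed value for this sketch).
    """
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    # A real-valued signal only needs the frequencies from 0 up to Nyquist.
    for k in range(n // 2 + 1):
        freq = k * fs / n
        band = next((b for b, (lo, hi) in BANDS.items() if lo <= freq < hi), None)
        if band is None:
            continue  # frequency lies outside every band of interest
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        powers[band] += abs(coeff) ** 2 / n
    return powers

# Synthetic one-second example: a 10 Hz (alpha) sine wave sampled at 128 Hz.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
powers = band_powers(signal, fs)
dominant = max(powers, key=powers.get)
```

For the synthetic 10 Hz signal, nearly all of the power falls in the alpha band. Real recordings would normally be windowed and averaged over segments before the band powers are compared.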
2.3. RELATED WORK
Previous studies have assessed user states, especially the flow state, using data
from different physiological and psychological technologies like galvanic skin response
(GSR), electroencephalography (EEG), electrocardiogram (ECG), electromyography
(EMG), and electrodermal activity (EDA) (Berta et al., 2013; Rissler et al., 2018). There
are other approaches such as self-reported questionnaires and interviews that are based
on the users’ recall of the experience (Bhattacherjee, 2012). Recent developments in
information systems (IS) have offered more ways to analyze user states. They include
more objective measures that combine EEG signals and machine learning techniques to
classify the user states.
Machine learning techniques provide a systematic approach for classifying
multi-channel EEG signals (Garrett et al., 2003). Recent studies have used machine
learning to optimize players’ gaming experience (Hair, 2007), where players are
segregated based on their experience in gaming and their momentary scores. Analyzing
variables such as scores and responses to situational changes in the computer-based
gaming environment helps designers and developers understand both their target
population and design dynamics to optimize gaming experience (Hair, 2007). The SVM
model is considered a state-of-the-art machine learning technique for classifying
brain activity obtained from EEG (Berta et al., 2013).
Berta et al. (2013) focused on building a machine learning classifier that can
distinguish three user states, namely, boredom, frustration/anxiety, and flow. They
trained the SVM model with a radial basis function (RBF) kernel in two different
conditions: 1) user-dependent, with a classification accuracy of 50.1%, and 2) user-
independent, with a classification accuracy of 66.4%. Berta et al. (2013) also
implemented a feature selection method to extract important EEG components and then
analyzed these components using SVM for reduced computational times and better
classification accuracies. After comparing models with and without feature selection,
they found that the model using all the components from the collected data
performed better than any other model. Another study by Chatterjee et al.
(2016) also applied machine learning models to identify cognitive flow. They
implemented a Bayesian network to detect cognitive flow during gaming and obtained
an accuracy of 62.2% based on EEG and GSR data. Chanel et al. (2008) used the SVM
model with an RBF kernel to classify emotions into boredom, engagement, and
anxiety while playing the Tetris game, based on EEG and GSR data, and obtained an
accuracy of 53.33%.
Plotnikov et al. (2012) used a Gaussian kernel SVM model to assess flow in
games based on EEG data and obtained an average accuracy of 57%. A study by Rissler
et al. (2018) implemented SVM and random forests models to classify low flow and
high flow in gaming using physiological data that include electrocardiography (ECG),
blood volume pressure (BVP), and electrodermal activity (EDA). The result shows that
cardiac features play an important role in categorizing the flow state, with random
forests being a more accurate model (72.3%) than SVM (Rissler et al., 2018).
Lin et al. (2008) implemented the SVM-RBF model to classify 32-channel
EEG data into four states – joy, arousal, sadness, and pleasure – based on emotions
triggered by music. To classify emotions, the EEG data was divided into the following
frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and
gamma (31-50 Hz). The study classified the emotions with
a maximum accuracy of 92.73% when all the frequency bands were combined.
Another study in the same context of listening to music used a multilayer
perceptron classifier to classify the EEG data into joy, anger, sadness, and pleasure and
obtained an accuracy of 69.69% using a sample size of five (Lin et al., 2007).
Similarly, another study by Wang et al. (2011) used machine learning algorithms
to classify user states in the context of movie elicitation. The time domain features and
frequency domain features of EEG data were compared to assess which features classify
emotions more correctly. They used the SVM-RBF model, k-NN model, and multilayer
perceptron model to classify user states into joy, sadness, relaxation, and fear. The SVM-RBF
model achieved higher accuracy (66.51%) than the other models with frequency domain
EEG features as input. A similar study by Wang et al. (2014)
compared three different EEG features, specifically power spectrum, wavelet, and
nonlinear dynamical analysis, to understand the relationship between emotion and EEG
data in the context of movie elicitation. The emotional state classification was done
using the different kernels (RBF, polynomial, linear) of the SVM model across all the
combinations of frequency bands (delta, beta, alpha, theta, and gamma). The results
indicate that the power spectrum plays an important role in classifying the emotions,
with the linear kernel SVM model achieving the highest classification
accuracy (87.53%) using a combination of all bands (Wang et al., 2014).
Several studies in the medical field studied the classification of EEG signals
based on machine learning techniques, where the SVM model was frequently used.
Lotte et al. (2007) reviewed the performance of machine learning algorithms
used for classification in EEG-based BCI systems. According to their review, the SVM model is
the most efficient for synchronous BCI due to its regularization property, simplicity,
and robustness. Vladimir et al. (2015) investigated the performance of the SVM model
for seizure prediction using EEG signals. The SVM-RBF kernel model was used in
the classification of EEG signals into seizure and non-seizure signals with an accuracy
of 95.33 % (Joshi et al., 2014). Another study classified EEG signals into epileptic
seizure or not using the SVM model with an accuracy of 98.75 %, where principal
component analysis (PCA), linear discriminant analysis (LDA), and independent
component analysis (ICA) were used for the feature reduction process (Subasi et al.,
2010).
Liang et al. (2006) evaluated the performance of backward propagation neural
networks and SVM models for mental task classification based on EEG signals. Other
models like k-NN and decision trees were used to classify the sleep stages, with k-NN
achieving higher classification accuracy than decision trees (Güneş, Polat, & Yosunkaya,
2010). Alkan et al. (2005) proposed an automatic seizure detection model using EEG,
logistic regression, and neural networks models, with neural networks achieving higher
accuracy (92%).
From the previous studies in the literature, we see that the SVM model has been
implemented to categorize user states based on EEG data. There are only a few studies
on classification of user states based on frequency bands, especially for the flow state.
Hence, in this study, we explore different machine learning models to classify the user
states into boredom, flow, and anxiety with different combinations of the frequency
bands. Also, we are interested in identifying the best performing machine learning model to
distinguish the flow state from all the other states. Table 2.1 provides a brief overview
of previous studies that have applied various machine learning models in classifications
of user states.
Table 2.1. Research on Application of Machine Learning to Classify EEG Signals
Reference | Research Setting | Summary of Findings
Alkan et al. (2005) | Automatic seizure detection using EEG and machine learning algorithms | Developed machine learning classifiers to identify epileptic seizure and normal EEG signals. Logistic Regression (90%), Neural Networks (92%)
Berta et al. (2013) | Used 4-channel EEG to analyze the flow state in games | The low-beta band is the most important for discriminating among conditions during gaming. Classified three user experience states: flow, boredom and frustration. SVM (66.4%)
Chanel et al. (2008) | Emotion assessment from physiological & EEG data using machine learning models in gaming | Classified boredom, engagement and anxiety emotions while playing the Tetris game at different levels based on self-reports and physiological analysis. Classified boredom and anxiety states correctly. SVM-RBF kernel (53.33%)
Chatterjee et al. (2016) | Identified and analyzed cognitive flow in gaming | Concluded that EEG and GSR data can be used to distinguish the performance of users in the game. Implemented a Bayesian network model to detect cognitive flow with an accuracy of 62.2%
Garrett et al. (2003) | EEG signal classification using linear, nonlinear and feature selection methods | Nonlinear methods performed better than the Linear Discriminant Analysis (LDA) method. EEG signals of resting and rotation tasks are more difficult to detect than those of other tasks. LDA (66%), Neural Networks (69%), and SVM (72%)
Güneş et al. (2010) | Automatic scoring of sleep stages based on k-NN | Proposed a hybrid system to automatically score sleep stages using k-means. Obtained k-NN model as the best model (82.2%)
Joshi et al. (2013) | Classification of EEG signals based on fractional linear prediction (FLP) | FLP is an effective method for modelling EEG signals. Classified EEG data using signal energy and error energy as parameters to the SVM model. SVM-RBF kernel (95.33%)
Liang et al. (2006) | Mental task classification based on EEG signals using machine learning algorithms | Evaluated performance of Backward Propagation Neural Networks (BPNN), SVM, and ELM classifiers using EEG signals. Obtained similar classification accuracies for all three models; model accuracy can be improved by smoothing raw outputs.
Lin et al. (2007) | EEG signal-based emotion classification using music elicitation and neural networks | Developed an offline emotion classification algorithm based on EEG signals that are relevant to music and multilayer perceptron neural networks to classify joy, anger, sadness and pleasure.
Lin et al. (2008) | Recognize emotional responses during multimedia presentation using EEG signals | Developed a framework to uncover the relation between EEG signal and music-induced emotion. Most important bands related to emotion responses were delta, theta and alpha. SVM-RBF (92.73%)
Lotte et al. (2007) | Review of classification algorithms based on EEG signals | SVM models are productive for synchronous BCI due to the property of regularization and immunity to the curse of dimensionality. Combinations of classifiers and dynamic classifiers are also very productive.
Plotnikov et al. (2012) | Used 4-channel EEG headset to distinguish flow from boredom condition in Tetris | Statistically distinguished various levels of boredom and flow in game players with an accuracy of 73%.
Rissler et al. (2018) | Used machine learning to categorize the intensity of flow (low and high) | ML techniques can build flow classifiers that are dependent on peripheral nervous system features alone. Random forest is the best model (72.3%). SVM (57.4%)
Subasi et al. (2010) | Epileptic EEG signal classification using PCA, ICA, LDA and SVM | Implemented dimension reduction by principal component analysis (PCA), independent component analysis (ICA), and LDA
Vladimir et al. (2015) | Seizure prediction from EEG data | Successful seizure prediction based on EEG signals using the SVM model.
Wang et al. (2011) | Emotion recognition system based on EEG signals using movie elicitation and machine learning | Classified EEG-based emotions when watching movies into joy, relax, fear and sad. Showed that frontal and parietal EEG signals were even more informative based on the Minimum Redundancy Maximum Relevance feature selection method. SVM-RBF (66.51%), Multi-layer perceptron (63.07%), k-NN (59.84%)
Wang et al. (2013) | Emotion state classification based on EEG signals during movie induction experiment using machine learning approach | Power spectrum of all frequency bands is an effective robust feature for classification. High frequency bands play a more important role in emotion activities than low frequency bands. Compared three different kernels of the SVM model. Best model is kernel-RBF.
3. RESEARCH METHODOLOGY
3.1. EXPERIMENTAL DESIGN
A within-subject experimental design was used in this research, where the same
individuals experienced more than one condition (i.e., resting, boredom, flow, and
anxiety). Since the main purpose of our research is to assess the flow state against
boredom, anxiety and resting states, a within-subject experimental design is appropriate,
in which the subjects serve as their own control. This laboratory experiment was
designed to capture EEG recordings for the resting, boredom, flow, and anxiety states
using a 64-channel Cognionics EEG headset. The design was adopted from
Berta et al. (2013) who used a plane battle game and 4-channel EEG technology. In our
study, the animated game, Tetris, was used to induce boredom, anxiety, and flow states.
The experiment consisted of four parts – each part is used to induce a specific user state,
i.e., resting, boredom, flow, and anxiety.
3.2. RESEARCH PROCEDURE
The following steps provide a detailed explanation of the laboratory experiment
where the four user states were induced through the Tetris game.
Step 1: In order to capture the subject’s orientation towards gaming, a
questionnaire that was prepared based on previous studies was administered to the
subject to fill out before the experiment started.
Step 2: The resting state was invoked by having the subject stare at a small cross
on a dark background screen of the same color as the background color of the game in
the experiment.
Step 3: The boredom state was induced using the lowest level (i.e., level 1) of the
game. In addition, the subject was provided with a mouse that had been click-disabled,
such that the subject could not shorten the wait time for the block to fall but had to wait
for each block to fall to the base.
Step 4: The flow state was induced by setting the game at level 5 and having the
subject play until all the blocks piled up to the top. During the gameplay, the game level
automatically increased as the subject cleared each level of difficulty.
Step 5: The anxiety state was induced by setting the challenge of the game at a
very high level (i.e., level 15 and above) such that it far surpassed the skill level of the
subject. Here the subjects were required to play the Tetris game two times at level 15
followed by two times at level 20. At the end of each of Steps 3 to 5, the subject was
asked to fill out a questionnaire that served as a validation check for the manipulations.
Step 6: A retrospective process tracing was carried out for each of the induced
states, where each participant was asked to verbalize his or her experience while
watching a video playback of their gameplay recording. Based on the subject’s
verbalization of the experience, we determined a 30-second interval that best represents
each of the three induced user states for data analysis.
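Selecting a 30-second interval from a recording amounts to slicing the samples between two time points. The sketch below shows one way this could be done for a single channel. The sampling rate, the start time, and the `extract_epoch` helper are hypothetical, since this section does not specify them.

```python
def extract_epoch(samples, fs, start_s, duration_s=30):
    """Return the samples covering [start_s, start_s + duration_s) seconds.

    samples: list of readings for one channel, one entry per time point.
    fs: sampling rate in Hz (assumed value for illustration).
    """
    start = int(round(start_s * fs))
    stop = start + int(round(duration_s * fs))
    if start < 0 or stop > len(samples):
        raise ValueError("requested interval lies outside the recording")
    return samples[start:stop]

# Hypothetical 2-minute recording at 250 Hz; the chosen interval starts at 45 s.
fs = 250
recording = list(range(120 * fs))
epoch = extract_epoch(recording, fs, start_s=45)
```

The same slice would be applied to every channel so that all 64 signals cover the identical time window.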
3.3. MEASUREMENT
To measure the neurophysiological data while playing the Tetris game, a
Cognionics dry EEG headset with 64 channels was placed on each subject’s head (see
Figure 3.1). The EEG headset contains 64 Ag-AgCl pin-type active electrodes mounted
in a Bio Semi stretch-lycra head cap.
The commonly used 10-20 EEG electrode placement was implemented to record
electrical activity of the subjects’ brain. Table 3.1 provides the list of electrodes in the 64-channel
EEG headset used in this research and their respective positions on the scalp.
Table 3.1. List of Electrodes in EEG Headset and Positions in the Human Scalp
Position Name | Channel Name
Anterior – Frontal | AFp3h, AFpz, AFp4h, AF5h, AFF5, AFF5h, AFF3, AFF1, AFFz, AFF2, AFF4, AFF6h, AFF6, AF6h
Frontal | FFC5h, FFC3, FFC3h, FFC1h, FFCz, FFC2h, FFC4h, FFC4, FFC6h
Fronto – Central | FCC5h, FCC3, FCC1, FCC1h, FCCz, FCC2h, FCC2, FCC4, FCC6h
Central | CCP5h, CCP3, CCP1, CCP1h, CCPz, CCP2h, CCP2, CCP4, CCP6h
Central – Parietal | CPP5h, CPP3, CPP3h, CPP1h, CPPz, CPP2h, CPP4h, CPP4, CPP6h
Parietal-Occipital | POO7, PO7, PO5, PO3, PO1, POz, PO2, PO4, PO6, PO8, POO8
Occipital | O1h, Oz, O2h
Figure 3.1. 64-Channel Cognionics EEG Headset
Figure 3.1 shows the electrode positions of the 64-channel Cognionics EEG headset
on the human scalp.
3.4. CLASSIFICATION USING MACHINE LEARNING
Machine learning is a subset of artificial intelligence that focuses on finding
patterns in training data to make future predictions. In the gaming context, it can also be
viewed as real-time analytics in which algorithms analyze the rules of a game and respond
to players’ actions to improve their performance (Ramirez, 2014). It draws on
several related areas such as data mining, predictive modeling, clustering,
mathematical modeling, and statistics. In this research, we focused on four supervised
machine learning models – SVM, RF, k-NN, and mlogit – to classify the user states. The
following sub-sections briefly explain the above-mentioned machine learning models.
3.4.1. Support Vector Machine. SVM is considered a state-of-the-art
kernel-based supervised machine learning algorithm for classification (Lin
et al., 2008). The algorithm uses a kernel function that maps the given
input data into a high-dimensional space. It learns from the given data
iteratively and generates optimal hyperplanes with maximal margins that separate the classes in
the high-dimensional space (Subasi et al., 2010; Lin et al., 2008). These maximal
margin hyperplanes serve as decision boundaries for distinguishing the different
classes. SVM models can handle large data sets with high
classification accuracy (Chang & Lin, 2011). This research uses the radial basis
function (RBF) kernel of the SVM model, a nonlinear kernel that maps the
given data into a high-dimensional space.
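To make the role of the RBF kernel concrete, the following minimal sketch computes the kernel value exp(-gamma * ||x - y||^2) for pairs of feature vectors. The `gamma` value and the toy vectors are assumptions chosen for illustration. The sketch shows only the kernel itself, not the SVM training procedure used in this research.

```python
import math

def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(points, gamma):
    """Pairwise kernel values, which a kernel SVM uses in place of raw features."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

# Three toy feature vectors; the numbers are made up for this sketch.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points, gamma=0.5)
```

Nearby points yield kernel values close to 1 and distant points yield values close to 0, so `gamma` controls how quickly similarity decays with distance.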
3.4.2. Random Forests. The RF supervised machine learning model was proposed
by Breiman (2001). Classification is performed by an ensemble of decision trees, each constructed
from a bootstrap sample of the given data. In contrast to standard trees, where each node
is split using the best split among all input variables, random forests split each node using the best split among
a subset of predictors randomly selected at that node. This strategy performs
well and is robust against overfitting when
compared to other models such as linear discriminant analysis, support vector machine,
and neural networks (Liaw and Wiener, 2002).
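The two sources of randomness described above can be sketched as follows: drawing a bootstrap sample of rows for one tree, and drawing a random subset of predictors for one node split. The function names, the seed, and the data sizes are illustrative assumptions. The square-root rule for the subset size is a common default for classification, not a setting reported in this thesis.

```python
import math
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one sample per tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(n_features, rng, mtry=None):
    """Pick the random subset of predictors considered at a single node split."""
    if mtry is None:
        mtry = max(1, int(math.sqrt(n_features)))  # common default for classification
    return rng.sample(range(n_features), mtry)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = bootstrap_sample(n_rows=100, rng=rng)
features = candidate_features(n_features=64, rng=rng)
out_of_bag = set(range(100)) - set(rows)  # rows never drawn for this tree
```

The rows left out of a tree's bootstrap sample (the out-of-bag rows) can be used to estimate that tree's error without a separate validation set.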
3.4.3. k-Nearest Neighbors. The k-NN model is one of the simplest classification
models. It searches the entire training data set to classify a single test point based on its k nearest
neighbors, where the value of k is tuned using cross-validation. As the size of the training dataset increases, the
quality of classification also increases. This feature gives k-NN good
classification accuracy, but the model can suffer from overfitting (Goldberger, 2005).
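A minimal sketch of k-NN classification by majority vote is given below. The Euclidean distance metric, the toy data, and the `knn_predict` name are assumptions for illustration, and the cross-validated tuning of k is not shown.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature data; the values and labels are illustrative only.
train_points = [(0.1, 0.2), (0.0, 0.4), (0.9, 0.8), (1.0, 1.0), (0.5, 0.1)]
train_labels = ["boredom", "boredom", "anxiety", "anxiety", "flow"]
prediction = knn_predict(train_points, train_labels, query=(0.05, 0.3), k=3)
```

Because every prediction scans the full training set, the cost of classification grows with the size of the training data.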
3.4.4. Statistics for Evaluating Models. Machine learning models can be evaluated with various
statistical metrics such as F1-score, accuracy, kappa statistic, precision, recall, lift, and
AUC (Caruana, 2006). Classification accuracy is the ratio of correct
predictions to the total number of cases evaluated. It ranges from 0 to 1 and is
dependent on the input data. AUC evaluates a machine learning classifier’s
performance based on the area under the receiver operating characteristic curve and is independent of the
data (Bradley, 1997). The kappa statistic is used to evaluate the overall performance of
a machine learning classifier, especially in a multi-class classification problem. It
compares the classifier’s observed accuracy with the accuracy expected of a
classifier that randomly classifies data based on their frequency of occurrence (Landis
and Koch, 1977). The kappa statistic can be used not only to evaluate a single classifier but also
to compare classifiers with one another. In this research, we use the
kappa statistic, accuracy, and AUC to evaluate the machine learning models’
performance, as most of the previous studies also used these statistics for model
comparisons.
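The three statistics can be sketched in a few lines. The confusion matrix counts and classifier scores below are made up for illustration and are not results from this research. The AUC function handles only the two-class case, using the pairwise-ranking interpretation of the area under the ROC curve.

```python
def accuracy(confusion):
    """Share of cases on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def cohen_kappa(confusion):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in confusion)
    p_observed = accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement expected from the class frequencies alone.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

def binary_auc(positive_scores, negative_scores):
    """Probability that a random positive case is scored above a random negative one."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up two-class counts (rows: actual class, columns: predicted class).
confusion = [[20, 5], [10, 15]]
acc = accuracy(confusion)
kappa = cohen_kappa(confusion)
auc = binary_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

In this example the accuracy is 0.7, but half of that agreement would be expected by chance given the class frequencies, so kappa is lower than the accuracy.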