Learning Analytics MOOC – Week 9 – Wrapping Up

It’s the final week of DALMOOC: On the one hand I am glad that from now on I’ve got time for my other hobbies again (my fish tank needs my attention), on the other hand I look back on nine weeks of really interesting and challenging MOOC activities – again a sincere thanks to all of you! My feedback on the course structure and on my learning can be found in many of my blog posts, especially the week 7 post.

1. DALMOOC CMAP

Creating a DALMOOC cmap was a very useful task; it reminded me of my Master’s studies some years ago, where we did that a lot in our learning group. I could have worked a few more weeks on it (I already spent many hours and some evenings with it). Normally I would have created a cmap for each of the four main units in DALMOOC and another one for the course structure, so as a result there is a lot of content in my single cmap. The cmap includes my understanding of what was important and what I would like to keep in mind – I hope I got it right and there aren’t too many mistakes in it. Data source and method: I reread my blog posts and copied contents and keywords into the cmap (no, I didn’t do text mining for that…). There are some things I didn’t cover in the map, for instance ProSolo functions (which doesn’t mean that I didn’t like ProSolo), learning analytics software, more detailed connections between the units, …

I think others might be scared or irritated by the sheer amount of information, but I tried to use colours to make it a little easier to read and to show the original course units. Hopefully I can use the cmap for my job should the topic arise (at the moment I’ve got my hands full with a lot of other e-learning-related topics).

DALMOOC CMAP

Download DALMOOC CMAP [jpg]

Download DALMOOC CMAP [pdf]

Download DALMOOC CMAP [cmap]

 

2. Learning Analytics in Germany

Based on competency 9.1, I thought it would be a good idea to collect some links about Learning Analytics in Germany. Until now I hadn’t realized how many conferences and working groups covered the topic „Learning Analytics“ in their programs this year (I attended some of the conferences…).

LA in general

LA as conference topic in Germany 2014 (selection)

Important international societies that are doing research in Learning Analytics are:

 

 (Update 31.12.14)

And that’s my certificate:

dalmooc-cert-kl

Learning Analytics MOOC – Week 8 – Text Mining Nuts and Bolts

Whereas one focus of week 8 was working with LightSide, basic information about the following steps of text mining was provided in this week’s videos (see https://www.youtube.com/user/dalmooc). We had some reflection tasks, hands-on experimentation, assignments and group tasks this week – I chose to note the main aspects of week 8 and also to do something in LightSide in order to get some practice with it.

1. Text Mining: Data Preparation
As the process of preparing data for data mining / text mining can be very complex and requires a lot of time and thought, you should ask yourself at the beginning whether the goal is realistic to achieve and whether it is worth doing (e.g. whether you can reuse the trained model for other studies where similar data is collected).

1.1 Cleaning text
It would be nice to have the raw data already in tabular form, or at least in a structured form (XML, JSON, SQL), so that plugins or a programming language can be used to get it into tabular form. In addition, it can be necessary to aggregate data first, because not every entry should be a unit in the dataset for machine learning (this might perhaps be done in Excel with some macros). Things like reformatting because of non-standard character encoding (UTF-8 would be good; LightSide can handle that format), disfluencies (perfectly formed English in the data is unrealistic when doing learning analytics) or text in another language (LightSide is configured for English) might need attention and additional software plugins.
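
Just to make this concrete for myself (my own Python sketch, not from the course; the field names like „thread_id“ are invented), this is roughly what such an aggregation step could look like:

    # Hypothetical example: turn a JSON export of forum posts into the
    # tabular CSV form LightSide expects, aggregating posts per thread so
    # that one thread = one machine-learning instance.
    import csv
    import json

    with open("forum_export.json", encoding="utf-8") as f:
        posts = json.load(f)  # assumed: a list of dicts, one per post

    threads = {}
    for post in posts:
        threads.setdefault(post["thread_id"], []).append(post["text"])

    with open("instances.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["thread_id", "text"])  # header row
        for thread_id, texts in threads.items():
            writer.writerow([thread_id, " ".join(texts)])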

1.2 Annotating data
„Training a predictive model requires annotated training data“ – a set of 1,000 instances of labeled data is a good start: 200 as development data, 700 for cross-validation and 100 as the final test set. A dataset resulting from a simple poll already has a label given by the poll (yes/no); otherwise you have to think about what you would like to detect in the student interaction. Maybe there is already a coding manual for the codes you are interested in.
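
As a note to myself, such a 200/700/100 split could be done like this in Python/pandas (column names invented):

    # Shuffle once, then carve off the three sets mentioned above:
    # 200 development, 700 cross-validation, 100 final test instances.
    import pandas as pd

    df = pd.read_csv("instances_labeled.csv")  # assumed: 1,000 labeled rows
    df = df.sample(frac=1, random_state=42).reset_index(drop=True)

    dev        = df.iloc[:200]     # qualitative analysis + error analysis
    cv_set     = df.iloc[200:900]  # iterative training/testing
    final_test = df.iloc[900:]     # touched exactly once, at the very end

    dev.to_csv("dev.csv", index=False)
    cv_set.to_csv("cv.csv", index=False)
    final_test.to_csv("final_test.csv", index=False)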

2. Text Mining: Getting a Sense of Data
The step of getting a sense of the data is one that many people don’t spend enough time on, and one that gets better with experience and with reading linguistics books. This qualitative analysis is an important „precursor to predictive modeling“.
Regarding sentiment analysis, it’s more complicated than reading text and counting positive and negative words (individual words are not enough): context matters, rhetorical strategies may appear, and sentiment might be expressed indirectly.

3. Text Mining: Basic Feature Extraction with LightSide
Feature extraction is about deciding what we would like to include in the model – what will correlate with what we’re trying to predict.
A noisy predictor of class value would be a term that can be used with different meanings and might therefore signal agreement for some writers and disagreement for others (like in our healthcare poll example: „cost for one person“ vs. „cost for society as a whole“ – more context would be needed to be sure of the meaning).

LightSide provides very easy access to a broad range of simple low-level text features.
In LightSide, the panel „Extract Features“ automatically checks the text field to extract features from – but if you’ve got other variables in other columns of the dataset, then in the menu „Feature Extractor Plugins“, besides „Basic Features“, the option „Column Features“ should be checked as well.

In „Configure Basic Features“ you can choose among Unigrams (= individual words), Bigrams, Trigrams, POS Bigrams (= part-of-speech bigrams), POS Trigrams, Word/POS Pairs, Line Length, Count Occurrences, Include Punctuation, Stem N-Grams, and other options (handling of stopwords etc.).

Unigrams are an easy way to try to capture the content of a sentence, but you lose the context and structure of the sentence. With bigrams, there is already a little ability to disambiguate. Using a combination of unigrams and bigrams, the feature space gets much larger, which leads to a higher risk of overfitting – adding richer features gives you more information but comes at a cost (see the small sketch below).
Another idea is to think about words in terms of grammatical categories, i.e. parts of speech (noun, preposition, verb…): which part-of-speech tags occur next to each other? There are standard tag sets for part-of-speech tagging which can be used.
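
To see the feature-space explosion in numbers, a tiny sketch with scikit-learn instead of LightSide (invented sentences):

    # Unigram-only versus unigram+bigram feature space on the same texts:
    # the combined space is much larger, hence more room for overfitting.
    from sklearn.feature_extraction.text import CountVectorizer

    texts = [
        "the new law will raise costs for one person",
        "the new law will not raise costs for society as a whole",
    ]

    unigrams = CountVectorizer(ngram_range=(1, 1)).fit(texts)
    uni_bi   = CountVectorizer(ngram_range=(1, 2)).fit(texts)

    print(len(unigrams.vocabulary_))  # number of unigram features
    print(len(uni_bi.vocabulary_))    # unigrams + bigrams: much larger
    # A bigram like "not raise" keeps context that single words throw away.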

Line length just counts the number of words in a text and could be meaningful depending on the kind of text.
Stopwords are often removed in text classification (a practice that comes from information retrieval) – but in text chat it can be the other way round: „contains non-stopwords“ would be the interesting feature.

Features like N-Grams, POS Bigrams and Word/POS Pairs were described as binary features (true/false), but another way is to think of them as count features – that is what happens if you check „Count Occurrences“: then they are no longer binary encoded.
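
In scikit-learn terms – as a rough analogy to that checkbox – the difference looks like this:

    # Binary presence/absence features versus count features for the same text.
    from sklearn.feature_extraction.text import CountVectorizer

    texts = ["good good good point", "bad point"]

    binary = CountVectorizer(binary=True).fit_transform(texts)
    counts = CountVectorizer(binary=False).fit_transform(texts)

    print(binary.toarray())  # "good" is merely present: 1
    print(counts.toarray())  # "good" occurred three times: 3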

You have to decide whether you want punctuation as part of your feature space or not: sometimes it just adds noise – not everybody uses it, and some use it inconsistently (yes, that would be me when writing in a foreign language and thinking about difficult concepts… punctuation isn’t a priority).

Another decision is: Do you want to use stemming or not? Stemming removes the endings from various forms of a word and makes the feature space a little more compact.
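
A tiny illustration with NLTK’s Porter stemmer (my own example, not from the course):

    # Different surface forms of a word collapse to one stem, which makes
    # the feature space more compact.
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["agree", "agrees", "agreed", "agreeing"]:
        print(word, "->", stemmer.stem(word))  # all collapse to one stem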

These selections interact with each other, and so part-of-speech tagging is done first (before stemming or stopword removal).

4. Text Mining: Interpretation of Feature Weights
This step starts when the model is already built. LightSide has a panel „Explore Results“ (normally used for error analysis) with which you can also look at feature weights. In the confusion matrix (via „Cell highlights“) you can select „feature weight“.
Words that are negative should have a large negative weight when you select negative data & positive prediction in the confusion matrix (for example „bad“ = -0.8231). At the bottom of the LightSide interface you can choose the Extractor plugin „Documents display“ and check „Filter documents by selected feature“ and „Documents from selected cell only“ in order to see where in the original text the words occur.
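
Outside of LightSide, the same kind of weight inspection could look like this with scikit-learn (toy texts, invented labels):

    # Train a small model and list each word with its learned weight;
    # negative words should come out with negative weights.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    texts  = ["this is good", "really good idea", "this is bad", "truly bad plan"]
    labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    model = LogisticRegression().fit(X, labels)

    for word, weight in sorted(zip(vec.get_feature_names_out(), model.coef_[0]),
                               key=lambda pair: pair[1]):
        print(f"{word:10s} {weight:+.4f}")  # "bad" ends up clearly negative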

5. Text Mining: Comparing Performance of Alternative Models
In LightSide, you can compare different models using the panel „Compare Models“. If you want to compare two models with different feature spaces (one with unigrams and one with unigrams and bigrams) for a specific text, then with the option „Comparison Plugin = Basic Model Comparison“ you can see the performance values and confusion matrices on one screen. If you switch to the option „Difference Matrix“, you can look at misclassifications in their text context.
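
The same comparison idea, sketched with scikit-learn (a tiny invented dataset, so the numbers mean nothing – it only shows the mechanics):

    # Cross-validate two pipelines that differ only in their feature space
    # and compare the mean accuracy, like LightSide's model comparison.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    texts  = ["I agree", "so true", "well said", "exactly right",
              "I disagree", "not at all", "that is wrong", "no way"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    m_uni = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LogisticRegression())
    m_bi  = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())

    print(cross_val_score(m_uni, texts, labels, cv=4).mean())
    print(cross_val_score(m_bi,  texts, labels, cv=4).mean())

On real data, the richer feature space would have to earn its keep against the overfitting risk mentioned above.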

6. Text Mining: Advanced Feature Extraction
„Advanced features enrich the feature space, but expand the size of the feature space – large feature spaces mean added risk of overfitting.“
I’d like to keep this short because, as a beginner, I’ll stay with the simpler things at first (= LightSide’s „Basic Features“ in the Feature Extractor Plugin).
Advanced options would be: Stretchy Patterns (for context around a word: definition of pattern length, gap length and the use of categories – there are some predefined categories in the LightSide toolkit folder), Regular Expressions (help available in LightSide), Character N-Grams (for spelling modifications, consistent endings, …) and Parse Features (slow, produces a huge number of features, seldom used).
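
Of these, character n-grams are the one I can at least sketch outside of LightSide, again with scikit-learn:

    # Character trigrams within word boundaries: useful for catching
    # spelling variants and consistent word endings.
    from sklearn.feature_extraction.text import CountVectorizer

    vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))
    vec.fit(["agreeing", "agreed"])
    print(vec.get_feature_names_out())  # trigrams such as "agr", "gre", "ree"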

 

7. Text Mining: Working with LightSide

I did a lot of things in LightSide: extracting features, building and comparing models, inspecting models and interpreting weights… I’m optimistic that I understood the technical part of how to do this and that I got an impression of the process – and that’s about it.

My results with LightSide are in this attached pdf file:
w8-assignment-ID.pdf

 

LightSide screenshot

(screenshot from my pdf)

 

 

Learning Analytics MOOC – Week 7 – Text Mining Introduction

The Google Hangout of week 7 was as interesting as ever (and for me, again, in the archived version). At this point in the MOOC it’s no surprise to me that Carolyn and George emphasized that analytics isn’t easy (I think we all felt that in this course as well) and that handling the specialized software is actually the easy part of the process. What I understand completely by now is that learning analytics is very interdisciplinary – right now we have computational linguistics in the mixture. At Heidelberg University we have an „Institut für Computer-Linguistik“ (Institute of Computational Linguistics) – it might be interesting for me to contact them about e-learning & analytics some time in the future.

As I worked on this week’s blog post for several days, it got longer and longer, and therefore I divided the article into three sections. One section is about saying thank you for a great learning experience – we’ve still got two more weeks, but who knows, with Christmas preparations and further demanding tasks in text mining ahead, whether I’ll find the time later on 😉

1.) Some thoughts about the DALMOOC structure
My observation regarding the course structure is that the segments of the MOOC are fairly independent (I think you really could do the cMOOC thing and pick just one topic to engage with); on the other hand, by now I see the full picture – why these parts were chosen by the instructors and how well they fit together. I think it is extremely difficult to design a good MOOC for everyone – for learners experienced in the chosen topic as well as for beginners – to reduce a topic to a few weeks (one you would normally spend a semester on, as Ryan said), to combine different teachers, and to do all this at a high level that includes current research. Great job so far!
In addition to the discussion forums on edX, the Google Hangouts provided an important element of continuity, live feeling and caring – it would be interesting to know how much time the facilitators spend each day on the MOOC… Twitter, in its way, was also motivating for getting in contact with fellow students and helped me stay on board – so thanks a lot for the favs and retweets!
In the beginning I had two goals: to learn something about learning analytics and to have a closer look at the dual structure of the MOOC. I had to scale back my second goal due to limited time and because I got more engaged with the content than I had planned or expected. So my parallel visits to ProSolo weren’t as frequent as hoped, but I did experience the different structure given to what were initially the same weekly resources. It has much charm, but I returned to edX for my learning because I know (and like) the edX interface very well from former MOOCs and from my professional job. Until now I even stayed away from Bazaar, because after the first weeks the course content was so new and difficult for me that I had the impression I wouldn’t be able to contribute something meaningful in time via a synchronous text chat channel, in a foreign language, possibly paired with someone who expects a discussion at a higher level than I could offer. But that’s a very personal assessment; I’m sure that others saw it in a totally different way. In a MOOC – and especially this one – there are so many different possibilities and learning pathways that you have to choose a combination you are happy with (and I’m happy with mine). I have stopped counting the many hours I spent on the MOOC each week, and I’m fully aware that this is an exception I can’t repeat often. A similarly valuable and demanding MOOC for me was Nellie Deutsch’s (first) „Moodle MOOC on WizIQ“ in June 2013, where (besides blogging and taking part in many forum discussions) I created a lot of digital artifacts during the four-week course. In comparison, the HarvardX Justice MOOC in 2014 was „easy“ for me because of the very plannable amounts of time: with about 4 hours a week it was possible to get a very good learning experience from the consistent (and definitely not boring) video, self-test/quiz and poll structure and the exam at the end, even without any live sessions – reflections about the topic included.

2.) Text mining methods as part of data mining – overview of the process of building and evaluating a model
An example of „collaborative learning process analysis“ illustrated that a theory-driven approach (from the fields of psychology, sociolinguistics and language technologies) is considered more effective than shallow approaches to the analysis of discussion data: if you build models from an understanding of these theories, the models will be more effective.

(Accidental) overfitting is always a risk, so you have to become aware of the important methodological issues for avoiding it – overfitting is „where you build a model that’s really too particular to the data that you have trained it on, in such a way that you don’t get good generalization to new data that you want to apply your model to“. Keep the data you train on and the data you test on separate, but it’s good when the data set you train the models on is representative of the data you later test on.
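
A toy demonstration of this (my own sketch, not from the course): an unconstrained decision tree memorizes its training data and then looks much worse on held-out data:

    # Fit an unlimited-depth tree on noisy data: near-perfect training score,
    # clearly lower score on the held-out test set = overfitting.
    import random
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    random.seed(0)
    X = [[random.random(), random.random()] for _ in range(200)]
    y = [int(x1 + 0.3 * random.random() > 0.5) for x1, _ in X]  # noisy rule

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    tree = DecisionTreeClassifier().fit(X_tr, y_tr)  # no depth limit
    print(tree.score(X_tr, y_tr))  # close to 1.0 on the training data
    print(tree.score(X_te, y_te))  # noticeably lower on unseen data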

The text mining process in simple form consists of:

  • Raw textual data ->
  • Extraction of Features (with some awareness of the structure of language and of what we try to capture)  ->
  • Building a Model from those features  (From then on it’s like other kinds of data mining)  ->
  • Classification
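
As a minimal end-to-end sketch of these four steps (scikit-learn, invented mini-dataset):

    # Raw text in -> features extracted -> model built -> new text classified.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts  = ["I agree with this", "I disagree completely",
              "yes, exactly my view", "no, that is wrong"]
    labels = ["agree", "disagree", "agree", "disagree"]

    pipe = make_pipeline(CountVectorizer(), LogisticRegression())
    pipe.fit(texts, labels)                 # feature extraction + model building
    print(pipe.predict(["I fully agree"]))  # classification of new raw text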

A lot of work in text mining is „representation“: „You have to know what it is about the data that you want to preserve so those instances that should be classified the same look similar – and those instances that should be classified differently look different.“ Three sets of data are recommended: a development/exploration set, an evaluation/cross-validation set (for training and testing) and a final test set.
A starting point is qualitative analysis: set aside data for development (which you don’t use later for training and testing!) and look for examples of each of the categories you want your model to distinguish between – then you have to think about how to extract those features from the raw text (in order to build the feature vectors that you can apply a machine learning algorithm to). You extract those features from your cross-validation data and run a cross-validation to evaluate how well the model is doing. Usually it is not good enough in the first round, so you do an error analysis: you train the model on your cross-validation data, apply it to your development data and look at where the errors occur on the development data. With a new set of features you cross-validate again on your cross-validation set, i.e. you work iteratively on your model, trying to improve it, and test/compare the performance on the cross-validation set.
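
Sketched in code, that error-analysis step could look like this (same caveats as above: invented mini-data, scikit-learn instead of LightSide):

    # Train on the cross-validation set, apply to the development set, and
    # print misclassified dev instances to get ideas for new features.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    cv_texts  = ["I agree", "so true", "I disagree", "not at all"]
    cv_labels = ["agree", "agree", "disagree", "disagree"]
    dev_texts  = ["yes exactly", "I could not agree less"]
    dev_labels = ["agree", "disagree"]

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(cv_texts, cv_labels)

    for text, gold, pred in zip(dev_texts, dev_labels, model.predict(dev_texts)):
        if gold != pred:
            print(f"misclassified: {text!r} (gold={gold}, predicted={pred})")

An instance like „I could not agree less“ is exactly the kind of error (surface word vs. actual meaning) that suggests richer features.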

  • The development set is for: Qualitative analysis before machine learning + error analysis + ideas for design of new features
  • The cross-validation set is for: evaluating your performance

When you think you are done, you apply that model to the final test set. The whole process is a partnership between the developer (brains) and the algorithms (software).

3.) Sentiment analysis and my thoughts about the risks of learning analytics
One of this week’s videos was about a study applying sentiment analysis to MOOC discussion forum data – regarding both expressed sentiment and exposure to sentiment. A set of four independent variables was used in the survival model: individual positivity, individual negativity, thread-level positivity and thread-level negativity – the dependent variable was „dropout“.
It was very convincing for me to hear that really getting at a student’s attitude towards a course is much more complicated than merely counting the number of positive or negative words (and a machine learning model might not be able to do it) – we have to look below the surface-level analysis of text. Students might simply be discussing intensively using negative terms while at the same time being very engaged. Or they might be discussing topics in which „negative“ words appear. It doesn’t automatically mean that they have a negative attitude towards the course or a negative experience with it. So „simplistic ideas of sentiment predicting dropout are not supported“.
(see http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/130_EDM-2014-Full.pdf)

I think that this is a nice example which shows some of the difficulties with learning analytics in other situations (obviously I don’t criticize the above study): you apply LA from a worthy starting point and it seems to be so simple, but it just is not, and you could even do harm if you don’t proceed carefully and don’t know what you’re doing. I think that a lot of people who talk about Learning Analytics don’t really know what they are talking about but might have a strong opinion for or against it – it might be another one of those topics in e-learning where people think they know it all… In addition, specialized software could mislead such people into the wrong assumption that software results couldn’t be wrong, would be impartial and would even be easily transferable to other educational settings… In times when cost reduction and measurement are esteemed, scenarios of monitoring student and staff performance for „business“ reasons (and not really for the improvement of teaching & studying) don’t seem too unrealistic to me. Yesterday I saw an article on Mashable which clearly said that even business analytics isn’t easy (http://mashable.com/2014/12/05/analytics/?utm_cid=mash-com-Tw-main-link) – and in an educational context analytics is far more complicated.

„Code of practice for learning analytics : A literature review of the ethical and legal issues“ (Niall Sclater)
http://analytics.jiscinvolve.org/wp/2014/12/04/jisc-releases-report-on-ethical-and-legal-challenges-of-learning-analytics/ I think this resource from JISC (which I saw thanks to a retweet from Dragan) is a very important one, and hopefully I’ll find the time to have a closer look at it.

I look forward to week 8 and some more insights into text mining. In week 7 I did the LightSide task with the prepared csv files (and got the correct result), read the LightSide manual with all those definitions of terms/acronyms I’d never heard of before (e.g. unigram = „a unigram feature marks the presence or absence of a single word within a text“) and came to understand that LightSide „is divided into a series of six tabs following the entire process of machine learning“.

 

Learning Analytics MOOC – Week 6 – Behavior Detection and Model Assessment

Hamburg 2014

After a much-needed and long-planned break for a sightseeing weekend in Hamburg, it’s back to DALMOOC and a look back at week 6. Thanks to Ryan’s comments/solution regarding the problematic „logistic regression“ question, I went on with the math tutor assignment of week 5 and was able to finish it before going to Hamburg. Week 6 has another one of those external-resource math tutor things… and I haven’t completed it yet: questions 1 to 4 were doable with Excel, but for question 5 I have no idea how to start (maybe some time later).

Week 6 seemed a little easier to understand than week 5 because the videos explained some of the topics of week 5. However, I can’t describe the topics of week 6 in detail, although I spent many hours with the 8 videos. So for now, it’s just an overview of the main aspects.

For me, MOOCs which go on for longer than 4-6 weeks are hard to manage because I would like to have some leisure time again… In my opinion, you have to be very motivated to stay with a MOOC of semester-like duration – either you really need the content / certificate, or you do it because you like it very much. One of my main MOOC incentives is lecturers / facilitators who really care about what they are doing and really have something to say – like in this MOOC.

„There’s no perfect way to get indicators of student behavior that you can completely trust. It’s not truth, it’s ground truth“ – that sums it up pretty well. Sources of ground truth are self-report (although not common for labeling behavior), field observations, text replays and video coding.

In week 6 we heard about Feature Engineering, which „is the art of creating predictor variables“ and the „least well-studied but most important part“ of developing prediction models, which otherwise won’t be any good. For that you can consult papers from other researchers (there’s a lot of literature about features that were used and worked or didn’t work) and take a set of pre-existing variables (which is faster), but thinking about your own variables is likely to lead to better models. The steps would be: 1) brainstorming features, 2) deciding which features to create, 3) creating them, 4) studying their impact on model goodness, 5) iterating on features if useful, and 6) going back to step 3 or 1.

A big part of week 6 was about metrics for classifiers: accuracy, kappa (I’ll keep in mind that for data-mined models a kappa of 0.3-0.5 is typically considered „good enough to call the model better than chance and publishable“), ROC (= receiver operating characteristic curve), A‘, precision and recall. For each of these we got additional information, formulas, examples and details which might come in handy in the future.
The metrics for regressors include linear correlation (= Pearson’s correlation), MAD/MAE (= mean absolute deviation/error), RMSE (= root mean squared error) and information criteria like BIC (Bayesian Information Criterion) and AIC.
Which metrics to use?
There is a saying that the idea of looking for a single best measure to choose between classifiers is wrong-headed, and you could say the same for regressors. The advice: try to understand your model across multiple dimensions, and that involves using multiple metrics.
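
For my own notes: several of these metrics can be computed at once with scikit-learn (toy predictions, invented numbers):

    # Multiple classifier metrics on one set of toy predictions, plus the
    # two regressor metrics MAE and RMSE on toy numeric predictions.
    from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                                 mean_absolute_error, mean_squared_error,
                                 precision_score, recall_score, roc_auc_score)

    y_true  = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
    y_score = [0.9, 0.4, 0.8, 0.2, 0.1, 0.6, 0.7, 0.3]  # model confidences

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("kappa    :", cohen_kappa_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("ROC AUC  :", roc_auc_score(y_true, y_score))

    y_true_r = [2.0, 3.5, 4.0, 1.0]
    y_pred_r = [2.5, 3.0, 4.5, 1.5]
    print("MAE      :", mean_absolute_error(y_true_r, y_pred_r))
    print("RMSE     :", mean_squared_error(y_true_r, y_pred_r) ** 0.5)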

Another aspect of week 6 was Knowledge Engineering (= rational modeling, cognitive modeling). Knowledge Engineering is where your model is created by a smart human being (not by a computer, as with a data-mined model) who carefully studies the data, becomes deeply familiar with the target construct and understands the relevant theory and how it applies. Knowledge Engineering can even achieve higher construct validity than data mining. A good example is Aleven’s model of students‘ help-seeking. On the other hand, there are unfortunately cases where people didn’t do it carefully, just to get a quick result – this has a negative influence on science and maybe even on student outcomes because of wrong interventions. It is hard to know whether the knowledge engineering was done well, because the work is in the researcher’s brain and the process usually invisible; it is easier to tell with data-mined models.
Feature Engineering is very closely related to Knowledge Engineering; it’s not an either-or.

There are many types of validity and it is important to address them all:

  • Generalizability: Does your model remain predictive when used in a new data set?
  • Ecological validity: Do your findings apply to real-life situations outside of research settings?
  • Construct validity: Does your model actually measure what it was intended to measure?
  • Predictive validity: Does your model predict not just the present but also the future?
  • Substantive validity: Do your results matter? Are you modelling a construct that matters?
  • Content validity: Does the test cover the full domain it is meant to cover?
  • Conclusion validity: Are your conclusions justified based on the evidence?

The videos for week 6 can be found at https://www.youtube.com/user/dalmooc and as a MOOT (Massive Online Open Textbook), „Big Data and Education“, on Ryan Baker’s website: http://www.columbia.edu/~rsb2162/bigdataeducation.html

That’s it for week 6, and we are already in week 7 with the topic „Text Mining“. The Google Hangout of week 7 is on Thursday at 2 a.m. my local time – so, again, it will be the archived version. Sadly, that’s one disadvantage of an international MOOC – on the other hand, I immensely enjoy the diversity and internationality of the students 🙂

Learning Analytics MOOC- Week 5 – Prediction Modeling

Definitely not for beginners in the field of analytics. That was my first impression of week 5, and it still is. The videos were way too fast for me in both speaking pace and content, but I’ll nevertheless try to note what I got out of them and hope it’s correct. In this blog post I’ll also cover my experiences with the „week 5 activity“.

What’s the use of prediction modeling? Sometimes to predict the future, sometimes to make inferences about the present. There could be automated decisions by software, or teachers could be informed so that they can do something. Starting point: there is something you want to predict – that’s called the „label“ (= predicted variable).

a) Regression = numerical label (how much of a video a student will watch, what the student’s score will be, …)
In order to build a model, you obtain a dataset where you already know the answer (= training label). There are other variables (= features, predictor variables) which are used to predict the label. Regression means that you determine which features in which combination can predict the label’s value. In order to interpret the weights of the features, transformation is necessary.
One kind of regression is linear regression (often more accurate than complex models, particularly when you cross-validate). Another kind is regression trees (either with linear equations at each of the leaves of the tree or as non-linear regression trees).
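
A minimal sketch of both flavours in scikit-learn (features and scores invented):

    # Linear regression versus a regression tree on the same toy data,
    # predicting a numeric label (a made-up student score).
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    X = [[1, 0], [2, 1], [3, 1], [4, 0], [5, 1]]  # e.g. sessions, used hints y/n
    y = [20, 35, 50, 55, 80]                      # e.g. the student's score

    print(LinearRegression().fit(X, y).predict([[3, 0]]))
    print(DecisionTreeRegressor(max_depth=2).fit(X, y).predict([[3, 0]]))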

b) Classification = a set of categories (correct/wrong, will drop out / won’t drop out, …)
You get the labels from survey data, field observations, school records etc. With each label there are some features which could be used to predict the label. A classifier determines which features in which combination can predict the label. Software like RapidMiner, Weka etc. has a lot of classification algorithms, but it is hard to say which work best in a certain context. Educational data has lots of systematic noise, so the advice is to use conservative classifiers and find simple models. From experience, „Support Vector Machines“, „Genetic Algorithms“ and „Neural Networks“ are considered not so useful for educational data.
For educational data there are „step regression“ (for binary decisions like will the student drop out y/n, via a linear regression function and rounding to 0 or 1), „logistic regression“ (for binary decisions, via modeling the frequency of a specific value of the dependent variable; relatively conservative), „J48/C4.5 decision trees“ (good at dealing with interaction effects, able to handle numerical and categorical predictor variables, relatively conservative, good when the same result (drop out y/n) can be arrived at in different ways), „JRip decision rules“ (a set of if/then rules – there are many algorithms; rules can also be created from decision trees), „K* instance-based classifiers“ (predicting a data point from neighboring data points, good for very divergent data without easy patterns; you need the whole dataset) – and many other algorithms.
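
Two of these can be sketched quickly with scikit-learn, whose CART trees are close cousins of J48/C4.5 (dropout features invented):

    # Logistic regression versus a shallow decision tree on toy dropout data.
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X = [[0.9, 12], [0.2, 1], [0.8, 9], [0.1, 0], [0.7, 7], [0.3, 2]]
    y = [0, 1, 0, 1, 0, 1]  # 1 = will drop out; features: activity rate, logins

    print(LogisticRegression().fit(X, y).predict([[0.5, 4]]))
    print(DecisionTreeClassifier(max_depth=2).fit(X, y).predict([[0.5, 4]]))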

 

Week 5 Activity – Assignment: Problems and surprises

* Walk-Through
That wasn’t so easy, because the walk-through (which was supposed to explain the handling of the RapidMiner software) was in Flash and didn’t have a „back“ button – in my opinion, a simple pdf file would have been more helpful for getting the context of the different steps. Thankfully, in the meantime we got a doc file with the content. Another problem for me was that the software obviously expected users to add an operator via drag & drop and not via „return“ like I did – which worked for some operators, but not all of them.

* External resource
The actual assignment, which was in a math tutor system, wasn’t available until you answered one question correctly. It took me a long time to find this correct answer, as I didn’t expect that I had to check the example csv file in Excel in order to verify that RapidMiner listed the correct attribute types when importing the csv. Because RapidMiner identified some field attributes as binominal instead of polynominal, I got a wrong result for kappa.
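
For reference (and to console myself): kappa itself is easy to compute once the predictions are right – here with scikit-learn on toy values:

    # Cohen's kappa corrects accuracy for the agreement expected by chance:
    # these toy predictions have 0.625 raw accuracy but only 0.25 kappa.
    from sklearn.metrics import cohen_kappa_score

    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(cohen_kappa_score(y_true, y_pred))  # 0.25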
Finally I had the correct answer and saw the start page of the math tutor system with a login window. I tried my edX login, but got no result, as obviously it wasn’t intended that I get to this page. Only after I changed my browser settings to accept cookies from ALL third parties did I see the first question. These questions were hard to answer without a background in statistics, and I was relieved that there was some helpful info in the discussion forum.

I answered a question regarding kappa without the field „studentid“, then one about kappa without some other fields, and conducted analyses with „Naive Bayes“ and „W-JRip“. I got stuck at „Logistic Regression“: my PC took forever and came to no result – I lost my patience after 50 minutes and stopped the process. I don’t think it was intended to last that long, so I tried a different approach with a subset in the operator „Nominal to Numerical“, which didn’t work either. As the questions in the math tutor are not numbered, I don’t know exactly where I am at the moment (somewhere in the middle, I think – going back is not possible, and going forward is not possible until I give the correct answer).

Also frustrating: in my opinion one question included a double negative, and I pondered for some time whether I should say yes for fields to exclude or yes for fields to include – my choice was nearly 100 % wrong (so maybe my answers weren’t so wrong after all…).

I have to admit that I would have stopped working on the MOOC if this topic had been in week 1… What I got out of this week is that predictions are very difficult, even with software that does a lot of the job. When you don’t understand what you are doing, you get very wrong results – therefore a deeper engagement with statistics is required. At the beginning of the week I understood almost nothing, but after my practice in RapidMiner I have the feeling that in some examples I actually understood what I was doing, and that’s enough for me at the moment.

 

* Working with RapidMiner to get the kappa and therefore access to the Math Tutor assignment

Meanwhile I’m very good at the steps necessary to get the kappa with W-J48, because it took me a lot of attempts until I got it right… I’ll never forget binominal and polynominal… So maybe this 720p video helps somebody who is still trying. And on a foggy November weekend with a heavy cold (and therefore no audio), I had time for this 🙂