Educational Research in Learning Technology



In this post I discuss the nature (and weaknesses) of research in our field. I am broadly sympathetic with the arguments offered by Philip J. Kerr in this recent post, about how research in educational technology could be improved, but I have disagreements around the edges, enough that I think more discussion is warranted.


Kerr begins with a discussion of systematic reviews of research and comments that they "did not paint a very pretty picture of the current state of AIEd research." This motivated his post on "things that, collectively, researchers need to do to improve the value of their work" drawn from "observations mostly, but not exclusively, from the authors of systematic reviews, and mostly come from reviews of general edtech research." I'll return to the subject of systematic reviews at the end of this post.

Numbered items are Kerr's, as is indented text in italics. Also, while Kerr is more focused on artificial intelligence in education (AIEd) research, I will address my remarks to learning technology more generally. I'll structure this post following the structure in his original article.

Research outside the field


1. Make sure your research is adequately informed by educational research outside the field of edtech

I came to the field of ed-tech from outside and one of the first things I noticed was the insular nature of education research generally. Like any discipline, I suppose, it had its own canon, but also, it seemed to trail developments in other fields. I often found ideas credited to education researchers that had antecedents in other fields (the most famous of which, to my mind, is Moore's 'transactional distance', which to me simply reflected the core tenets of information theory that had been developed decades earlier). So I agree that educational research should be much better informed about work outside the field.

That said, you have to be careful what you accept as doctrine. Kerr writes,
Unproblematised behaviourist assumptions about the nature of learning are all too frequent. References to learning styles are still fairly common. The most frequently investigated skill that is considered in the context of edtech is critical thinking (Sosa Neira, et al., 2017), but this is rarely defined and almost never problematized, despite a broad literature that questions the construct.
There's a bunch of stuff in here that needs to be unpacked.

Behaviourism


The first is behaviourism. There is no end to the number of papers that argue that we should abandon behaviourist theories and instead adopt newer and mostly constructionist theories of mind. But there isn't, and never was, a single thing called 'behaviourism', and it helps to know exactly what is being thrown out, and why. We have:

- psychological behaviourism - this is the behaviourism of B.F. Skinner that reduces all psychology to descriptions of stimulus, conditioning, and response. In so doing it eliminates most mental constructs from causal accounts of behaviour, including ethics, intentions, will, and desires. This is the classic 'black box' behaviourism.

- philosophical behaviourism - this form of behaviourism allows that there are internal states but notes that they can only be described in terms of external behaviour; phenomena such as intentions, desires and beliefs are described as dispositions the causes of which are complex and not reducible to these mental phenomena. Noted proponents include Gilbert Ryle and (perhaps) Ludwig Wittgenstein.

- methodological behaviourism - as described by John Watson, this form of behaviourism makes the pragmatic statement that references to mental states such as beliefs and desires add nothing to our explanations of behaviour; if we state "John believed it was cold and went outside" we know nothing more than we do if we state "John went outside".

So what is it we are rejecting when we reject behaviourism? More importantly, what is it that we embrace? If it's simply a rejection of the black box theory, that may make sense, because there clearly is something inside our heads that is doing the thinking. But is that thing inside our head reducible to talk about beliefs and desires and the rest? That's a very large leap to take, and one not warranted by 'what is known' in other disciplines.

Learning Styles

Next is learning styles. This should be the subject of a whole other paper, but suffice it to say that there are some people working in the domain of cognitive psychology who believe they have proven that learning styles don't exist, and that education theory should stop referencing them.

Now I would like to point out that they have proven nothing of the sort. It is a matter of simple observation that different people prefer to learn in different ways. Yes, systematizing this is an open question, and no, it doesn't make a lot of sense to reduce these to four simple categories. But you can't tell me that a blind person learns the same way as a deaf person, and you can't say that it doesn't matter to me whether or not I learn better in groups, because I really really don't.

What they have shown is that, if you define learning outcomes in terms of test results, and if you employ an instructivist pedagogy, where the focus is on retention of facts and methods presented, adapting one's pedagogy to match a person's self-identified learning style does not produce a statistically significant improvement in outcomes. But that is a very narrow basis on which to dismiss the entire scope of learning styles theory.

Critical Thinking

I have argued in the past that educators seem to have their own special and unique definition of critical thinking that doesn't really resemble what is taught outside education as critical thinking. I've tried to address that, but educators remain unmoved (and, honestly, not likely to be convinced by me).

That said, I don't think the problem here is that educators have rarely defined critical thinking (I find the definitions terribly lacking, but they have at least tried) and I don't think that the problem is that educators have failed to problematize critical thinking. In fact, I think that problematization is one of those special 'educational' types of critical thinking, even if it did originate in the writings of people like Foucault. I see no particular value in problematizing for the sake of problematizing.

The sorts of questions asked in problematization - "who is making this statement?", "for whom is it intended?", "why is this statement being made here, now?", "whom does this statement benefit?", "whom does it harm?" - represent a particularly parochial approach to critical inquiry. Maybe, in some contexts, these questions should be asked, but even if so, they should form only a part of a much wider questioning.

A Sceptical Attitude

2. Adopt a sceptical attitude from the outset

Kerr, along with a number of other writers (most notably Audrey Watters), has argued that ed-tech has been over-hyped, and that we should not unquestioningly accept what we are being told. I am not sure that we actually are unquestioningly accepting what we have been told (especially during this contemporary wave of techno-scepticism). In any case, Kerr's point breaks down into a few sub-points.

Educational Gains

Kerr writes,

Know your history. Decades of technological innovation in education have shown precious little in the way of educational gains and, more than anything else, have taught us that we need to be sceptical from the outset. ‘Enthusiasm and praise that are directed towards ‘virtual education’, ‘school 2.0’, ‘e-learning’ and the like’ (Selwyn, 2014: vii) are indications that the lessons of the past have not been sufficiently absorbed (Levy, 2016: 102).
It is not at all clear to me what Kerr means by 'educational gains'. I'm thinking he probably means educational outcomes, as defined by test results produced as the result of an instructivist teaching approach. But I think it is clear that 'educational gains' can be defined in many other ways. Here are just a few:

- access - it is demonstrably true that educational technology has increased access. More people today have access to learning opportunities, both formal and informal, than at any time in history. Entire economies have been developed as a result; the growth of the technology industry in India, for example, is hard to imagine without the contributions of educational technology.

- cost - advances based on educational technology, such as open educational resources (OER) have been shown to reduce costs for students pretty much everywhere they have been deployed. Numerous studies attest to savings of millions of dollars.

- community - maybe collaboration and communities produce better test results, and maybe they don't. But community itself is a value, especially when it becomes possible to develop learning communities that span regions and cultures. And it is easily demonstrable that the world-wide communities that are developed in an educational context would not be possible without technology.

I think the key point here is to note that not everything is designed to increase test scores. In my own work, for example, I have never been interested in test scores.

Exciting Potential


Kerr writes,
The phrase ‘exciting potential’, for example, should be banned from all edtech research. See, for example, a ‘state-of-the-art analysis of chatbots in education’ (Winkler & Söllner, 2018), which has nothing to conclude but ‘exciting potential’. Potential is fine (indeed, it is perhaps the only thing that research can unambiguously demonstrate – see section 3 below), but can we try to be a little more grown-up about things?
People who have read my newsletter know that I greet each new announcement with a healthy dose of scepticism. When someone says 'exciting' potential to me it is like a red flag, causing me to ask:

- what is it that this thing is trying to do?,

- can it be done in the way they suggest?, and

- would that make any difference?

(Notice how different these questions are from the ones asked when 'problematizing').

Based on the results of these questions, the potential that is exhibited is sometimes genuinely exciting. MOOCs, when we first encountered them, were exciting. Learning management systems, back in the day, when they were being presented by people like Murray Goldberg at conferences like NAWeb, were exciting. Look what came out of both these technologies! Huge opportunities. Multi-million dollar industries.

Yes, the hucksters hype things. Yes, there are conferences like ISTE and publications like EdSurge that hype things, and there is no small number of academic papers published by vendors and affiliated clients that hype things.

But the issue here isn't whether or not they hype things. The issue here is whether or not they make false claims. And this is an issue that permeates all of research, not just ed tech, and it comes from all perspectives, not just from the people who promote technology. I just finished discussing the important limits on claims like 'learning styles don't exist', and to me the (well-funded?) movement behind such lines of argumentation is no different than those supporting the potential of chatbots.

What Is Being Measured?

3. Know what you are measuring

Here I am in complete agreement with Kerr. We should know what we are measuring. Even more broadly, we should know what question we are trying to answer with our research. But here's what Kerr writes:
Measuring learning outcomes is tricky, to say the least, but it’s understandable that researchers should try to focus on them. Unfortunately, ‘the vast array of literature involving learning technology evaluation makes it challenging to acquire an accurate sense of the different aspects of learning that are evaluated, and the possible approaches that can be used to evaluate them’ (Lai & Bower, 2019). Metrics such as student grades are hard to interpret, not least because of the large number of variables and the danger of many things being conflated in one score. Equally, or possibly even more, problematic, are self-reporting measures which are rarely robust. It seems that surveys are the most widely used instrument in qualitative research (Sosa Neira, et al., 2017), but these will tell us little or nothing when used for short-term interventions (see point 5 below).

What Question are We Answering?

Kerr quotes Lai & Bower:
the vast array of literature involving learning technology evaluation makes it challenging to acquire an accurate sense of the different aspects of learning that are evaluated, and the possible approaches that can be used to evaluate them
Yes, but why is that a problem? Lai and Bower themselves point to Petri & von Wangenheim, who identify 43 criteria describing how games for computing education are evaluated. A wider search may reveal more such articles. And why should we all evaluate the same aspects of learning, or use the same approach to evaluate them?

But more to the point, why would we assume that we are engaged in the enterprise of evaluating learning technology? There are numerous questions we might want to answer, and they are approached very differently:

- what is there?  (i.e., research as discovery - a lot of my work is here. We only need to show existence.)

- what do we need? (for various values of 'we'; I've done a bunch of this)

- what is possible? (aka the 'exciting possibilities' of the previous section. Also aka 'design research', where proving something works is all that it takes)

- what's wanted? (different from what is needed, obviously)

- what works? (for different definitions of 'works', obviously)

- what is best? (for various purposes, and for various definitions of best - most 'evaluations' properly so-called fit in here)

- what is most efficient? (a Rolls Royce might be best, but a Honda might be most efficient)

And so on. This list of questions could be expanded, and for each of these questions we could identify numerous sub-questions.

The point here is: research encompasses all of this. It makes no sense to say that there are too many different types of question and different types of measurement. Sure, maybe it would be helpful if somebody just drew up a periodic table of research methods, perhaps like this periodic table of visualization methods.

And often there's a relation between one set of questions and another. For example, if we are asking "what is best", we are (presumably) basing this on previous research that has revealed "what's wanted" or "what's needed". So when Lai and Bower write "evaluation can be characterized as 'the process by which people make judgements about value and worth'", they define what's wanted as 'value and worth', which is very hard to describe, let alone measure. So, quelle surprise that they run into multiple definitions!

What are the Indicators?

When we read a statement like "metrics such as student grades are hard to interpret" what this means is that what we are measuring is an indicator of the thing being studied, and not the thing being studied itself. It's really common in science to study indicators, because in many disciplines, not just psychology, the thing being studied isn't actually observable.

It is a fixture of education research that almost everything being measured is unobservable (a direct consequence of the 'rejection of behaviourism', above). This is what creates the need for educational theory, the prime purpose of which is not (contra a lot of writing) to draw generalizations and make predictions, but rather, to (a) define these unobservable entities, and (b) to map a logic or structure allowing us to infer from the indicator to the state of the unobservable.

This is where it would help for educators to read a lot of work outside their own field, and especially work in fields ranging from business (which uses 'logic models' to infer from indicators to unobservables) to physics (which uses 'mathematical models'). Quite often the unobservables defined are conceptual, that is, they are not thought of as things that exist as such, but are rather useful shorthand for things we can't actually express otherwise.
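To make that mapping concrete, here is a minimal sketch of a 'logic model' in Python. It is not drawn from any particular educational theory; the indicator names, weights and threshold are invented purely to show the structure: observable indicators go in, an inference about an unobservable comes out, and the inference is only as good as the stated model.

    # A toy 'logic model': infer an unobservable ('engagement') from
    # observable indicators. All names, weights and thresholds are invented
    # for illustration; none of this is a claim about real learners.

    def infer_engagement(logins_per_week: int, posts_per_week: int,
                         assignments_submitted: int) -> str:
        """Map observable indicators to a claim about an unobservable state."""
        score = 0.2 * logins_per_week + 0.5 * posts_per_week + 1.0 * assignments_submitted
        return "engaged" if score >= 3.0 else "disengaged"

    print(infer_engagement(logins_per_week=5, posts_per_week=2, assignments_submitted=1))
    # -> 'engaged' -- but only relative to this model; change the weights or
    # the threshold and the same observations license a different inference.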

So when someone says "metrics such as student grades are hard to interpret" we have to ask, what are the underlying entities that grades are supposed to map to, and what is the theory that models that mapping?

From where I sit, a lot of educational theory is no better than the black box theories it is intended to replace, because there is an inadequate definition of the unobservable in question (whether it be knowledge, competency, or whatever) and the mapping from the observable to the unobservable. When proponents claim (as they so often do) that "my theory isn't reductive in that way" then (from where I sit) they are simply saying 'my theory is a black box'. Which is fine, but let's be clear about this.

Instruments

It has always felt a bit off for me to hear a survey described as an 'instrument', but it's very popular in the social sciences, and as Kerr observes, surveys are widely used in educational research.

One of the most significant books I've read in this area is Ackermann's Data, Instruments, and Theory, which maps a lot of this out and draws out (for example) the theoretical suppositions inherent in instrumentation. A thermometer, for example, supposes that mercury expands with temperature; the model is the equation describing that relation, and it is supported by a broader theory describing the relation between materials and heat.

An instrument doesn't actually measure things. An instrument is sensitive to phenomena, and is then calibrated in some way. The calibration references the model describing the relation between the instrument and whatever is being measured. The actual measurement itself is the process of taking an observation from the instrument. So we should say, for example, "we recorded a thermometer temperature of 36 degrees" rather than "the temperature was 36 degrees."
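As a minimal sketch of what 'calibration' amounts to here (with invented constants, not the actual physics of any real thermometer), the model is just the rule that converts the quantity the instrument is sensitive to into the quantity we report:

    # Calibration sketch: the instrument is sensitive to column length;
    # the model converts that reading into a reported temperature.
    # The constants are invented for illustration.

    MM_AT_ZERO = 20.0     # assumed column length (mm) at 0 degrees C
    MM_PER_DEGREE = 0.5   # assumed expansion (mm) per degree C

    def calibrated_temperature(column_length_mm: float) -> float:
        """Apply the calibration model: raw reading -> temperature estimate."""
        return (column_length_mm - MM_AT_ZERO) / MM_PER_DEGREE

    reading = 38.0  # what we actually observe
    print(f"{reading} mm of mercury -> modelled temperature "
          f"{calibrated_temperature(reading):.0f} degrees")
    # We observed 38 mm; '36 degrees' is an inference licensed by the model,
    # which is why 'we recorded a thermometer temperature of 36 degrees'
    # is the more careful statement.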

The utility of a survey as an instrument is therefore dependent on what it is sensitive to, and how it is calibrated. Based on observations taken from the survey, and the model we are using, we may be inferring to either mental states (hopes, desires, knowledge, etc) of the person being surveyed, or to states of events in the world (in which case the 'instrument' is actually the combination of the person and the survey form).

What I find quite funny personally is to read a person basing their theory of learning outcomes on test results, yet arguing that survey results ought to be disregarded because they are unreliable. They are both essentially the exact same instrument just calibrated slightly differently.

Sample Size

4. Ensure that the sample size is big enough to mean something
In most of the research into digital technology in education that was analysed in a literature review carried out for the Scottish government (ICF Consulting Services Ltd, 2015), there were only ‘small numbers of learners or teachers or schools’.
I totally get this complaint, and I have often seen educational research conducted on a handful of students or instructors. Sample size and representation (see below) are the two cardinal principles of statistical methodology, and it is well known that a sample of a certain size is needed for a survey to be accurate "plus or minus three percent, 19 times out of 20".
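For reference, that "plus or minus three percent, 19 times out of 20" figure falls out of the standard margin-of-error calculation. A quick sketch, assuming simple random sampling and the worst-case proportion of 0.5:

    import math

    Z_95 = 1.96  # z-score for 95% confidence ("19 times out of 20")

    def required_sample_size(margin_of_error: float, p: float = 0.5) -> int:
        """Sample size needed for a given margin of error at 95% confidence."""
        return math.ceil((Z_95 ** 2) * p * (1 - p) / margin_of_error ** 2)

    print(required_sample_size(0.03))  # about 1,068 respondents for +/- 3%

A handful of students, in other words, doesn't come close.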

The presumption here is that all educational research is statistical. But there are significant problems with this view.

Existentials and Exceptions

If you wish to prove that dogs exist, you need at minimum a sample size of one, specifically, one dog. A lot of research is existential - you don't need to point to a lot of Higgs bosons to prove they exist, not even if your theory says there are a lot of them. You just need one.

Even more to the point, a lot of things matter even if there's just one of them, and even if their number is not (shall we say) statistically significant. One murder matters, and all we need to do is to show that it exists in order to launch a whole investigative process into operation; it would not be a very good response to say "your sample size is too small" or "the number of murders isn't really significant".

An analogous case exists when proving exceptions. If a proposition is asserted as a universal (for example, "people need competition to thrive"), then this proposition is defeated if the research shows even one counterexample. A lot of the time the person uttering such a generalization retreats from universality to plurality ("well, most people need competition to thrive"), but the force of a plurality is not the same, and the exceptions may prove to be significant.
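The logical point is small but worth making explicit; a one-line check (with invented observations) is enough to show the difference between the universal and the plural:

    # Invented observations for the claim "people need competition to thrive".
    needs_competition = [True, True, False, True]

    print(all(needs_competition))  # False: one counterexample defeats the universal
    print(sum(needs_competition) / len(needs_competition))  # 0.75: 'most people' survives, but weakened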

Reduction to the Mean

Here is a common sort of argument: "we tested a class of students using method A and method B. Using method A, the class average was 60, and using method B, the class average was 60. So there is no significant difference between the two methods." Of course, there are many ways to reach an average of 60 in a class (10 people score 60; 6 people score 100 and 4 people score 0; etc).
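A two-line check with the score lists from that example (invented numbers, obviously) shows just how much the mean hides:

    from statistics import mean, stdev

    method_a = [60] * 10             # ten students each score 60
    method_b = [100] * 6 + [0] * 4   # six score 100, four score 0

    for name, scores in (("A", method_a), ("B", method_b)):
        print(name, "mean:", mean(scores), "stdev:", round(stdev(scores), 1))
    # Both means are 60, but under method B four students learned nothing at all.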

It is true that the variation in value can be measured using standard deviation. But the reduction of grades to a mean entails a theory of education where the only variables considered are at the common (teaching) end, and not the individual (learning) end. The objection here, though, is that 'test result' is not a statistical property, properly so-called. Each grade is created via an independent process, not via one process that created a scatter-plot of results. Whether an outlier is significant is not a mathematical matter, but an individual matter.

No health care intervention would ever be based purely on the results of a statistical survey. Each treatment is prefaced with a personal interview with the patient in order to identify which of many possible factors might result in this case being an outlier, because even if the outlier is statistically insignificant, a human life is not.

Now - in summary - none of this excuses most of what we see in the literature, where a sample size of twelve is used to draw a sweeping generalization about the learning population as a whole. Even were education a purely statistical science, such research would be inadequate. My point here is that there are numerous cases where research in education should not be treated as statistical research, and that possibly, no educational research should be treated as statistical research.

Short Term vs Long term


5. Privilege longitudinal studies over short-term projects
The Scottish government literature review (ICF Consulting Services Ltd, 2015), also noted that ‘most studies that attempt to measure any outcomes focus on short and medium term outcomes’. The fact that the use of a particular technology has some sort of impact over the short or medium term tells us very little of value. Unless there is very good reason to suspect the contrary, we should assume that it is a novelty effect that has been captured (Levy, 2016: 102).
In general this statement is intended to apply to statistical research, and so the discussion above also applies here. But there is also a sense in which this is really overstated.

Short Term Impact

The gist is that we learn nothing from short term impacts; "we should assume that it is a novelty effect that has been captured." But this statement is very often false.

It is true, for example, that when we exploded the first atomic bomb, there was a significant novelty impact. However it does not follow that nothing was learned, nor does it follow that long-term studies were needed to establish the basic premise of atomic weapons. I think the same is true of other interventions; we would learn very quickly that it is a bad idea to instruct English-speaking students in Urdu, that textbooks should not be distributed to students as anagrams, and that students should not be disciplined using handguns.

The same is true with positive results. If I went to a typical high school French class and used an application to teach them French, with the result that they could speak conversational French after a week, this is clearly not explainable simply by novelty - or, even if it were, given that it is not possible to teach conversational French in a week by any other means, I would then try to preserve the novelty in all cases.

Long Term Outcomes

There are two ways to test for long-term outcomes. One is 'long term for an individual'. For example, we usually study how well a student learns with new technology over (say) a six week period, which coincides with the time the student is learning to use the technology. This can cause learning outcomes (whatever they are) to be a lot lower. So it's better to test how well the student learns using the tool over a long period of time (note that this also depends on the technology working properly, the student being able to learn to use it (or being instructed in using it), and the student actually using it over that period of time).

A second way to test long-term outcomes is 'long term for the tool'. After a period of time, a tool (like a word processor or LMS) is no longer novel to the teacher or the school, and so they settle into a 'more normal' pattern of usage for that tool. This contrasts with the time when the tool is 'novel' and everyone is eager to try it, and puts in the extra work that may be needed to make it work. But long term for the tool doesn't really help at all if the student is still using it for only a six week period.

Also, long term outcomes are different from short term outcomes. In education, most outcomes measured are short term outcomes - either a test, or a course grade, or perhaps a program grade. The impact of a tool is rarely measured for longer periods. More to the point, it wouldn't make sense to return to a former student after a period of years had elapsed and give them their grade 9 algebra test. A genuine long term outcome of education is something like employability, criminality, health outcomes, and the like.

Content

6. Don’t forget the content
The starting point of much edtech research is the technology, but most edtech, whether it’s a flashcard app or a full-blown Moodle course, has content. Research reports rarely give details of this content, assuming perhaps that it’s just fine, and all that’s needed is a little tech to ‘present learners with the ‘right’ content at the ‘right’ time’ (Lynch, 2017). It’s a foolish assumption. Take a random educational app from the Play Store, a random MOOC or whatever, and the chances are you’ll find it’s crap.

Content quality


I have argued elsewhere that content quality is far from simple to define. So the best I would be willing to say here is that, if some technology or pedagogy is being evaluated (keeping in mind that evaluation is only one of many types of research), then it is probably best to evaluate it over a range of content of differing quality, however defined. And in fact, unless it's the content itself being evaluated, it doesn't matter what the content is.

Also, what counts as content quality, and the relative importance of content quality, really varies depending on what you are trying to do and what tools you are trying to use.

Search

I have argued many times that 'education is not a search problem'. That is, 'how do I find x' is not one of the problems we're really trying to solve in educational technology. So I agree with Kerr when he criticizes the idea that "all that’s needed is a little tech to ‘present learners with the ‘right’ content at the ‘right’ time’."

But the problem here isn't that most content is crap. First of all, it's not clear to me that most content is crap. This is an empirical claim: "Take a random educational app from the Play Store, a random MOOC or whatever, and the chances are you’ll find it’s crap." I'd love to see it substantiated for any reasonable definition of "crap". But I doubt it would be.

The reason why education is not a search problem is that 'finding and presenting' isn't really an effective pedagogy (even though it is, almost by definition, an instructivist pedagogy). And the exact content you use is rarely the point in any learning event (as I've often said, content is the McGuffin - it's the tool we use to foster learning, not what it is that should be learned).

It all comes back to that idea of what is being measured and what model you are using to connect observations with outcomes. The content used matters here only if there is a direct link between content and whatever is being measured. Many (especially people like Willingham and Kirschner) have tried to make that point. But I find that point far from proven.

Stories

7. Avoid anecdotal accounts of technology use in quasi-experiments as the basis of a ‘research article’
Control (i.e technology-free) groups may not always be possible but without them, we’re unlikely to learn much from a single study. What would, however, be extremely useful would be a large, collated collection of such action-research projects, using the same or similar technology, in a variety of settings. There is a marked absence of this kind of work.
This combines two unrelated points: first, the use of anecdotes, and second, conducting research without a control group. We also have a third point raised, the call for action-research projects. All of these, in one way or another, are stories.

I'm not always a fan of stories, not because they aren't important (they are) but because I find them boring. In my own reading I am often less interested in the details of how someone found out about something than in what they found out. However, as a practitioner of something (for example, distance cycling, or computer programming), I do find the how-to stories interesting, because they give me an insight into what it is to be a person doing that kind of thing.

But the main question here is: are stories research? I'm going to answer, unambiguously, yes.

Anecdotes

We have all read that 'the plural of anecdote is not data'. Which is a lovely statement, but which actually makes the relatively minor point that anecdotes are not (typically) valuable as input for a statistical study. And even then, it's factually wrong; were there enough anecdotes, they could serve quite well as input for a statistical survey (which is what we typically do when we put open-ended questions into our research surveys).

More importantly, though, anecdotes answer questions like "is something possible?" (for example, today I read a story about a frog being found alive inside a pepper, which I wouldn't have considered a possibility before now). They answer the question, "does something exist?" Or, "is there a problem?" Or, "what do I want?"

Anecdotes help us identify the range of terminology used in a discipline, define a problem space, identify from a practitioner's eyes what might count as data, and more. No, one story does not prove that all uses of technology A will produce result B.

Control Groups

Without a control group, a scientific study becomes a story. It's only a partial picture of what happened. We don't know what might have happened instead. Sometimes that matters; often it doesn't. When it doesn't, sometimes it's because we know perfectly well what would happen, and sometimes it's because all we needed was to know that this is a way to make something happen.

It's also the case, as Kerr notes, that control groups are impractical in some cases. Sometimes, it's physically impossible. It would be interesting to see how well a student learns in an average home without any screens at all, but there's no such thing as an average home without any screens at all. Sometimes it's ethically impossible. Perhaps we could raise a well-educated and productive member of society with no mathematics education at all, but no research ethics board would ever sanction such an experiment.

Control groups are needed in one and only one case: when it's essential to answer the question, "what else would have happened instead?" And this question is essential in one and only one case: when we're trying to establish whether A caused B. At some point, yes, we would like to have a full causal story about education (though it would be pretty funny if it ended up being behaviourism). But education (like most of the social sciences) is a complex discipline. There are no simple causes, and statements that "A causes B" are frequently simplistic or wrong (and disproven with a single case).

Action Research

I'll rely on the account of action research provided here. It describes a model where you formulate a question, clarify theory, gather and analyze data, draw conclusions, and act on results. Thus defined it reminds a programming hack like me of the waterfall method of writing software, where everything is determined in advance. And while nobody thinks data, theory and planning are not important, there are many more dynamic ways to approach a problem space.

The main point about action research is the last - you are taking informed action. At that point, you have started interacting with your research subject. And the outcome of that process, no matter how you set it up, is the story of what happened. You are establishing a 'this happened' type of research. As I've stated above, this can be valuable for many reasons.

Kerr is suggesting "a large, collated collection of such action-research projects," which suggests he agrees that this is a basis for a statistical study (maybe even one establishing a causal relationship, or determining what practice is 'best'). But a large number of action research studies carries the same epistemological weight as a large number of anecdotes. I'm fine with that, as long as we're clear about what we're doing.

Contexts


8. Enough already of higher education contexts

It has been widely recognized in other fields, especially those with complex environments, that context matters. We're still working on that recognition in education.
Researchers typically work in universities where they have captive students who they can carry out research on. But we have a problem here. The systematic review of Lundin et al (2018), for example, found that ‘studies on flipped classrooms are dominated by studies in the higher education sector’ (besides lacking anchors in learning theory or instructional design). With some urgency, primary and secondary contexts need to be investigated in more detail, not just regarding flipped learning.
If we removed the studies of 'a class of psychology students at a mid-western university' from the literature, we would have nothing left.

OK, I exaggerate, but this is a widely recognized problem. But it's not exactly as stated here.

What is Studied

We read ‘studies on flipped classrooms are dominated by studies in the higher education sector’. What's important here is not simply the sector being studied - higher education, primary or secondary - but the fact that we're studying formal education in a classroom setting.

Much of the promise of learning technology is that it can support learning outside the classroom in an informal setting. But academic research is almost always based on classroom settings because (d'uh) that's where the students are. Except they aren't. The fortunate students able to pay tuition are there. The vast majority of humanity learns elsewhere.

And because we're studying formal education, the context of learning is fixed and assumed. There will be a curriculum. There will be a cohort. There will be assignments, exercises and assessments. Learning will be tied to some sort of credential. There are specific facilities designed for learning into which learning technology is expected to fit. The primary purpose of this entire context is to learn.

What you find often depends on where you look. If you look in a river, you'll find fish. If you look in a classroom, you'll find classroom learning.

What is Reported

A lot of learning - both by traditional students and otherwise - is not reported. For example, it is rare to find a study of "what is learned watching Vikings" in an educational journal. Or a study of "the learning of social norms as informed by advertising". What is reported focuses almost exclusively on a certain type of learning.

To be clear, though, there is a lot of study of these other types of learning. However, the systematic reviews conducted by Lundin et al and pretty much everyone else are of studies published in academic journals. Fortunately these are more open than they used to be, so they are of more value. But such a review process overlooks a large body of literature on learning and learning technology.

It is to me a puzzle why research in education, educational technology, and learning in general has not become wide enough to encompass in a single span the breadth of corporate learning, informal learning, ambient (media-based) learning, and formal (schools and colleges) learning. From where I sit, because research is conducted in such a limited context, any generalization produced by academic research is suspect. Nobody is able to report 'what works' in a non-context-specific way.

Positivity

What would an article on educational research be without the obligatory statement, "but we must consider the other side?"

9. Be critical
Very little edtech research considers the downsides of edtech adoption. Online safety, privacy and data security are hardly peripheral issues, especially with younger learners. Ignoring them won’t make them go away.
I'm working on a paper on ethics and learning analytics with a bibliography of hundreds of sources that conclusively proves this statement is false. I don't know statistically what percentage of ed-tech literature looks at the downsides of ed-tech adoption. But I can say conclusively that it's a lot.

I get that writers want to push back against the relentless positivity found in ed-tech marketing hype. So do I. But especially in recent years, this sort of push-back has come to dominate the literature. That's not necessarily a bad thing; the push-back is healthy. But we can't claim anymore that people are just uncritically cheerleading for ed-tech. If we ever could.

Online safety, privacy and data security

There is some very good work being done by some agencies - MediaSmarts comes to mind - in promoting awareness and education about issues of online safety, privacy and data security.

As well there are some frameworks - DELICATE comes to mind - that are looking into the same issues at a more academic level.

As with a lot of such work, however, it seems to me that researchers believe that the hard problems - such as "what is ethics" - have been solved. They create and research ethical frameworks based on some unstated common understanding that we all agree that (say) "privacy is good". But a look at the wider literature shows a range of opinion. To take a simple example, I've seen people both for and against the idea of posting their children's pictures online - and no research to tell me whether or not it is actually dangerous.

Research Ethics

I think educational researchers haven't yet really come to grips with what research ethics actually ought to look like. That there are research ethics is a good thing, but just what constitutes research ethics remains an open question.

Today there's mostly a pro-forma declaration that student data integrity will be preserved, that any possibility of harm will be identified, and that consent be voluntary and informed. These are reasonable (though my own experience is that the ethics process is both lengthy and ineffective). But they are all based on the idea that research ought not harm the subject.

By contrast, when I have spoken at medical conferences, there is a very clear requirement that authors and speakers reveal any conflicts of interest. When working with engineering groups, there is a process whereby participants reveal any patents or intellectual property that might have a bearing on the work.

Additionally, it is worth asking whether the research itself is ethical. For example, most (if not almost all) educational research seeks some way to make pedagogy or technology more effective - in other words, it is focused on quality. But is a focus on quality ethical in a world where so many people have no access to an education at all? It would be analogous to medical research focusing on making healthy people happier instead of curing diseases and epidemics.

More Research Needed

Systematic Reviews

Let me return to some remarks Kerr makes at the head of his article:

Last year saw the publication of a systematic review of research on artificial intelligence applications in higher education (Zawacki-Richter, et al., 2019) which caught my eye. The first thing that struck me about this review was that ‘out of 2656 initially identified publications for the period between 2007 and 2018, 146 articles were included for final synthesis’. In other words, only just over 5% of the research was considered worthy of inclusion.
This is a common pattern for systematic reviews. But we should avoid the conclusion that Kerr draws, that this "did not paint a very pretty picture of the current state of AIEd research."

To be clear, that's not because I think that the state of research is pretty; far from it. But this isn't evidence for that.

From where I sit, systematic reviews are based on a methodology that biases the input in favour of a specific research objective. The 'input' in this case is the 5% of papers being included. The bias is contained in the selection process. This bias is exhibited at a few levels (a toy sketch of how these filters compound follows the list below):

- source - only papers from certain journals are selected. Often this set of journals is determined by a (commercial) indexing service. Often these are also filtered by language. Smaller, independent, and open journals are often omitted from the search results.

- subject - papers are selected using search strings or similar search processes, which means that papers outside a disciplinary tradition (that may use a different set of words) are not selected. This reflects a presumption that there is a single literary canon and common vocabulary in the discipline. Word selection can also include more direct biases. For example, including the term 'cognitive load' may overselect papers from a certain instructivist perspective.

- selection - papers selected out of the filtering process are often subjected to inclusion or elimination based on methodology. For example, systematic reviews often include only double-blind research studies, thus ensuring only papers addressing certain sorts of issues (as described extensively above) are included.

- weight - the selection is often deemed to reflect a statistical analysis, such that the number of papers asserting A and the number of papers asserting B are counted so that if A is asserted a lot, it is deemed to be better established (alternatively, they may weigh papers by the number of subjects in each study, thus biasing the results in favour of well-funded research studies).
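As an illustration of how quickly these filters compound, here is a toy sketch of such a selection pipeline. The stage names mirror the list above, but the pass rates are invented; they are not taken from Zawacki-Richter et al. or any other actual review:

    # Toy model of a systematic-review selection pipeline.
    # Stage names mirror the list above; pass rates are invented.
    stages = [
        ("source: indexed, English-language journals", 0.60),
        ("subject: matches the chosen search strings", 0.30),
        ("selection: acceptable methodology",          0.35),
    ]

    papers = 2656  # initial pool, as in the review Kerr cites
    for label, pass_rate in stages:
        papers = round(papers * pass_rate)
        print(f"{label}: {papers} papers remain")
    # Three ordinary-looking filters already leave only about 6% of the pool,
    # before any question of research quality has been asked.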

I sometimes find systematic reviews interesting from the perspective of answering 'what there is' questions, but I would be very hesitant to accept them as currently conducted as any sort of statement of findings in the field.

The Theoretical Basis

There is a concern (also expressed above) that ed-tech researchers do not learn from history.

the research, taken as a whole, showed a ‘weak connection to theoretical pedagogical perspectives’. This is not entirely surprising. As Bates (2019) has noted: ‘since AI tends to be developed by computer scientists, they tend to use models of learning based on how computers or computer networks work (since of course it will be a computer that has to operate the AI). As a result, such AI applications tend to adopt a very behaviourist model of learning: present / test / feedback.’ More generally, it is clear that technology adoption (and research) is being driven by technology enthusiasts, with insufficient expertise in education. The danger is that edtech developers ‘will simply ‘discover’ new ways to teach poorly and perpetuate erroneous ideas about teaching and learning’ (Lynch, 2017).
This excerpt actually makes two contrary points: first, that there is a "weak connection" to theoretical pedagogical perspectives, and second, "AI applications tend to adopt a very behaviourist model of learning," which from where I sit is a pretty strong connection to a theoretical pedagogical perspective (albeit one the author disagrees with).

What we actually have are two separate issues, the first of which I'll discuss in this section: should educational research be based in theory?

There is a sense that research cannot be conducted unless it is based in theory; I've often seen it commented on my own work that it lacks a theoretical basis. As though this makes the things I've seen with my own eyes disappear?

As discussed above, the role of theory is to apply a model that makes it possible to interpret observable phenomena in terms of one or another set of unobservable entities (usually thoughts, beliefs, skills, competencies, motivations, or some other folk-psychological concept). In this way, a theory is often depicted as a 'lens' through which our observations enable us to perceive an underlying reality.

But what if there are no underlying unobservable entities? Then the development of theory is an exercise in myth-making, and the selection of a 'theoretical lens' an ungrounded predetermination of what you will 'find' during your investigation. If you think ghosts are what cause things that go bump in the night, then if you go out searching for things that go bump in the night, you will surely find ghosts.

What is Known

Some people - notably people like Tony Bates and John Hattie, coming from very different perspectives - argue that there is a body of "what's known" in online and distance education, based on previous pedagogical theory and research.

I don't agree. And it's not so much that I think the previous work is wrong as that I think it is irrelevant. To me, it feels like someone coming to me and saying that "after a century of research we now have a solid framework of what's known in phrenology." And I look back and ask, how do you know you were even studying the right thing?

Honestly, I think most of education theory and most of "what's known" is mythology. I believe that our role as researchers is to abandon the quest for these hidden unobservables until they're absolutely needed and instead to focus on discovering and describing what's there.

From where I sit, there isn't even agreement on what learning is, let alone a common framework describing how to measure it. I think learning is a change of state of neural organization (and that, in fact, what we should be measuring are things like 'changes of state in neural organization' - how they are caused, and what effects they have). This is stuff that is being researched in other fields - and dismissed as 'neurobabble' in education (not without good reason - but the emphasis on 'theory' makes education especially impervious to actual research, and especially susceptible to neurobabble).

At least in educational technology, for all its weaknesses, there is the development of an empirical basis for a lot of the terminology around not only tools and applications, but also many of the functions and processes - educational or otherwise - performed by these. In research on chatbots, for example - aside from 'exciting prospects' - we are learning about the sort of training needed to teach a neural network to interact conversationally. Now there are probably huge differences between what it takes to teach a chatbot how to speak French and what it takes to teach a person, but minimally, we will have defined in a relatively precise way what it means to 'converse in French'.

So - yeah - more research is needed. But we need to get serious about what we mean by that, and begin treating it as a science, instead of as has so often been the case, an arena for politics and demagogues to play out their conflicts.









