Reflection on my MRes Studies

There have been a lot of challenges since beginning my MRes course at the University of Portsmouth, even bearing in mind the advice given to me that I should make as many contingency plans as possible. However, what has been most difficult has been planning to overcome myself in the research process. In this blog post I shall outline the nature of the challenges I have faced and overcome. It is not the case that this is some kind of quest, merely that, given the circumstances, I vastly overestimated my own ability to carry out the kind of study that I wished to undertake. What has finally coalesced is, I believe, worthwhile research, but not quite the project that I had planned. Below, I outline my learning during the MRes course so far with reference to the Vitae Researcher Development Framework (RDF) (Careers Research and Advisory Centre Ltd., 2011) in bold parentheses.

The pond at Shinjuku Imperial Gardens, Tokyo in Spring 2019. Cherry blossoms are reflected in the pond.

My original proposal was for a quantitative study that relied upon an overly optimistic sample size of volunteer participants. This sample was drawn from a population at my new place of work. Because I was a new instructor in an intensive English programme, I had few free periods at times when my students were also free. Furthermore, I had not…

Notes on Construct Validity and Measurement in Applied Linguistics

This is intended primarily as a note for myself, and is very much a work in progress, but I thought that others might benefit. Also, if anyone commented, I would benefit. With the disclaimer out of the way, I will get to the point.

Basically, we have problems

In a pre-print, Flake & Fried (2019) make the point that measurement in psychology is very difficult to do in a valid way and, even worse, that its validity is difficult to check because the researchers involved underreport their decision-making processes. The reason this matters is that psychology and its sub-disciplines heavily influence applied linguistics/SLA.

While psychology attempts to get through its replication crisis, the main ways for it to do so seem to be pre-registered studies and greater transparency in reporting them. Flake and Fried (2019) choose to look at “Questionable Measurement Practices (QMPs)” as opposed to “Questionable Research Practices” (QRPs) (Banks et al., 2016; John, Loewenstein, & Prelec, 2012, in Flake & Fried, ibid.) such as HARKing (hypothesising after results are known) (Kerr, 1998, in Flake & Fried, ibid.) and p-hacking (making analysis or reporting choices, such as selectively reporting outcomes, until the p-value falls below the significance threshold; the p-value being the probability of obtaining results at least as extreme as those observed if the null hypothesis were true) (Head et al., 2015).

They go on to differentiate as follows:

“In the presence of QMPs, all four types of validity become difficult to evaluate… Statistical conclusion validity, which QRPs have largely focused on, captures whether conclusions from a statistical analysis are correct. It is difficult to evaluate when undisclosed measurement flexibility generates multiple comparisons in a statistical test, which could be exploited to obtain a desired result (i.e., QRPs).”

(Flake & Fried, 2019, pp. 6-7)
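To make the multiple-comparisons point in the quotation concrete for myself, here is a minimal simulation sketch. It is my own illustration and not taken from Flake and Fried: the function names, the group size of 30, the 2,000 simulated studies and the simplification of a z-test with a known standard deviation of 1 are all assumptions. Two groups are drawn from the same distribution, so there is no true effect, and a "study" counts as significant if any one of its outcome measures reaches p < .05.

```python
import math
import random


def two_sample_p(a, b):
    """Two-sided p-value from a z-test on the difference in means.

    Simplifying assumption: both groups have a known standard deviation of 1.
    """
    diff = sum(a) / len(a) - sum(b) / len(b)
    standard_error = math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_measures, n_per_group=30, n_studies=2000,
                        alpha=0.05, seed=1):
    """Share of simulated null studies where at least one measure is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_measures):
            # Both groups come from the same distribution: no real effect exists.
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p(group_a, group_b))
        # Undisclosed flexibility: report whichever measure "worked".
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


def expected_familywise_rate(n_measures, alpha=0.05):
    """Analytic rate for independent measures: 1 - (1 - alpha) ** k."""
    return 1 - (1 - alpha) ** n_measures


if __name__ == "__main__":
    for k in (1, 5):
        print(k, false_positive_rate(k), round(expected_familywise_rate(k), 3))
```

With five independent measures the analytic rate is 1 - 0.95^5, or roughly 0.226, so more than one in five studies of a non-existent effect would have something "significant" to report. Real measures of the same construct are usually correlated, which would make the inflation smaller than in this sketch, but still present.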

Flake and Fried (2019) state that many QMPs are not carried out deliberately, but that a major problem is the lack of transparency about decisions made in the measurement process, which hinders not only replication but also the checking of validity.

They advocate answering the questions in a checklist (Flake & Fried, 2019, p. 9) to reduce the possibility of QMPs arising.

I am quite certain that a lot of applied linguistics students at master's level and above have seen articles where statistics are reported but it is not clear why those particular statistics were chosen. Often these are the result of blindly following a process of running ANOVA or ANCOVA in SPSS. I will go out on a limb and say that these problems are ignored as being simply how things are usually done.

However, how many of us have considered our controlled variables? For example, when running studies on phonological perception, are we explicit about the ranges of volume, fundamental frequency and formant frequency? About processing for noise reduction? I know I’ve seen studies that do not control these and yet make claims for generalisability, and not only exploratory or preliminary studies. If you are going to make these claims, I think there should be greater controls than in a study that is primarily for oneself and that you are sharing because it could be informative for others. Of course, declaring the decision-making process and rationale ought to be necessary in both.
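One way I could imagine declaring such decisions is to keep them in a small structured record that is shared alongside the study. The sketch below is only an illustration: the class name, the field names and every value in it are hypothetical placeholders, not recommended settings and not taken from any study.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StimulusControls:
    """Hypothetical record of stimulus decisions for a perception study."""
    intensity_db_range: tuple   # lowest and highest presentation level, in dB
    f0_hz_range: tuple          # fundamental frequency range, in Hz
    formant_hz_ranges: dict     # formant name -> (low, high), in Hz
    noise_reduction: str        # what processing was applied, if any
    rationale: str              # why these choices were made


# All values below are placeholders for illustration only.
controls = StimulusControls(
    intensity_db_range=(60.0, 70.0),
    f0_hz_range=(100.0, 250.0),
    formant_hz_ranges={"F1": (250.0, 900.0), "F2": (800.0, 2500.0)},
    noise_reduction="none applied",
    rationale="placeholder: explain each choice and its source here",
)

# Serialise the record so it can be published with the data and the write-up.
report = json.dumps(asdict(controls), indent=2)
print(report)
```

The format matters much less than the habit: anything left out of such a record is a decision that a reader cannot check.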

There’s an awful lot of talk about how language acquisition studies in classrooms are problematic because individual differences act as confounds. One way to increase validity and generalisability is to be explicit about the choices made regarding measurement and variables.


I took part in a Google Hangout hosted by Julia Strand. Some of the ideas discussed over the hour are bound to have wormed their way in and mingled with my own.


Flake, J. K., & Fried, E. I. (2019, January 17). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Retrieved Jan 20th 2019 from .

Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3), e1002106. doi:10.1371/journal.pbio.1002106. Retrieved Feb 1st 2019.

Current professional development goals

With the start of my MRes at the University of Portsmouth, one of my main goals is to improve my data handling and data analysis skills. I have very rusty and rather limited skills in Python, which I used to build and clean a corpus for English for Specific Purposes with the open source tools from Masaryk University NLP Centre & Lexical Computing (n.d.).
