computing – Marc Jones

Corpus Linguistics: Searching for affixes with R

One of my interests is corpus linguistics, especially creating corpora. Now I want to get better at analyzing those corpora in more depth.

As a project to help me learn R, I made a corpus analysis tool that takes the first five and last five characters of each word in a corpus, counts how often each one occurs, and writes the results to CSV files.
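The general idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the tool itself: the sample sentence, the `count_table` helper, the tokenizing rule and the output file names are all hypothetical, and a real corpus would be read from a file instead.

```r
# Hypothetical sample text standing in for a real corpus file.
corpus_text <- "The unhappy walker walked unhappily, rewalking the walkway."

# Lower-case the text and split on anything that is not a letter or apostrophe
# (an assumed, deliberately simple tokenizing rule).
words <- unlist(strsplit(tolower(corpus_text), "[^a-z']+"))
words <- words[nchar(words) > 0]

# First and last n characters of each word; words shorter than n are kept whole.
n <- 5
starts <- substr(words, 1, n)
ends <- substr(words, pmax(nchar(words) - n + 1, 1), nchar(words))

# Hypothetical helper: count occurrences and return them as a data frame,
# most frequent first.
count_table <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  data.frame(affix = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Write one CSV per position (file names are assumptions).
write.csv(count_table(starts), "word_starts.csv", row.names = FALSE)
write.csv(count_table(ends), "word_endings.csv", row.names = FALSE)
```

Fixed-length slices like these are only candidate affixes. Shorter slices (for example the first two or three characters) would need their own pass, and the counts still have to be checked by eye against real prefixes and suffixes.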

You’ll need to download R if you don’t have it.

The code I wrote is here.

Current professional development goals

With the start of my MRes at the University of Portsmouth, one of my main goals is to improve my data handling and data analysis skills. My Python skills are very rusty and rather limited; I used the language to build and clean a corpus for English for Specific Purposes with the open source tools from Masaryk University NLP Centre & Lexical Computing (n.d.).
