by Christoph Raetzsch
During the summer term of 2016, Christoph Raetzsch taught a bachelor seminar on “qualitative methods for digital data and online archives”. The seminar was based on a grounded-theory approach to understanding data and its relevance in online communication. Students were assigned to groups, each working with a specific data format, e.g. text data, image data, social media data or web archives. Each group had to choose a subject to investigate and collect relevant data using free tools or routines that were introduced in the seminar. Two groups worked with data from Twitter and Facebook, which were collected and coded in DiscoverText (kindly supported by Stuart Shulman). While all groups were eventually able to collect and code data relevant to their subjects, automating tasks effectively proved more difficult than expected. Instead of routines for automation being developed, copy-pasting data from the web remained a dominant way of compiling data into tables. Because this procedure and its results are readily understood by students, it is comparatively more difficult to teach automation procedures that cover a broad range of input and output formats and probably also of software applications. Some groups, however, could rely on programming expertise among their members and developed their own parsers and filters to scrape, extract and structure data.
As a central lesson of this class, we see a great need for developing procedures of data retrieval and analysis that do not depend on individual software solutions and that can be readily understood by non-expert users. Creating a best-practice manual or collection of procedures for obtaining and structuring data will be a core task for 2017. This manual should be developed in a seminar context together with students.
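To make concrete what a small, software-independent routine of this kind might look like, the following is a minimal sketch and not one of the parsers written in the seminar. It assumes the data of interest sits in an HTML table and turns it into CSV, which is the same result that copy-pasting into a spreadsheet would give. It uses only the Python standard library; the names `TableExtractor`, `html_table_to_csv` and `SAMPLE_HTML`, as well as the sample rows, are hypothetical and chosen for illustration only.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, row by row (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample input, standing in for a page saved from the web.
SAMPLE_HTML = """
<table>
  <tr><th>user</th><th>text</th></tr>
  <tr><td>alice</td><td>Hello,   world</td></tr>
  <tr><td>bob</td><td>Second <b>post</b></td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML), end="")
```

The sketch deliberately ignores nested tables and cells spanning several rows or columns. A manual entry built around such a routine would need to state these limits so that non-expert users know when manual checking is still required.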