March/April 2017
Volume 23, Number 3/4
Workshop Report: CAQDAS Projects and Digital Repositories' Best Practices

Sebastian Karcher, Syracuse University
skarcher [at]

Christiane Pagé, Syracuse University
cmpage [at]



An increasing number of qualitative researchers are relying on dedicated software for the analysis of qualitative data, often referred to as CAQDAS (computer-assisted qualitative data analysis) applications. These applications allow users to annotate, analyze, and visualize qualitative data. To understand and work towards solutions to the challenges of sharing CAQDAS data, the Qualitative Data Repository (QDR) convened a one-day workshop with CAQDAS developers, practitioners, and repository specialists on October 28, 2016. This report describes the workshop sessions, participants' areas of interest, and future challenges.

Keywords: Computer-Assisted Qualitative Data Analysis Software, CAQDAS, Data Sharing, Digital Repositories



Along with the digitization of much of the research process, more and more qualitative researchers are relying on dedicated software for the analysis of qualitative data, often referred to as CAQDAS (computer-assisted qualitative data analysis) applications. These applications allow users to link, tag or code, annotate or write memos about, analyze, and visualize qualitative data. The earliest such applications, such as ATLAS.ti, date back to the late 1980s, and today CAQDAS comprises a wide field of more than 30 applications: proprietary and open source, free and for sale, cloud-based and local ("Computer-Assisted Qualitative Data Analysis Software", 2016, provides an overview). The output of this software constitutes rich and valuable research data. Across scientific disciplines, stronger norms for transparency of research procedures and data are emerging. Many journals, grant funders, and even some scholarly societies are now encouraging or requiring the sharing of research data to the extent ethically and legally permissible. To understand and work towards solutions to the challenges of sharing CAQDAS data, the Qualitative Data Repository (QDR) convened a one-day workshop with CAQDAS developers, practitioners, and repository specialists on October 28, 2016.

Louise Corti of the UK Data Service pointed out in her presentation that, in spite of a growing trend of sharing qualitative research data, there is relatively little sharing of CAQDAS projects and little demand for the fully coded data that UK Data will provide for existing projects on request. This is despite a longstanding interest in sharing CAQDAS data. As early as 2000, Thomas Muhr (the creator of ATLAS.ti) promoted the use of extensible markup language (XML) to increase the reusability of CAQDAS data (Muhr, 2000). Some years later, UK Data developed and promoted the QuDEx format for the exchange of CAQDAS data (Corti and Gregory, 2011). Neither effort, however, produced a format that CAQDAS applications widely support. This means that CAQDAS data are not just rarely shared, they are also often not shareable: users of different software cannot open each other's data files, and repositories such as UK Data or QDR are unable to store them in formats suitable for digital preservation.

Sharing CAQDAS data thus presents a twofold problem. On the one hand, it requires technical solutions such as an interchange and preservation format. On the other hand, it requires sociological solutions. Researchers need to be willing to share their data, and they need to be aware of what sharing of CAQDAS data requires.

The workshop took place as two parallel developments promise to change this. In the social sciences, robust norms of data sharing and transparency are emerging and, increasingly, compliance with such norms is a requirement for funding and publication (Nosek et al., 2015; Lupia and Elman, 2014). Among CAQDAS software developers, greater interest in data interchange led to the emergence of a core group of developers working on an interoperable XML format, organized by the Netherlands Association for Qualitative Research (KWALON). The QDR workshop brought together a group of 20 CAQDAS developers (including many of those involved in the KWALON interoperability dialog), researchers specializing in CAQDAS, and repository specialists, representing both of these trends and their intersection.


Workshop Sessions

The workshop began with a session taking stock of the current situation. After introductory remarks by Louise Corti, who has helped build the largest collection of qualitative social science data in the world at the UK Data Service, participants considered to what extent CAQDAS data are currently shared and which components of such data are most valuable for potential reuse. One area with significant potential for reuse is teaching. Teaching social science methods "hands-on" by having students work with published datasets is very common in quantitative social science (see, e.g., King, 2006) and is becoming more common in qualitative social science (see Bishop, 2012). Indeed, several of those present frequently teach with CAQDAS data especially prepared for that purpose. Making such pedagogical data (see Elman, Kapiszewski, and Kirilova (2015) for the term) more widely available would constitute a relatively easy and obvious benefit of data sharing. At the other end of the spectrum, some epistemological traditions view the usefulness of shared data with significant skepticism. Nevertheless, participants agreed that there is a large stock of data that would be valuable if shared but is not currently shared. Many participants also expressed the need for examples of data reuse.

If a researcher wants to share data, what guidance should they follow to produce reusable data? How can CAQDAS software help in that process? A second session started to address these questions by establishing properties of "good" CAQDAS data. As with all data, thorough documentation is key for understanding and potential reuse. More specific to CAQDAS applications, participants highlighted the ability to annotate key decisions in the research process, either via memos or by annotating elements such as coding labels. Such memos could even be time-stamped to document the timing of crucial decisions in the research process (in this, they would resemble electronic lab notebooks, cf. Butler, 2005). Participants also considered the potential for automated logging, available in some tools, to facilitate these tasks. However, most of those present felt that such logs would be of little use, being at the same time too detailed (with every researcher interaction with the data cataloged) and lacking crucial information.

Both researchers and software developers expressed interest in guidance or templates for shareable CAQDAS data by data repositories. Such templates would help guide researchers to produce high-value, reusable data; they would serve as a signal for the possibility of sharing data; and they would provide repositories with CAQDAS data in a (relatively) uniformly structured and organized way.

The third session engaged with ongoing developments in creating a CAQDAS exchange format. To begin the session, Louise Corti summarized prior efforts to establish such a standard (QuDEx), and Jeanine Evers from KWALON described the origins and progress of the current group working on such a format. As with previous efforts, any new standard will be based on XML, the format best suited to reflect the nested and linked elements common to much CAQDAS data. Moreover, many applications already support (non-standardized) XML output. An important task for an exchange format is striking the right balance between comprehensiveness and ease of adoption. Several developers worried that an overly complex format would deter adoption and thus defeat the purpose of a widely available exchange format. Moreover, functionality does not map perfectly between different applications, so a common format should focus on a core set of functions and elements. A possible way forward is to define an exchange format as both a minimal and an extended standard. For a minimal standard, an emerging consensus suggests focusing on documents, including document metadata, as well as coding. Links between documents, concepts, and cases may be more complex but fit well within the logic of XML and could be part of an extended format. Many developers highlighted the need for applied use cases to drive development. Identifying and working with researchers interested in sharing their CAQDAS-generated data is an important way in which repositories can support the development and adoption of a CAQDAS exchange format.
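To make the idea of a minimal standard more concrete, the following Python sketch writes documents, document metadata, and codings to XML and reads the coded segment back, as a second application might. It is only an illustration under stated assumptions: every element and attribute name (project, documents, document, metadata, field, text, codings, coding) and the sample interview text are hypothetical, invented for this example, and do not reflect QuDEx, the format under discussion in the KWALON group, or the output of any existing CAQDAS application.

```python
import xml.etree.ElementTree as ET


def build_minimal_project(documents, codings):
    """Build a hypothetical minimal exchange tree: documents, metadata, codings."""
    # All element and attribute names are illustrative assumptions, not a real schema.
    root = ET.Element("project")
    docs_el = ET.SubElement(root, "documents")
    for doc in documents:
        doc_el = ET.SubElement(docs_el, "document", id=doc["id"])
        meta_el = ET.SubElement(doc_el, "metadata")
        for key, value in doc["metadata"].items():
            ET.SubElement(meta_el, "field", name=key).text = value
        ET.SubElement(doc_el, "text").text = doc["text"]
    codings_el = ET.SubElement(root, "codings")
    for c in codings:
        # A coding links a code label to a character range within one document.
        ET.SubElement(
            codings_el, "coding",
            document=c["document"], code=c["code"],
            start=str(c["start"]), end=str(c["end"]),
        )
    return root


def coded_segments(root):
    """Return (code, text) pairs by resolving each coding's character offsets."""
    texts = {d.get("id"): d.findtext("text") for d in root.iter("document")}
    return [
        (c.get("code"),
         texts[c.get("document")][int(c.get("start")):int(c.get("end"))])
        for c in root.iter("coding")
    ]


# Made-up sample data for illustration only.
documents = [{
    "id": "d1",
    "metadata": {"title": "Interview 1", "date": "2016-10-28"},
    "text": "We rarely share our coded data.",
}]
codings = [{"document": "d1", "code": "data-sharing", "start": 3, "end": 15}]

project = build_minimal_project(documents, codings)
xml_bytes = ET.tostring(project, encoding="utf-8")

# Round trip: parse the serialized file and recover the coded segment.
print(coded_segments(ET.fromstring(xml_bytes)))
```

In this sketch, links between documents, concepts, and cases are deliberately left out; under the two-tier approach described above, they would belong to an extended format layered on top of the same core elements.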

In a final session, participants considered how repositories could present CAQDAS data in an attractive, value-enhancing way. Some general proposals for value-added data publication emerged. These included links to publications citing the data, ability to search by type of data analysis, and marking data for its suitability for reuse. Data repositories are working actively on all of these topics. Current initiatives include Scholarly Link Exchange for linking data and publications, continued refinements in metadata standards such as the Data Documentation Initiative, and initiatives to determine data fitness within the Research Data Alliance (RDA).

Other proposals would take advantage of the rich metadata within CAQDAS projects to identify individual files or even assemble files from different projects within the repository. A final area considered was the various outputs of CAQDAS applications, such as tables and figures. In the long run, the ability to generate outputs at the repository level (similar to exploratory tools such as the TwoRavens integration with the Dataverse software for quantitative data) would be ideal, but will likely not be feasible in the near future given the divergent tools and methods used by scholars. As a short-term solution, however, scholars could include selected outputs as part of their data, especially where they are not included in the published text.


Future Work

The workshop leaves us with a full and promising agenda for sharing CAQDAS data. Key areas of future work include: collecting pedagogical CAQDAS data already used in teaching and making it widely available; actively searching out researchers interested in sharing and re-using CAQDAS data to better understand requirements and limitations of sharing such materials; development of a template for creating shareable CAQDAS data; and ongoing work on establishing a widely adopted exchange format for CAQDAS to facilitate both sharing and preservation. QDR will continue to engage those present at the workshop as well as the wider CAQDAS and repository community to advance the sharing of these rich and valuable datasets.



About the Authors

Sebastian Karcher is the Associate Director of the Qualitative Data Repository. He is an expert in scholarly workflows and an active contributor to academic open source software, in particular the Zotero reference manager and the Citation Style Language. He holds a PhD in Political Science from Northwestern University.


Christiane Pagé is the Associate Director of the Centre for Qualitative and Multi-Method Inquiry (CQMI). She heads outreach for the CQMI and the Qualitative Data Repository. She is co-author on numerous CAQDAS projects that look at leadership styles of international NGOs, of global private-public partnerships, and of U.S. Senior Executive Service members.