Last month I wrote about how CAPDM, working alongside Artificial Intelligence Ltd., had integrated ChatGPT into Moodle.
We did this to offer a learning enhancement for students who need further information to help them understand key concepts. We saw it as a useful alternative to the more widespread use of generative AI for tutor-side content generation, as it puts students in charge of finding out what they don’t know or understand.
While there are many ways that this could be implemented, we chose to offer hooks
in two distinct ways:
- Within reflective activities, using the directive within the activity as the context (‘relevant background information’) for the student question (or ‘prompt’).
- On any paragraph in the textual content, using the paragraph itself as the context for the student question.
The results are quite impressive but, as with Large Language Models in general, ChatGPT is capable of making up answers that have nothing to do with reality, so some caution is needed.
However, this article about a new offering from Pearson gave further hope for our integration.
Now, we don’t know exactly what Pearson are offering, but we can assume it goes some way towards ensuring that responses from ChatGPT are more relevant by grounding them in Pearson’s vast content domain. To get relevant answers to the question asked, it is useful – if not necessary – to include relevant extracts from the content domain in the overall prompt (question plus context) sent to ChatGPT.
How best to do this? An efficient way of finding the most relevant parts of the content domain to include is through ‘embeddings’ – an encoding of a chunk of text into a ‘vector’ that describes a point in a multi-dimensional space. Textual chunks with similar meanings are close to each other in this space, while chunks that differ in meaning are farther apart.
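As an illustration of that idea, the ‘closeness’ of two embedding vectors is typically measured with cosine similarity. The sketch below uses toy three-dimensional vectors purely to show the geometry; real embedding models return vectors with hundreds or thousands of dimensions, and the variable names here are illustrative only:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 for vectors pointing the same
    way (similar meaning), near 0 for unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; a real model would produce these
# from the text chunks themselves.
supply_and_demand = [0.9, 0.1, 0.2]
market_price = [0.8, 0.2, 0.3]
photosynthesis = [0.1, 0.9, 0.1]

print(cosine_similarity(supply_and_demand, market_price))    # high: related meanings
print(cosine_similarity(supply_and_demand, photosynthesis))  # low: unrelated meanings
```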
Content Domains and Collections
Inspired by Pearson’s offering, CAPDM and Artificial Intelligence Ltd. set up a vector database (Milvus in our case) and created a series of embedded vector ‘collections’. CAPDM masters and manages all its client content in DocBook XML, held in an XML database (eXist-DB), and manages a number of content domains. For example, CAPDM manages client content for various Masters programmes (generally including a complete set of texts from major publishers, licenced by the client), a domain of OpenStax OER texts, and a domain containing a full set of SQA HNDs in Business, Leadership, Health & Fitness, amongst others.
The decision was taken to house all content from a single ‘programme’ (e.g. an MBA) in a corresponding collection. There may, however, be advantages to restricting a collection to the content of a single course (e.g. economics) within a programme, as this corresponds naturally to the Moodle course into which we are integrating ChatGPT.
A Milvus collection is built by dividing the complete text into small chunks (e.g. a couple of sentences each) and creating an embedding vector for each chunk in the database. The eXist-DB was mined to extract the content domains, producing thousands of embeddings across the Milvus collections.
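As a rough sketch of the chunking step, the following splits plain text (already extracted from the DocBook XML) into chunks of a couple of sentences each. The sentence splitter here is deliberately naive and the example text is illustrative; a production pipeline would use a proper tokenizer:

```python
import re

def chunk_sentences(text, sentences_per_chunk=2):
    """Split plain text into chunks of a few sentences each,
    ready to be embedded and stored in a vector collection."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

text = ("Demand curves slope downward. Supply curves slope upward. "
        "Equilibrium occurs where they cross. Prices adjust toward it.")
print(chunk_sentences(text))  # two chunks of two sentences each
```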
When a question is asked of ChatGPT, it is similarly converted to an embedding vector, which is then used to find the closest-matching chunks in the Milvus database. These chunks are added to the question to form the ‘prompt’ sent to ChatGPT.
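A toy version of that retrieval step might look like this, with a simple in-memory search standing in for the Milvus vector search. The vectors, chunks and prompt wording are all illustrative, not our actual implementation:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(question_vec, store, k=2):
    """Stand-in for a Milvus vector search: rank stored chunks by
    cosine similarity to the question vector and keep the top k."""
    ranked = sorted(store, key=lambda item: cosine(question_vec, item["vector"]),
                    reverse=True)
    return [item["chunk"] for item in ranked[:k]]

def build_prompt(question, chunks):
    """Prepend the retrieved chunks to the student's question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Use the following extracts to answer.\n{context}\n\nQuestion: {question}"

# Toy 2-dimensional vectors standing in for real embeddings.
store = [
    {"chunk": "Demand curves slope downward.", "vector": [0.9, 0.1]},
    {"chunk": "Mitochondria produce ATP.",     "vector": [0.1, 0.9]},
    {"chunk": "Price falls as supply rises.",  "vector": [0.8, 0.2]},
]
prompt = build_prompt("Why do prices fall?", top_k([0.85, 0.15], store, k=2))
print(prompt)
```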
ChatGPT (like other LLMs) has a limited ‘context size’ covering both the prompt (question and relevant background information) and the answer – around 4,096 tokens, or roughly 3,000 words.
Integrating calls to ChatGPT in Moodle was surprisingly simple, but made even more so by CAPDM’s approach to semantic content:
- CAPDM ‘engineer’ all course pages; there is no hand building. The semantically rich XML is interpreted to match the needs of the learning design/online pedagogy, and all online materials are generated in a 100% automated process. This allows for scalability, absolute repeatability and guaranteed quality. Adding code to include hooks to ChatGPT is a minor task.
- CAPDM uses its own tried and tested display module, very similar to the OU ‘oucontent’ Moodle module, giving full control over added, custom functionality – including adding the hooks to ChatGPT.
So, within a course page in Moodle a student can make an enquiry directly of ChatGPT in a number of ways, as suggested above, and from any point dictated by the learning design. These currently include:
- From specific activities (such as reflective activities) using the context of the task as the relevant background information along with their specific question.
- From any paragraph in the text, using the content of the paragraph as the context.
Students can also choose to call ChatGPT via the relevant vector database collection (via a simple Python service), adding the retrieved chunks to the overall prompt sent to ChatGPT.
Our implementation decision was to use a collection that included all the texts from the programme in question, but we could equally have been more selective and used a collection containing only the content for the course in question or even a collection for a single, specific text.
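To illustrate how the pieces might be glued together, here is a sketch of the payload such a Python service could pass on to ChatGPT, using an OpenAI-style chat message structure. The model name, system wording and function name are assumptions for illustration, not our exact implementation:

```python
def make_chat_payload(question, context_chunks, model="gpt-3.5-turbo"):
    """Shape of a request the Python service might send to ChatGPT:
    retrieved chunks as system context, the student's question as
    the user message. Model name is illustrative."""
    context = "\n\n".join(context_chunks)
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only these course extracts:\n" + context},
            {"role": "user", "content": question},
        ],
    }

payload = make_chat_payload(
    "What is price elasticity?",
    ["Elasticity measures the responsiveness of demand to price."])
print(payload["messages"][0]["content"])
```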
While this is currently only a proof-of-concept project, the implementation is complete and robust, albeit using example content domains to prove the principle. There have been no trials yet with actual programmes and students to demonstrate the value of integrating generative AI tools such as LLMs; that could only come through actual client use.
Such use would help to guide decisions about the make-up of collections, for example, determining whether a broad collection for a programme is more useful than selective course or text-oriented collections. We can also be much more specific about what to include in a collection. With all content held in an XML database we could use XQueries, for example, to extract specific elements to include and/or combine – one of the many advantages of a highly structured, semantically rich XML master.
No doubt there will be very many pilots suggesting how to use generative and other AI tools, but hopefully this article provides useful food for further thought. While we can experiment with OER and our own content, publisher content will no doubt be restricted for this sort of use. The major publishers are already developing their own offerings, but these will likely be ‘one size fits all’ services, whereas CAPDM are highlighting the benefits of tailored ‘collections’ that are 100% relevant to the courses being studied. Of course, for all such uses you need to prove their value to learning, but these are interesting times.
Continuing in the same vein, CAPDM is currently exploring real-time random quiz generation as an option alongside asking questions, to offer instant testing of understanding. Watch this space.