DOCX & TXT Transcripts
💡 If you come across a DOCX format we don’t support, reach out to support@coloop.ai and we’ll add it.
- To achieve high accuracy, CoLoop needs to be able to distinguish between Moderators and Participants
- These speaker labels help CoLoop decide whether to use a text segment as context or as quotable evidence
How to format a file to go into CoLoop
If your transcript format is not supported, you may need to edit it slightly to work with CoLoop. Any transcript uploaded to CoLoop must:
- Contain labelled speakers
- Contain one interview per file
Speaker labels must be unique before being uploaded.
Examples of formats we support
Formats we don’t support (yet!)
If your files are in a format we don’t support yet, you should convert them to one of the formats above. Below are some useful tips for doing this efficiently.
Using CTRL+F and Replace
You can open DOCX transcripts in Word and use the find and replace feature to edit them quickly to match our formatting guides (details on this here)
Working with Forsta Transcripts
Most Forsta transcripts will work directly in CoLoop. In some cases they may contain multiple timestamps in each line.
We are working on adding support for these. In the meantime, the tool below will convert transcripts locally to a CoLoop-compatible format.
Working with Discuss IO Transcripts
We are working on adding support for these. In the meantime, the tool below will convert transcripts locally to a CoLoop-compatible format.
Transcripts with bold-non-bold formatting
Some transcript providers highlight moderator text by marking it in bold. This is challenging to edit efficiently to match our format. We have created the tool below to convert these to a CoLoop-compatible format.
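The idea behind this kind of conversion can be sketched in a few lines of Python. This is only an illustration and not the tool itself. It assumes the transcript has already been read into a list of (text, is_bold) runs (extracting those from a DOCX file needs a separate library and is not shown), and it writes the labels using the CoLoop::R and CoLoop::P prefixes from the "Overriding the formatting rules" section. The speaker names "Moderator" and "Participant" and the single space after each prefix are assumptions made for the example.

```python
# Hypothetical speaker names; the space after each prefix is an assumption.
MODERATOR_LABEL = "CoLoop::R Moderator"
PARTICIPANT_LABEL = "CoLoop::P Participant"


def label_by_bold(runs):
    """Turn (text, is_bold) runs into label lines followed by text lines."""
    lines = []
    current_bold = None  # formatting of the segment being collected
    buffer = []

    def flush():
        # Emit the collected segment under the label its formatting implies.
        text = "".join(buffer).strip()
        if text:
            lines.append(MODERATOR_LABEL if current_bold else PARTICIPANT_LABEL)
            lines.append(text)
        buffer.clear()

    for text, is_bold in runs:
        if not text.strip():
            buffer.append(text)  # whitespace-only runs never switch speaker
            continue
        if current_bold is not None and is_bold != current_bold:
            flush()  # formatting changed, so the speaker changed
        current_bold = is_bold
        buffer.append(text)
    flush()
    return "\n".join(lines)


runs = [
    ("How was ", True),
    ("the trial?", True),
    (" ", False),
    ("It went well.", False),
    ("Anything else?", True),
]
print(label_by_bold(runs))
```

Consecutive runs with the same formatting are merged into one segment, so a moderator question split across several bold runs still produces a single label.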
Overriding the formatting rules
💡 You must make sure the labels exactly match the ones shown below.
If you’re still experiencing issues processing DOCX transcripts, as a last resort you can fully override the formatting guides and force CoLoop to recognise specific lines as participant or moderator labels.
To do this, follow the steps below:
- Prepend all participant names with CoLoop::P
- Prepend all moderator names with CoLoop::R
- Speaker labels must be on a new line preceding the text segment they apply to
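As a rough illustration of the layout these steps produce, the Python sketch below renders a list of turns with each label on its own line above its text. The function name, the speaker names, and the single space between the prefix and the name are assumptions made for the example; only the CoLoop::P and CoLoop::R prefixes and the label-on-its-own-line rule come from the steps above.

```python
def format_turns(turns, sep=" "):
    """Render (role, name, text) turns with override labels on their own lines.

    `sep` is the separator between prefix and name (assumed to be a space).
    """
    prefixes = {"participant": "CoLoop::P", "moderator": "CoLoop::R"}
    lines = []
    for role, name, text in turns:
        lines.append(f"{prefixes[role]}{sep}{name}")  # label line
        lines.append(text.strip())                    # text segment it applies to
    return "\n".join(lines)


turns = [
    ("moderator", "Sam", "How did you find the onboarding?"),
    ("participant", "Alex", "It was quick, but the emails were confusing."),
]
print(format_turns(turns))
```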
Example 1
Below is an example of the original transcript and the edited transcript. When doing this you must make sure:
- The text matches up exactly. Labels are case sensitive, so you must prepend speakers with CoLoop::P or CoLoop::R accordingly
- When doing your find and replace you look for a text string that does not appear in the middle of a text segment. Typically the easiest way to do this is to ensure you are finding examples of names which come after a newline
- Newline characters can be captured in MS Word by using the ^p code (more details here)
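For plain-text transcripts, the same anchored find and replace can be sketched in Python. This is a minimal illustration, assuming speaker labels look like "Anna:" at the start of a line. The function name, the example names, and the single space after the prefix are assumptions; the label itself is kept exactly as found and only the prefix is prepended.

```python
import re


def prepend_label(transcript, label, prefix, sep=" "):
    """Prefix `label` only where it starts a line; move its text to the next line."""
    # re.MULTILINE makes ^ match after every newline, much like searching
    # for a name that follows ^p in Word.
    pattern = re.compile(
        r"^" + re.escape(label) + r"[ \t]*\n?", flags=re.MULTILINE
    )
    return pattern.sub(lambda m: f"{prefix}{sep}{label}\n", transcript)


text = "Anna: I asked Anna: what next?\nSam: Anna: was right.\n"
print(prepend_label(text, "Anna:", "CoLoop::P"))
```

Only the first "Anna:" is changed, because the other two occurrences sit in the middle of a text segment. A plain replace without the line-start anchor would have rewritten all three.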
Original Transcript
Final Transcript