Unveiling the possibilities of AI video in L&D

Introduction

Hi, everyone. It's great to be here. And thanks again for inviting me.

Key Topics of Discussion

Today I would like to talk about three major topics: the concept of AI video and how it has changed over time, why AI video could be useful for learning and development, and, just as the others have done so far, a live demonstration where you can also ask questions.

And yeah, I would love to be as engaging as possible.

Personal Background

Some bits about me. I'm the CEO and founder of Colossian. Colossian is the AI video platform for workplace learning.

I have a background in mathematics and computer science. I'm originally from Hungary, but I studied in Denmark and Hong Kong. I was already working with generative adversarial networks, the early algorithms behind visual generative AI, back at university in 2018.

It's been so great. I think everyone who directly worked with the code back then and the technology really saw that this was going somewhere. But this hype didn't exist back then, and it was so cool to be in the field.

Nowadays, it's a bit different, right? But essentially, I'm coming from a technical background, and I always had a passion for education. Education changed my life, and my mother is a teacher, so I'm coming from a household where this was really important. And today, I'm working at the cross-section of both, and it's just a great experience.

About Colossian and AI Video Technology

The technology that we are developing at Colossian is essentially the ability to create avatars, like real human presenters, based on text. I will bring my own avatar, who will speak, and I will showcase how this works and talk about the learnings that we've had so far and the concept of building the company as well. And yeah, we want to ensure that you understand where this field is going, and that you can potentially get some inspiration from some bits.

So yeah, that's just some short information about myself.

Evolution of Video as a Medium

When I talk about video as a medium, I feel like people understand that video is more engaging, so they want to use it. But so far it's been a restricted medium, because you needed studios, actors and equipment. With the advent of this technology, you're able to create something that we call an AI video: a video that's created using generative AI, whether that's parts of the video or the whole end-to-end flow. And I think it's just a far superior workflow.

If you look at some of the research that's been done on how corporations are using generative AI today, a significant part of it is content creation, internal content creation. So content creation is picking up, and there is a big adoption around it. When you also look at the adoption curves, I think that's very interesting.

Historical Analysis of Video

If we look back at how video itself has evolved, what we've analyzed as a company is that, of course, after the advent of the Adobe software suite and everything, there was some adoption. But what this technology is also bringing us is ease of use, which is really important: the fact that anybody can create content.

So these are the trends that I see personally: AI is making content creation easier to use, and people understand that video is a better material than, for example, text for certain use cases.

Company Focus and Learning from Startups

And overall, on building the company ourselves: just to be clear about what we do exactly, we offer a platform for video creation, where you can create videos with avatars from text, which I will showcase in a bit. We started out by offering this platform, and it wasn't really focused on learning initially.

When you're building a company, you talk about, for example, ICPs, an ideal customer persona, and it's so important to align on that. We originally founded the business in 2020, and I think one of the reasons it took a bit longer than expected is that we couldn't focus on the right ICP early on.

So overall, we focused our attention more on the learning segment and the creation of learning videos, but it took a while to get there. And one major learning for me is that if you are trying to build a technology that's innovative, you really need to define the segment or group that is ready for early adoption. In our case, the reason why we ended up, other than my personal ambitions in the L&D sector, is because of the market adoption and the continuous

Before I founded Colossian, I had another startup, which was around the cybersecurity aspect of generative AI: detecting deepfakes. That one was really tough, because I'm coming from an engineering background. We built a product for about one and a half years, which we had to throw away in the end because no one was willing to pay for it. So we didn't ask the right monetization questions, the right questions around what the product would eventually be useful for.

So I would say that if I had to share two learnings from the journey to where we are now, these would be the ones. The first is validating before building. It's something that I continually share with audiences, because when you come primarily from a technical background, again, there is a bias towards building.

The second is defining an ICP, doing continuous market penetration testing, and having interviews with the right people. I think those are really key. And this is how we also managed to evolve the concept of AI video.

Demonstrating AI Video Progress

But let's talk about the journey itself. So originally, in 2020, we had a video which looked like this. So that's going to be my avatar. That's two years ago.

You will see that, for example, it's going to talk about some text that I had as an input, and you will see the difference between the visual results. The technology itself works in a way that whatever text you input will manipulate the face of the person. Now we are getting to the point where you can also manipulate further parts of the body, but back in 2022, it was primarily only the lips that were possible.

So let's see whether we can play this. "My name is Dominik, one of the founders of the company. As you may have guessed, this is a fully AI-created version of myself speaking to you in an artificial voice." Yes, so I hope that was visible.

So that's a result from two years ago. Eventually, it's important to highlight that creating such a technology requires internal resources, like mathematicians and machine learning engineers. And it's quite complex because it's not available open source. And there was a big push towards getting this to a better quality.

So I also brought a result from last year. I'm in the same sweater for some reason, but you will see.

"Hi there. It's great to have you here at Colossian. My name is Dominik, one of the founders of the company, and this is my AI actor. I will give you a short brief on how to get started on creating your first video here at Colossian."

So you can see some of the differences, but we are still not there. This was last year, and we really wanted to work out how we can make this even better, because I believe that if you are building something that innovates a use case, or several different use cases, you really need to pay attention to this adoption curve. And if you make AI results more realistic and more trustworthy, then you can penetrate the market better. And... "Hi there." And eventually...

Recent Advances in AI Video

New year, new result, and also new outfit. "I'm Dominik, the CEO and founder of Colossian, and this video with my AI avatar and voice will guide you through getting started with our platform. Let's dive in. Type your text into the script box. To get the best AI narration, you can switch voices."

So basically, this is the result for this year. It's already been four years of continuous work on the tech by the team. It's also the cloned voice, as you may have heard, with a bit more of an Americanized accent, which I wish I had.

I just wanted to give you this glimpse into how perfecting a technology, from a technological point of view and not just from a market development point of view, requires so much nuance and effort. For example, the big difference between these two or three examples was how we managed to move from the lip region to more regions of the face. In the demo, I'm going to showcase how we can now also move the hands, the hand movements.

So we are solving control of the human body with AI one element at a time, and aligning that to the text, in any language, by the way. It's not just in English; it's in 120 languages. So that's quite key.

Yes? Just before you move from the slide, is the reflection in the glasses intentional? Excuse me? The reflection, the glare?

Yes. Is it intentional when you make the ...? That's a great question. Basically, the avatars that we create are based on real humans, like myself.

They were in a studio once, in a studio setting, and all those details depend on the studio setting. We use that as a data recording process.

If I had to explain the technology itself in a really easy manner: we put someone in a studio and we make them read English tongue twisters, so we can map all their facial expressions. We use this as a data source, a data set, and then we use that to train a model, which is essentially four neural networks, to drive the avatar version of the face and the person. So if you have reflections on your glasses in the studio, they will also be present on the avatar.

Why AI Video Suits Learning and Development (L&D)

But let's get back to L&D, and why we found that L&D was the market for this technology. What we saw was that there were increased costs for the creation of video. After COVID, there have been huge digitization efforts in training departments, primarily at companies, which required turning, for example, text-based materials or real-life classroom trainings into digital formats.

All of these L&D and learning departments were seeking solutions. And of course, turning those materials into video is beneficial, because you can keep the same engagement levels. But now, with such solutions, you can also achieve similar results with fewer resources, so lower costs, for example. And this just made it beneficial for them.

So this is also a major learning: while we were developing our technology, there was a big shift in the market, a desire to get here and to use video. That helped a lot with adoption, and it was really beneficial in terms of timing.

Localization and Content Chaos

In addition to that, localization, as we found out, is so important. If you can use a technology like this to localize videos in more than 120 languages, that's really important for multinational companies. I have also read several research papers showing that employees value their employer much more, in terms of loyalty, if they can consume the content in their native language.

So that's really vital to them. And what I always like to highlight is that because of all the easy content creation tools now, there is also a content chaos happening in the L&D space. That means that there's just so much content, for example, at an enterprise company, which is outdated. And what I really like about this technology is that you can create the content that you need.

But if something changes in the process, for example, there is a new law coming out for a company like KPMG, and six months later you have to change the content piece that you created, then you can make that change and update the video. And that just reduces content chaos, because the outdated pieces won't be out there for that long.

Realism and Scenario-Based Learning

So it's really beneficial, as far as we've discovered. You can see some of the example snippets that I brought. This is what the videos look like as still images. And the avatars themselves have now reached the point where they are close to as realistic as real-life presenters.

You can also create more scenario-based learning settings with them, where they are looking at each other. I personally think that it's even a bit better, because it just creates a more authentic and realistic setting. So that is how we are seeing it used for training purposes.

And since I talked about localization, translation is also important, because of the capability to quickly and effectively translate videos. I will show it to you in the demo pretty soon. And yes, I think there is also a sample version.

"Good morning. How are you today? Here are the languages you can use for voice cloning. Guten Morgen. Wie geht es dir heute? Bonjour. Comment allez-vous aujourd'hui? Ohayo. Kyocho genki desu ka? Zao shang hao. Ni jinten zi?"

I actually speak a little bit of Mandarin, but I should still practice it a bit more.

But now that you have the technology for this, it's simply wonderful that you can give a learning or a training presentation in multiple languages. And yeah, it's great, because the technology itself, it's pretty much language agnostic, because it depends on the sound waves, not like the actual phonetics. So it works with almost any language available out there. So that's a benefit that we get from the fundamental perspective.

One big trend that we see as well, and what I'm learning, is this: you've now seen some bits of these videos, and you might have a feeling that this is still a bit uncanny, that it is only getting there. And I do see your point. I do think that even if you create an amazing-looking AI video with a presenter, it's not going to be as good as a real video, because that real video has all the real elements. But of course it's hard to compare, when an actual video takes two to three weeks and so much money to create, compared to this, which actually takes minutes.

Enhancing Engagement with Interactivity

So we were thinking about how we can innovate the field even more to make it more engaging. And what I realized was that interactivity is so important in learning and training. So we figured out a way to also include interactivity into the experience.

So imagine a video where this avatar, this presenter, could also talk to you. That just creates a much better learning scenario, which is not just passive but also active. There is now a trend in the space that people want to move from passive to active learning, where the training content actually engages with the learner, and that just has higher engagement levels and better end-to-end training output.

So that was a big learning on our end, and this way we can achieve something that outperforms a traditional video. An AI-made video with interactivity does outperform a regular traditional video in terms of engagement.

It's also really hard to create interactive videos in a traditional video setting. Think about a branching scenario where the presenter asks you a question and, depending on your answer, a different video has to play. You have to record all those different snippets, and it just scales the complexity of the production. So that's quite interesting in my view.

Interactive Video Scenarios

Yes? So if I understand correctly, basically what you can do now, depending on which answer you choose, I'm going to have a different video back. Yes, exactly.

For now, the creation process is manual, but my goal would be to make this automated as well, based on some prompts or pre-made knowledge base. That's more like the future, right? But even if you can create this manually in an easy way, that's still pretty valuable.

"Hi, everyone. I'm AI Dominik, and I'm here to test your knowledge on AI video."

"Let's start with a simple question. Do you know what can you do with AI video?"

"Great start. Are you ready for the next question?"

So depending on your answer, different snippets of the video can play. That allows you to learn just the relevant materials, based on the knowledge checks and everything. So that's quite innovative as well in terms of the creation process, based on what I see on the market.
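The branching idea described above can be sketched as a small lookup structure: each snippet optionally asks a question, and each answer points to the next snippet. This is only an illustrative sketch, not how the platform is implemented; the names `Snippet`, `play_path`, the snippet ids and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snippet:
    """One video segment (hypothetical structure, for illustration only)."""
    video: str                       # file that plays for this segment
    question: Optional[str] = None   # question asked at the end, if any
    branches: Dict[str, str] = field(default_factory=dict)  # answer -> next snippet id


def play_path(snippets: Dict[str, Snippet], start: str, answers: List[str]) -> List[str]:
    """Return the ordered video files a learner sees for the given answers."""
    path: List[str] = []
    remaining = iter(answers)
    current: Optional[str] = start
    while current is not None:
        snippet = snippets[current]
        path.append(snippet.video)
        if snippet.question is None:
            break  # a segment without a question ends the scenario
        answer = next(remaining, None)
        # An unknown or missing answer also ends the scenario.
        current = snippet.branches.get(answer)
    return path


# Hypothetical scenario modelled on the quiz shown in the talk.
snippets = {
    "intro": Snippet(
        video="intro.mp4",
        question="Do you know what you can do with AI video?",
        branches={"yes": "great_start", "no": "explainer"},
    ),
    "great_start": Snippet(video="great_start.mp4"),
    "explainer": Snippet(video="explainer.mp4"),
}

print(play_path(snippets, "intro", ["yes"]))
print(play_path(snippets, "intro", ["no"]))
```

With recorded footage, every entry in `branches` would mean another studio take; with generated video, each one is just another script, which is the point being made about production complexity.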

Live Demonstration: Creating an AI Video

So yeah, live demo. Let's create a video together. Eventually, we'd love to also hear your view about it.

So this is the creation process and the flow. I'm really a person with no design skills, or even creativity, so we wanted to ensure that we create a platform that I can also use to create a great-looking video easily, and that our customers can too. You have a narration box here and a PowerPoint-like flow in the whole thing. You can put any narration here, so for example, "Welcome to the MindStone meetup", or "Welcome to the meetup about generative AI", and we can already play this and listen to the sound. "Welcome to the meetup about generative AI."

And you may have heard that, for example, "AI" is not pronounced well. Well, you can fix the pronunciation here. We do have a setting for that. But I'm not going to mess with the phonetic spelling here.

"Generative Artificial Intelligence." And we can also choose different avatars, as you may have seen. We have multiple ones, including myself, but also a variety of different people that you can choose from. So you can choose almost anyone that you would like, even a custom one that you can create.

Let's stick with Nina, actually, because she supports the hand gestures which I mentioned before. So if you can move her and show the view. As I mentioned, with where the technology got to recently, you can also include hand gestures: for example, have her point to the right or to the left, or make an okay sign or a thumbs up. This is really where the tech is going: you can make the experience more interactive through the way the avatar behaves on the screen and in the video.

So you can also add a thumbs up here. And yeah, I hope you will ask any questions. We just need to add some additional text after that.

And since there were a few words about OpenAI and ChatGPT before, we also have that integrated. So, for example, "Create a two-sentence summary about how AI is changing the field of video." And we can add additional script elements there.

Insert it, yes. In addition to this, we can add a different background that you would want. For example, an office space here. So it's just a bit more... oh no, it's a kitchen.

So it's a bit more realistic. And, as I may have shown before, there is also the interaction. So a question like, for example, "How did you enjoy the meetup?" "It was great." "It was excellent."

So which is the answer that we want? Of course, both of them are good. So we can then ensure that whichever is selected is marked as correct.

So these are some of the functionalities that we offer for video creation. But I would like to show you one of the coolest things here, which is what happens if you have some text on the screen. I have the Colossian one loaded now. Maybe we can use this one.

Sorry for a black and white thing. It's just a brand element. We can just remove these and add back, for example, myself here. And for example, when you are facing localization challenges, it's important to have the possibility of translating these.

And you can just translate that also with a click of a button here to Italian, for example. So you now have the Italian version of the video as well. And that was just done with a click of a button. So you have also the translation in there.

Yeah, so these are basically the main functionalities. I'm happy to generate the video in the meantime and send it async. But this is basically the creation flow, so it's quite easy. Yes?

So for the tech itself, it still takes five to ten minutes. But what you will see in this space is that this piece of technology is soon going to be real time. I'm estimating a few quarters, one or two quarters, and we could make that real time. Yes?

Yes, you can. You can. Yes.

Yes, it would, but you know, there are some countermeasures. If you see some of the research coming up around video restoration and enhancement, I think in the next few years you will be able to produce a similar level of quality even from a phone. Yeah, there was another question there. Okay, yeah, I see, yes. The Microsoft one? Not really.

Photo avatars are a thing, and creating avatars from photos is becoming more widespread. It's still more of a restricted research area, and we personally are trying to productize it.

Closing Remarks and Q&A

So yeah. Are you going to be around for questions after? Yes. Cool.

I just have a final video snippet, if you don't mind, about the MindStone one quickly. "Thank you for coming to my talk at MindStone. I hope that you've learned something new and look forward to answering your questions." Thank you so much.