Learn AI: New AI ‘VIDEOGPT’ SHOCKS The ENTIRE INDUSTRY
After an insane week of AI releases, we have another one: VideoGPT, essentially a system that lets you chat with a video. Take a look at this. Let's look at one of the first demos released by the team behind it, and then after that, I'm going to show you exactly how you can use VideoGPT and what it's capable of. So let's take a look at the video demo released by OpenGVLab.
You can see right here that this is the user, and this is, of course, the bot. In this video they ask, "There's a person eating leaves, why?" Now, the video on the left-hand side is clearly of a panda eating leaves, not a person, so we can gauge the AI's understanding by its response: "The video does not contain any person eating leaves, it only contains a giant panda sitting on the ground eating bamboo." Clearly this AI can't be tricked when it comes to identifying what's actually in the frame. It can see that this is a panda, that it's sitting on the ground, and that it's eating bamboo, which shows it knows exactly what's going on in the scene. Then they ask, "How is the panda feeling?" and the answer is that the panda in the video appears to be content and relaxed as it sits on the ground eating bamboo, which is very, very advanced. Some people won't appreciate how impressive that is, because gauging emotions is a high-level skill; you need a certain amount of emotional intelligence to read what someone else is feeling, whether it's a human or an animal. Showing this capability so early in development is genuinely surprising.
Then, of course, we have some more questions: "Why does the panda raise its hand?" The model says the panda appears to be raising its hand to reach for more bamboo, which is what we can see on the left-hand side. Then: "Can it eat all of the bamboo?" and we get a response to that as well. Next we have another video, where they ask, "Is there anyone in the video? What gender?" and it says, "Yes, there is someone in the video. The person is a woman." "Where is she? What is she doing?" The woman is on a roof, and she is doing yoga. That might seem pretty basic, but considering how far we've come in such a short space of time, it's honestly shocking how quickly this stuff is moving.
So yes, it's a woman on a roof doing yoga, which is exactly what you can see right here. Then: "How do you know it's on the roof?" The model answers that it can tell it's on the roof because the sky is visible in the background of the video, along with a large green tree. So the AI can also work out that she's on the roof rather than on the ground, which is a key piece of information. It makes me wonder whether this technology could be translated into self-driving cars and many other applications; there's honestly a vast number of ways this could be applied. But let's keep going. "What is she wearing?" She's wearing a black shirt and pants and a black swimsuit, which checks out against the footage. They also ask when the video was taken, trying to pull more information out of it, and the model says it doesn't know yet, which is true, but that it was likely taken during the day. So it's clear this AI has some deductive reasoning skills too, which is going to be very useful. Imagine you're trying to classify a large number of videos someone sent you, or you're combing through hours of CCTV footage for something specific. You could literally tell an AI, "Hey, find out when there was a man in a red shirt in this footage," and it could identify it for you. It's crazy.
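To make that CCTV idea concrete, here is a minimal sketch of how such a search could work: sample frames from the footage, caption each one with a vision model, then search the captions for a phrase. Everything here is illustrative; `caption_frame` is a stub standing in for a real visual-captioning model, and the captions are made up for the example.

```python
def caption_frame(timestamp_s):
    """Stub: pretend these captions came from a visual-captioning model."""
    fake_captions = {
        0: "empty parking lot at night",
        5: "a man in a red shirt walks past the entrance",
        10: "a delivery van parked by the curb",
    }
    return fake_captions.get(timestamp_s, "nothing notable")

def find_in_footage(phrase, timestamps):
    """Return the timestamps whose caption mentions the phrase."""
    return [t for t in timestamps if phrase in caption_frame(t)]

hits = find_in_footage("man in a red shirt", [0, 5, 10])
print(hits)  # [5]
```

In a real system the captioning call would be the expensive step, and you would likely use embedding similarity rather than exact substring matching, but the overall shape — perception model turns frames into text, then you search the text — is the same.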
Then they ask, "What is the elephant doing here?" and the model answers that the elephant is using its trunk. If you don't know about this clip, it's a pretty insane example of an elephant painting, and the AI simply knows that the elephant is using its trunk to paint a picture of a baby elephant on an easel. Then they push back: "Are you sure? I think it's drawing a kettle." The model responds that the painting may not be perfect, but it is actually an elephant. This goes to show that this is not one of those AIs you can gaslight into thinking, "Hey, this is an apple. This is a kettle." Many large language models we've used before, such as Bing and ChatGPT in the early days, and even some models being released now, could actually be tricked into believing things that aren't true. What I find interesting is that this AI sticks to what's actually in the image. It's very, very interesting, guys. This is honestly crazy.
Okay, this is Ask-Anything now, and you can actually use it. Ask-Anything with ChatGPT is available, and it's free, so you can demo it yourself, and I'm going to show you exactly how. Essentially, just click the link in the description, paste your OpenAI API key here, which I'm going to go ahead and do, and then upload a video. So I've gone ahead and uploaded this video, which is, of course, the input video. Once it's in, we're going to ask the AI a bunch of questions about it. I downloaded this video from Storyblocks; it's simply a video of a young boy kicking a football on a plain grass field. Let's count how many times he kicks it, because I want to see whether the AI can identify the number of kicks as well. He kicks the ball just two times, and then the video stops. If we click "Watch it," you'll see right here that this is where the queue is, the video is loading, and then you'll be able to chat with it. We just have to give it a moment, and down here are some more examples you can try as well. So now that this is loading, depending on how long it takes, we should get this.
Now, I've already run this video through, so I know it's actually pretty smart, but it's still very interesting to watch this AI process the video. I'm not sure why it's taking so long; I think the tool is getting more popular, and that's why we're seeing these loading delays. It's now at 20% and nearing completion, so we just have to wait while it digests the video. And now it says, "Kicking soccer ball: you uploaded a video about a young boy playing soccer in a field of grass in the sun. Click the button to chat." Let's ask what action she is doing, and see if it can identify that this is actually a technique called shadow boxing. It says, "This woman is playing with the ball." So I'm guessing you only get accurate answers when the footage is bright and the action is relatively normal, because of how this works under the hood: as you can see right here, there are several different models working together with ChatGPT, including action recognition and visual captioning. When you have a lot of different pieces of software chained together, what tends to happen is that certain details get lost as they pass from one application or model to another. I'm guessing that's what happened here, but nonetheless, this is still a very impressive demo, considering it feels like only last month we got GPT-4, and now we're already getting software like this.
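That "models chained together" idea can be sketched roughly like this: the perception models (captioning, action recognition) turn the video into text, and that text becomes the only context the chat model sees. This is an illustrative sketch of the general pattern, not the actual Ask-Anything code, and all the names here are made up for the example.

```python
def build_video_prompt(caption, actions, question):
    """Fold the perception outputs into one text prompt for the chat model."""
    context = (
        f"Video caption: {caption}\n"
        f"Detected actions: {', '.join(actions)}\n"
    )
    return context + f"Question: {question}\nAnswer based only on the video."

prompt = build_video_prompt(
    caption="a young boy kicking a soccer ball on a grass field",
    actions=["kicking ball", "running"],
    question="How many times does he kick the ball?",
)
print(prompt)
```

Notice the limitation this exposes: anything the caption and action labels don't mention, like the exact number of kicks, never reaches the language model at all, which would explain the kind of detail loss described above.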