So, Microsoft has recently released a new research paper documenting their new AI system, called Jarvis. Essentially, Jarvis is a system that connects the AI tools we all know and love so that a single request can involve more than one goal. You can see right here that this is the community paper that they wrote, and there are a lot more details that I’m not going to get into, but you can see here that it says, “We introduce a collaborative system that consists of large language models as a controller and numerous expert models as the collaborative executors.” Now, I know that might be confusing, but the basic version is that they’re using ChatGPT to control many different AI models that are hosted on the Hugging Face website. And if you don’t know what Hugging Face is, essentially, it’s a large collection of AI models, including large language models, that many people use, and much of it is open-source.

So, how does this actually work in theory, and why is it so game-changing? You can see right here that it is broken down into four stages. Number one, we have task planning. This is where ChatGPT analyzes the question that the user inputs to understand exactly what they’re asking. Then you have model selection. Like I said, Hugging Face hosts many different AI models that can do many different things, so this is where ChatGPT has to select the right model for each part of the user’s request. Then we have task execution. This is where each selected model runs its task and returns the results to ChatGPT. Finally, we have response generation. This is where ChatGPT integrates the predictions of all the models and gives the user the final response. Definitely something very interesting.
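To make those four stages a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the `EXPERT_MODELS` registry, and the keyword rule that stands in for ChatGPT are all hypothetical, and nothing here calls a real model or the real Jarvis code.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for expert models hosted on Hugging Face.
# They are plain Python stubs so the sketch runs without network access.
EXPERT_MODELS: Dict[str, Callable[[str], str]] = {
    "object-detection": lambda image: f"objects detected in {image}",
    "image-to-text": lambda image: f"caption for {image}",
}


def plan_tasks(request: str, image: str) -> List[dict]:
    """Stage 1, task planning: turn the request into a list of tasks.

    In the system described above ChatGPT does this; a simple keyword
    rule stands in for it here.
    """
    text = request.lower()
    tasks = []
    if "how many" in text:
        tasks.append({"task": "object-detection", "image": image})
    if "describe" in text:
        tasks.append({"task": "image-to-text", "image": image})
    return tasks


def select_model(task: dict) -> Callable[[str], str]:
    """Stage 2, model selection: pick the expert model for one task."""
    return EXPERT_MODELS[task["task"]]


def execute_task(task: dict, model: Callable[[str], str]) -> str:
    """Stage 3, task execution: run the chosen model and return its result."""
    return model(task["image"])


def generate_response(results: List[str]) -> str:
    """Stage 4, response generation: merge all results into one answer."""
    if not results:
        return "No suitable model was found for this request."
    return "Here is what the models returned: " + "; ".join(results) + "."


def run_pipeline(request: str, image: str) -> str:
    """Run the four stages in order for a single request."""
    tasks = plan_tasks(request, image)
    results = [execute_task(task, select_model(task)) for task in tasks]
    return generate_response(results)


print(run_pipeline("How many cars are there? Please describe the picture.", "street.jpg"))
```

The point of the sketch is the shape of the loop: one controller plans, picks a model per task, collects the results, and writes a single reply.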

Now, you might be wondering what else there is about this that makes it so crazy. Well, we’re about to get into some key examples that you need to see if you’re wondering what this is like. It’s actually quite like GPT-4, but I would argue that it’s better in the sense that it can handle many different kinds of tasks. We all know that GPT-4 is capable of some crazy stuff, including image analysis, which hasn’t actually been released yet, but when it is, I’m pretty sure it’s going to blow everyone’s socks off. Another thing that is really interesting about Jarvis is that it can access different models for audio and images, and it can even access the internet, which should give more up-to-date responses.

Let’s take a look at some of the examples. So right here, you can see that we have the first question. It says, “Please generate an image where a girl is reading a book, and her pose is the same as the boy in the image.” Now, of course, the image is a JPEG, and then, interestingly enough, it also says, “Then please describe a new image with your voice.” The reason this example is so interesting and so important is that the user has requested many different things at once, which means that many different models have to be used in order to get the end result.

Now you can also see in stage one, task planning, that it identifies six different tasks, which just goes to show how ChatGPT breaks a request down. Then in stage two, you can see that it chooses a model for each one. It chooses one for pose control, one for object detection, and one for image classification. Then we have task execution, where it runs every single task that it needs to do. And then it has response generation, where it combines everything and gives the user a final response. Here we can actually see exactly what that response is. You can see that image one is the image that was given by the user to ChatGPT, or Jarvis as you would now call it, and you can see right here that it has then managed to translate that into image four. And then there is the audio that we get. You can also see it says, “The image you gave me is of a boy.” And you can see every single model that they decided to use. Not sure why they decided to use yellow text; it is definitely hard to see, but essentially, they’re showing you exactly which models they used and why they used them.
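One thing worth noticing about a plan like that is that some tasks have to wait for others: you can’t describe the new image out loud before the image exists. Here is a rough sketch of how such a plan could be ordered. The six task names and the `dep` field are my own illustrative assumptions based on the request, not the exact output of the real system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan for the "girl reading a book" request.
# Task names and the "dep" (depends-on) field are illustrative assumptions.
PLAN = [
    {"id": 0, "task": "pose-detection", "dep": []},
    {"id": 1, "task": "pose-to-image", "dep": [0]},
    {"id": 2, "task": "image-classification", "dep": [1]},
    {"id": 3, "task": "object-detection", "dep": [1]},
    {"id": 4, "task": "image-to-text", "dep": [1]},
    {"id": 5, "task": "text-to-speech", "dep": [4]},
]


def execution_order(plan):
    """Return task names in an order where every dependency runs first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {task["id"]: set(task["dep"]) for task in plan}
    names = {task["id"]: task["task"] for task in plan}
    return [names[task_id] for task_id in TopologicalSorter(graph).static_order()]


order = execution_order(PLAN)
print(order)
```

Tasks that don’t depend on each other, like the three that only need the generated image, could just as well run in parallel; the sketch only guarantees that nothing runs before the things it depends on.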

Now, later on in the video, I will be showing you how you can access and use Jarvis right away, but before that, I want to show you one more very interesting example. The reason this example is so interesting is that it involves a single question but many different inputs. You can see here that there are around three to four images attached to one question, which makes this a pretty difficult question for Jarvis. You can see right here that the user asks, “How many zebras are in these pictures?” And, of course, we know that there are four. There are three in the bottom picture and one in the last picture. Now, you can see from Jarvis’s response that it actually managed to get this question right. It says, “Therefore, there are four zebras in these pictures. Is there anything I can help you with?” What’s really cool is that it sends these images back with each zebra outlined.
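If you’re wondering how a question like that turns into a number, the counting part is just adding up the detections per image. Here is a small sketch with made-up detection results. The file names, labels, scores, and the `min_score` cutoff are all invented for illustration and are not taken from the actual demo.

```python
from typing import Dict, List

# Made-up detection results, one list of {"label", "score"} records per image.
DETECTIONS: Dict[str, List[dict]] = {
    "picture_a.jpg": [
        {"label": "zebra", "score": 0.99},
        {"label": "zebra", "score": 0.98},
        {"label": "zebra", "score": 0.95},
    ],
    "picture_b.jpg": [
        {"label": "zebra", "score": 0.97},
        {"label": "giraffe", "score": 0.88},
        {"label": "zebra", "score": 0.20},  # low confidence, ignored below
    ],
}


def count_label(detections: Dict[str, List[dict]], label: str,
                min_score: float = 0.5) -> Dict[str, int]:
    """Count confident detections of one label in each image."""
    return {
        name: sum(1 for d in found if d["label"] == label and d["score"] >= min_score)
        for name, found in detections.items()
    }


per_image = count_label(DETECTIONS, "zebra")
total = sum(per_image.values())
print(per_image, "total:", total)
```

The final sentence of the reply then only has to report the total, which is the job of the response generation stage.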

So now we’re getting to the moment that you might have been waiting for: how you can actually use Jarvis for yourself. Let’s head over to the desktop so I can show you exactly how that’s done.

When you click the link in the description, you’ll be able to see Hugging GPT, and you can see right here that there are two boxes and there are two keys that you need to submit before you can actually use this. The first one that is right here is going to be your OpenAI key, and the second one that is right here is your Hugging Face token. I’m going to show you how to get both of these because last time, many people were confused.

Now, another quick tidbit: if you’re worried that this link is perhaps not legitimate, you can see right here that this is Microsoft’s official, verified page on Hugging Face. You can also see that you can just click right here to find Visual Chat GPT, which I talked about in my previous video, and Hugging GPT right here, which is also Jarvis, and they all lead to the same link. So everything is honestly completely fine.

So, to get the first key, you need to go to OpenAI’s website. Navigate to platform.openai.com and click the button in the top right that says “Personal.” When you click that button, it’s going to show a bit of personal information; that’s why I haven’t clicked it. But once you click it, a menu drops down, and in it you’ll see “API Keys.” Then all you need to do is generate a new key. You can see right here, just click “Create New Secret Key.” Once you’ve created that secret key, you paste it into this box right here. I’m going to name this one Hugging GPT or Jarvis because that’s exactly what it’s going to be used for. Just remember to note the key down because you can never see it again, so always paste it into a notepad or a Word document, or save it somewhere on your phone.

To get your second key, you need to go to the Hugging Face login page and click “Create an Account.” So just sign up for Hugging Face like this, and you’ll be presented with a new page. It’s free to create an account, and you don’t really need to verify anything. Then, like before, once you’re logged in, all you want to do is go to your settings, and that’s where you’ll see “Access Tokens” on the left-hand side. Click that, then generate an access token. Now, once you have both of your keys, you should be ready to go. I’m just going to click submit on both of these, and it should be fine. I’ll wait for these to go through, and then I should be able to access Hugging GPT.
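If you ever use these two keys from your own script rather than pasting them into the web page, it’s safer to keep them out of the code itself. Here is a minimal sketch that reads them from environment variables. The variable names `OPENAI_API_KEY` and `HF_TOKEN`, the helper names, and the demo values are assumptions chosen for illustration; nothing here talks to OpenAI or Hugging Face.

```python
import os
from typing import Dict, Mapping

# Assumed environment variable names for the two keys.
REQUIRED = ("OPENAI_API_KEY", "HF_TOKEN")


def load_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read both keys from the environment and fail early if one is missing."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise KeyError("Missing keys: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED}


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key can be shown safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo with fake values; real keys would come from os.environ.
demo_env = {"OPENAI_API_KEY": "sk-example-1234", "HF_TOKEN": "hf_example_5678"}
keys = load_keys(demo_env)
print({name: mask(value) for name, value in keys.items()})
```

Masking the keys before printing also means you won’t leak them by accident in a screen recording.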

This one says it’s going to take 20 seconds, and this one said it’s going to take 15 seconds, so I’m just going to wait for that to be done.

You can see I’ve asked Microsoft’s Jarvis a very simple question. I’ve just asked it to figure out how many cars are in this image, and we’re going to see right here, live, whether this actually works. Now, I do want to state that the projects and models hosted on Hugging Face don’t always work, and sometimes they are buggy, so if this does mess up, don’t be surprised. But you can see right here that we actually did get a very accurate response. It says, “There are 11 cars in this image,” and it looks really, really interesting. You can see right here we have car one, car two, car three, four, five, six, seven, eight, nine, ten, eleven, and it managed to get it pretty much perfectly. So that just goes to show how crazy this software is.

Now, this was a live demo, so you know that this isn’t just a screenshot, and it means that we are definitely moving towards increasingly coherent, sophisticated, and well-networked AI that can work with pretty much any AI model that is out there. So, honestly, guys, it does feel like we are building AGI here, because this is going to get stronger the more models are added to Hugging Face. I mean, think about it like this: if a model is added to Hugging Face that enables ChatGPT or Jarvis to do something even better, Jarvis can simply add that to its pool of resources and incorporate that skill set, which is going to make it even more capable. So, this is truly some groundbreaking stuff.

Now, if you remember the GPT-4 reveal, you’ll remember that they presented this image on the left, along with this response on the right. The user inputs, “What would happen if the strings were cut?” and GPT-4 responds with, “The balloons would fly away.” To replicate this, I simply clicked “Copy Image Address” and put the image into Jarvis, but it didn’t exactly seem to get this request correct. You can see right here I said, “What would happen if the strings were cut in this image?” When you click this image, you can see that it’s the same one used before, and the response that it gave me was very confusing. It said, “I can tell you the result of cutting the strings in this image is the sentence ‘The kite is dancing in the wind, a sight to behold.’” So, it’s definitely very confusing, because I think it actually thought that these balloons were kites. I guess maybe Jarvis isn’t quite up to scratch compared with GPT-4, but then again, these experiments are something that you should definitely try for yourself because it’s very, very interesting.

And you can see here how it breaks it down. When I gave it the image, which you can see right there as the image link, it then described the image right here as a large, colorful kite flying in the sky. So that’s how it systematically breaks down each part of the user’s request and then moves on to the next step. Let me know what you think of this new software, and whether you’re going to be using it, whether you’re simply going to wait for GPT-4 to be released, or whether you’re just excited about AI development in general.
