How to Install & Use Whisper AI Voice to Text

In this step-by-step tutorial, learn how to transcribe speech into text using OpenAI’s Whisper AI. Whisper AI is an AI speech recognition system that can transcribe and translate audio files in approximately 100 different languages.

📚 RESOURCES
- Install Python: https://www.python.org/
- Install PyTorch: https://pytorch.org/get-started/locally/
- Install Chocolatey: https://chocolatey.org/

⌚ TIMESTAMPS
00:00 Introduction
00:40 Install overview
01:00 Install Python
02:31 Install PyTorch
03:55 Install Chocolatey package manager
04:53 Install ffmpeg
05:28 Install Whisper AI
05:59 Transcribe one file
07:18 Output files
07:58 Transcribe multiple files
08:39 Available models
09:51 Transcribe in other languages
10:31 Translate to English
11:06 Help
11:40 Quality
12:04 Uninstall
12:14 Wrap up

📺 RELATED VIDEOS
- Run Whisper AI in the cloud for free using Google Colab: Best FREE Speech to Text AI - Whisper AI

😢 Uninstall instructions:
- Uninstall Whisper AI
In command prompt, enter:
pip uninstall openai-whisper

- Uninstall ffmpeg
In command prompt, enter:
choco uninstall ffmpeg

- Uninstall Chocolatey
In File Explorer, delete the folder:
“C:\ProgramData\chocolatey”

- Uninstall PyTorch
In Command Prompt, enter:
pip3 uninstall torch torchvision torchaudio

- Uninstall Python
Go to Installed Apps in Windows Settings, search for Python and Python Launcher, click the three dots, and then uninstall.

📩 NEWSLETTER
- Get the latest high-quality tutorial and tips and tricks videos emailed to your inbox each week: https://kevinstratvert.com/newsletter/

🔽 CONNECT WITH ME
- Official web site: http://www.kevinstratvert.com
- LinkedIn: /kevinstratvert
- Discord: https://bit.ly/KevinStratvertDiscord
- Twitter: /kevstrat
- Facebook: /kevin-stratvert-101912218227818
- TikTok: /kevinstratvert
- Instagram: /kevinstratvert

🎒 MY COURSES
- Go from Excel novice to data analysis ninja in just 2 hours: https://kevinstratvert.thinkific.com/

🙏 REQUEST VIDEOS
https://forms.gle/BDrTNUoxheEoMLGt5

🔔 SUBSCRIBE ON YOUTUBE
https://www.youtube.com/user/kevlers?…

🙌 SUPPORT THE CHANNEL
- Hit the THANKS button in any video!
- Amazon affiliate link: https://amzn.to/3kCP2yz (Purchasing through this link gives me a small commission to support videos on this channel — the price to you is the same)

#stratvert #whisperai #openai


Content

0 -> Hi everyone, Kevin here. Today, we're going to  look at how you can both install and also use  
5.16 -> OpenAI's Whisper AI. With Whisper,  you can transcribe speech to text.  
12.12 -> It has extremely high quality. In fact, you  can click on the caption icon down below on  
17.22 -> this video to see captions generated by Whisper. It works with over 96 different languages and, my
24.54 -> favorite part, it's completely free to use. If you would prefer not to install anything on your PC,
30.48 -> check out the video right up above, and that  shows you how you can use Whisper AI entirely  
34.8 -> in the cloud. In this video, we're going to  install it on your PC. Let's check this out.  
40.08 -> To get Whisper AI working on your computer,  we need to install five different items,  
45.36 -> and I know that sounds like a lot, but we'll walk  through step by step how you install all of them.  
51.36 -> Also at the very end, if you no longer have a need for Whisper AI for transcribing audio,
56.58 -> I’ll also walk you through how you can uninstall  all of this. First, we need to download something  
62.46 -> called Python. You can click on the card up  above or the link down below in the description.  
67.8 -> Python is the programming language that  Whisper AI uses. On the Python homepage,  
73.14 -> click on the text that says download and on the  download page, you have a few different versions.  
78.48 -> Whisper AI works from version 3.7 all the way  up to 3.10. It currently does not work on 3.11.  
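
As a rough sketch of the version range just mentioned, the Python snippet below checks whether the running interpreter falls between 3.7 and 3.10. The bounds are taken from the video and may not match newer Whisper releases, and the function name is_supported_python is only an illustrative choice.

```python
import sys

# Version bounds as stated in the video (assumption: newer Whisper
# releases may support a different range).
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def is_supported_python(version_info=None):
    """Return True if (major, minor) lies within the stated range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = "{}.{}.{}".format(*sys.version_info[:3])
    status = "inside" if is_supported_python() else "outside"
    print(f"Python {current} is {status} the 3.7 to 3.10 range from the video.")
```

Once Python is installed, saving this to a file and running it with python gives the same information as python -V, plus a note on whether that version sits in the range.
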
87.06 -> If I scroll down a little bit, here we see  all of the different release versions. I'll  
90.66 -> click on 3.10.10, and on this page, if we scroll  all the way to the bottom, here you can choose  
96.42 -> your operating system. I'm running a Windows  machine, so I'll select the Windows installer  
100.68 -> 64 bit. Once you finish downloading Python, in  your downloads folder, click on the exe file.  
106.98 -> This kicks off the installation process, and  there's one thing you have to make sure to check.  
111.48 -> Down at the very bottom, you'll see this checkbox  that says add python.exe to path. Check this box.  
117.84 -> This allows us to run Python directly from  the command prompt, and we're going to do that  
123.3 -> later. So make sure to check that. Next, click on  install now and run through the install. And just  
128.88 -> like that, it looks like the setup was successful.  To confirm the installation, go down to the search  
134.52 -> icon down below on the taskbar and type in CMD for  command prompt. This opens up the command prompt  
140.58 -> and you can type in python -V, V for version, and  when you hit enter, here it tells me that I have  
147.42 -> Python 3.10.10 installed, and that's exactly what  I expected. Next, we need to install something  
153.78 -> called PyTorch. You can click on the card up above  or the link down below in the description. PyTorch  
160.38 -> is a machine learning library. Here on the  homepage, if we scroll down just a little bit,  
165 -> we see a section that says start locally.  Basically, we want to run this on our computer.  
170.58 -> Right down below, we have to configure a few  different settings. Right here we want to install  
174.66 -> the current stable version, so I'll make sure  this is selected. Right here, you can choose your  
180 -> operating system. It works on Linux, Mac, and also  Windows. I'll select Windows. Right down here,  
185.1 -> we have to choose the package type and I'll  select PIP since we just installed Python.  
190.08 -> For the language, we'll use Python. And right  down here, we can choose the compute platform.  
194.76 -> If you have a high-end graphics card in your  computer, like let's say an Nvidia graphics card,  
200.04 -> I would recommend choosing CUDA 11.8. That's the  most recent version. Over on the right-hand side,  
205.5 -> if you don't have a high-powered GPU in  your computer, then you could select CPU,  
209.82 -> but this doesn't go as quickly as a dedicated  graphics card. So ideally you could select this  
214.74 -> option. Once you make all these selections here,  let's copy this command down below. I’ll press  
220.38 -> control C. Back in command prompt, you can press  control V or your right mouse button, and that  
226.62 -> will paste the command that we just copied. To  install PyTorch, simply press enter now. And it  
232.62 -> looks like it has now successfully completed  installing. We're on number three now. See,  
237.72 -> we're making some really good progress. Here  we need to download a package manager called  
243.24 -> Chocolatey, and this works on Windows. If you're running Mac, I recommend downloading and
248.34 -> installing something called Homebrew. In the top  right-hand corner, let's click on the text that  
253.14 -> says install. This drops us onto the install  page, and right down here, we need to choose  
258.66 -> how to install Chocolatey. Here, I'll select individual. If we go down a little bit more,
263.94 -> you'll see a text box. Let's click into this  and then select copy. On your Windows desktop,  
269.76 -> go down to the search icon and type in  PowerShell. Here, we see PowerShell as  
275.94 -> the best match. Right click on that and then  select run as administrator. This now opens up  
282.54 -> PowerShell. You can press control V or your right  mouse button, and that pastes in the command that  
287.82 -> we just copied from Chocolatey. Here, press enter,  and this will go through and install Chocolatey.  
293.76 -> Now that we've finished installing Chocolatey,  we're going to use the Chocolatey package manager  
298.56 -> to install something called FFmpeg, and we're going to use FFmpeg to read the different audio
306.18 -> files, so whether it's a WAV file or whether it's an MP3. Down below within PowerShell,
311.52 -> type in choco, this is using Chocolatey, then type in install, and we want to install ffmpeg,
319.14 -> then hit enter. This will now install the package. Here I'll click on yes. And here it looks like
325.8 -> FFmpeg was successfully installed. Here I am now in command prompt in administrator mode,
332.16 -> and this brings us to the fifth and final  item to install. And that's Whisper AI.  
339.18 -> To install it, type in pip install, and here I'll  type in a -U. That way, if for whatever reason  
346.86 -> you already have Whisper on your computer,  that will upgrade it to the latest version.  
351.06 -> Next type in openai-whisper, and then hit enter. This will now go through and install Whisper AI.
359.04 -> Congratulations. We have now finished installing  all of the prerequisites to run Whisper AI.  
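
To double-check the installs walked through so far, here is a minimal Python sketch that looks for the ffmpeg and whisper commands on the PATH and for the torch package. The function name check_prerequisites and the layout of the report are assumptions made for illustration. The CUDA check relies on torch.cuda.is_available() and only runs if PyTorch can be imported.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the pieces installed above can be found."""
    report = {
        # Command-line tools are looked up on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "whisper": shutil.which("whisper") is not None,
        # The package is located without importing it.
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch  # imported only when it appears to be installed
            report["cuda"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            # A broken install is reported as "no CUDA" rather than crashing.
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If cuda shows no, Whisper should still run on the CPU, just more slowly, as mentioned in the PyTorch step.
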
365.82 -> Next, navigate to the folder that has all of  your different audio files. This will work with  
371.16 -> WAV files, MP3, MP4, all types of audio and also  video files. Within File Explorer, click into the  
378.72 -> address field right up here and then type in CMD  and press enter. This opens up command prompt,  
384.12 -> and we're currently in the same directory that  all of our audio files are in. So that's perfect,  
389.28 -> and we're now ready to finally run Whisper.  To run Whisper, simply type in whisper space,  
395.52 -> and then type in the file name. Here I'll type in sampleaudio1.wav. If your file name, say, has spaces
402.54 -> in it, you can put quotes around it. So here I could also open quotes, type in sampleaudio1.wav,
407.4 -> close my quotes, and that will also work. And  that's all you need to do. Let's now hit enter,  
413.34 -> and this will start running Whisper AI. By  default, this will use the small model and later  
420.3 -> on, I'll show you how you can use other models.  One of the really neat things is here you see that  
424.74 -> it automatically detects the language used in the  file and here it's successfully identified that  
429.84 -> I used English and right down below, I can see  all the different text that makes up this file,  
435.42 -> so it looks like it has successfully transcribed  the file. Let's now minimize command prompt,  
440.58 -> and this brings me back into File Explorer and  you'll probably notice that we have several new  
446.1 -> files here and they're all different file formats,  but they all include the transcript. Here for  
451.5 -> instance, I can click into the JSON file and here  I see a JSON file, and here we have all of the  
456.9 -> transcribed text. Especially if you want to pull  your text in paragraph format, this is a really  
461.58 -> good way to do that. Here I can click into the SRT file, and this is a caption file that includes a
467.46 -> transcript of everything that was said, along  with time stamps up above. Here you have some  
472.44 -> additional caption formats and here you also have  a TXT file, and here we just see the pure text  
477.48 -> without any timestamps at all. Let's now go back  into command prompt to see how we can transcribe  
483.42 -> multiple files at once. Just like we did before,  let's type in Whisper and I'll type in one of my  
489.18 -> file names, SampleAudio1.wav. Here I simply insert a space. And then I can type in another file
495.72 -> name, here SampleAudio2.wav. And now I could press  enter and it'll go through and transcribe both  
502.56 -> audio files. This works especially well if let's  say you have a number of different files that you  
507.18 -> need to transcribe. And just like that, it has  now finished transcribing both of my files. Here,  
512.88 -> if I minimize command prompt, here I can see that  I have all of these files for each one of my audio  
517.8 -> files. That's pretty quick and easy. By default,  Whisper AI uses the small model, but you have five  
524.64 -> different models that you can choose from. In  general, the larger the model, the better the  
529.5 -> quality that you'll get, but you do need to have  a GPU that's capable of running that. Also, you'll  
536.7 -> find that the larger the model, it also tends to  take a longer time to process. And at least from  
541.92 -> what I found, there are diminishing returns as you go larger. Next, let's look at how you can use
547.74 -> one of these different models when you run your  transcript. Back in command prompt, to use another  
552.9 -> model, simply type in whisper, your file name dot wav. And next let's type in a dash dash and type
559.32 -> in model. And here you can specify the model that  you would like to use. I'll type in medium and now  
565.32 -> simply press enter and it'll use that other model.  If you haven't used that model before, first it  
571.62 -> will need to download it. And it's now finished  transcribing all of my audio. And the main thing  
577.2 -> that stands out to me is it looks like it included  some additional punctuation, here attention comma,  
582.36 -> over here it included a comma, and I didn't  get that when I just used the small model. So,  
587.22 -> you do get slightly better quality, but again, it  will take a little bit longer. Back within File  
592.92 -> Explorer, here I have a file titled German.wav,  and this is audio in German, and I would like to  
599.58 -> transcribe this. Right up above, let's again  launch command prompt. Within command prompt,  
605.16 -> just like we've been doing all along, type in whisper and then the file name. Here I'll type in
610.44 -> German.wav. Now I could just press enter and it'll  auto detect the language, but I can also specify  
616.68 -> the language. Here I'll type in dash dash language space, and then here I could specify that
622.38 -> it's German. So, it doesn't have to auto detect  and then I could press enter. And just like that,  
627.78 -> here I have a transcript in German of all of the  audio in this file. Along with transcribing audio  
634.02 -> in different languages, you can also translate  the audio into English. Unfortunately, you cannot  
640.5 -> currently translate into any other language.  Here I simply entered in the same command that  
645.72 -> I entered in previously. Here I'll enter a dash  dash and then task and currently by default,  
652.32 -> it's set to transcribe, but here I could also  set it to translate and then I'll press enter  
657.3 -> and there I can see a translation of all of this  German text. Now it's not perfect and I'll have to  
663.36 -> go back and make some tweaks, but overall, I'd say  it's pretty solid. As we've been walking through  
668.28 -> this, I've been using lots of different arguments  like dash dash language or dash dash task, and if  
675.06 -> you'd like to see a list of all of the different  arguments that you could use with Whisper,  
678.66 -> simply type in whisper dash dash, and then help. This one will spit out a list of all the different
685.5 -> arguments that you have available to you. And  it also includes a description of what all of  
690.3 -> them do. So, for instance, you can choose where  you want to save all of your additional files,  
694.98 -> and there are lots of other settings here. So,  feel free to look through here to see what all of  
699.54 -> the different options are. On the following page,  and you'll find a link in the description, you can  
704.76 -> see all of the different languages that Whisper  AI supports. In general, the lower the number,  
709.92 -> the higher the quality that you'll see. Typically,  when I finish transcribing, I’ll listen to the  
716.22 -> audio and look at the text just to make sure that  it's accurate. Overall, it works incredibly well,  
721.08 -> but I do find that I have to go back and make  a few small tweaks here and there. If you  
725.76 -> decide that you no longer want Whisper AI on your  computer, you can uninstall it by walking through  
730.68 -> all of these different steps. You'll also find  this in the description of this video. All right,  
736.14 -> well, let me know down below in the comments,  were you successful at transcribing audio? To  
741.42 -> watch more videos like this one, please consider  subscribing and I'll see you in the next video.
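
The commands spoken in the video can be hard to picture from the captions alone, so here is a small Python sketch that assembles them. It only uses the options shown in the video (--model, --language, and --task). The helper name build_whisper_command is hypothetical, and the file names are the ones used in the walkthrough.

```python
import subprocess


def build_whisper_command(files, model=None, language=None, task=None):
    """Build the argument list for the whisper command-line tool.

    Only options shown in the video are used: --model, --language, --task.
    """
    if not files:
        raise ValueError("at least one audio file is required")
    command = ["whisper", *files]
    if model is not None:
        command += ["--model", model]
    if language is not None:
        command += ["--language", language]
    if task is not None:
        command += ["--task", task]
    return command


if __name__ == "__main__":
    examples = [
        build_whisper_command(["sampleaudio1.wav"]),
        build_whisper_command(["SampleAudio1.wav", "SampleAudio2.wav"]),
        build_whisper_command(["sampleaudio1.wav"], model="medium"),
        build_whisper_command(["German.wav"], language="German"),
        build_whisper_command(["German.wav"], language="German", task="translate"),
    ]
    for command in examples:
        # list2cmdline adds quotes around arguments that contain spaces.
        print(subprocess.list2cmdline(command))
    # To actually run one of them (requires Whisper to be installed):
    # subprocess.run(examples[0], check=True)
```

Each printed line is a command that can be typed into command prompt from the folder that holds the audio files. Running whisper --help, as described above, lists the remaining options.
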

Source: https://www.youtube.com/watch?v=ABFqbY_rmEk