Whisper API Server (Go)

⚠️ This project is a work in progress (WIP).

This API server enables audio transcription using the OpenAI Whisper models.

Setup

  • Download .exe from Releases
  • Just run it!

Build from source

Before building, make sure the CGO_ENABLED environment variable is set to 1. In PowerShell:

$env:CGO_ENABLED = "1"

You can check the current value with this command:

go env

You also need a 64-bit gcc installed, e.g. via MSYS2.
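Both prerequisites can also be checked from a small Go program. This is a minimal sketch, not part of this repository: it assumes the go tool is on PATH, and the helper name cgoEnabled is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// cgoEnabled reports whether the output of `go env CGO_ENABLED` means cgo is on.
// (Hypothetical helper, named here only for illustration.)
func cgoEnabled(goEnvOutput string) bool {
	return strings.TrimSpace(goEnvOutput) == "1"
}

func main() {
	// Ask the Go toolchain for the effective CGO_ENABLED value.
	out, err := exec.Command("go", "env", "CGO_ENABLED").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not run `go env`:", err)
		os.Exit(1)
	}
	if !cgoEnabled(string(out)) {
		fmt.Fprintln(os.Stderr, "CGO_ENABLED is not 1; set it before building")
		os.Exit(1)
	}

	// cgo needs a C compiler; look for gcc on PATH.
	path, err := exec.LookPath("gcc")
	if err != nil {
		fmt.Fprintln(os.Stderr, "gcc not found on PATH")
		os.Exit(1)
	}
	fmt.Println("cgo enabled, gcc found at", path)
}
```

Run it with go run before building; a non-zero exit code means one of the two prerequisites is missing.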

Download the sources and use go build. For example, you can build using the following command:

go build -ldflags "-s -w" -o server.exe main.go

Usage example

Make a request to the server using the following command:

curl http://localhost:3000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3"

Receive a response in JSON format:

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}

Usage with Obsidian

  1. Install the Obsidian voice recognition plugin.
  2. Open the plugin's settings.
  3. Set the following values:
    • API KEY: sk-1
    • API URL: http://localhost:3000/v1/audio/transcriptions
    • Model: whisper-1

Roadmap

  • Implement automatic model downloading from Hugging Face
  • Implement automatic Whisper.dll downloading from GitHub releases
  • Provide prebuilt binaries for Windows
  • Include instructions for running on Linux with Wine (likely possible)
  • Use flags to override the model path
  • Use flags to override the model type (when downloading the model)
  • Use flags to override the port
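The flag-related roadmap items could be approached with the standard flag package. The sketch below is only an illustration of that idea, not the project's implementation: the flag names -port and -model, the config type and the parseFlags helper are all hypothetical, and the default port 3000 is taken from the usage example above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the settings the roadmap proposes to make overridable.
// (Hypothetical type; field and flag names are assumptions.)
type config struct {
	Port      int
	ModelPath string
}

// parseFlags reads the overrides from args (typically os.Args[1:]).
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	var c config
	fs.IntVar(&c.Port, "port", 3000, "port to listen on")
	// An empty default stands in for whatever built-in model location is used.
	fs.StringVar(&c.ModelPath, "model", "", "path to the Whisper model file")
	err := fs.Parse(args)
	return c, err
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("would listen on :%d with model %q\n", cfg.Port, cfg.ModelPath)
}
```

Using a dedicated FlagSet rather than the global flag functions keeps the parsing testable without touching process-wide state.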

Credits