mirror of https://github.com/xzeldon/whisper-api-server.git synced 2024-12-26 18:35:48 +00:00

Audio transcription using the OpenAI Whisper models

bindings gpu openai transcription whisper

xzeldon 58773c5ac1 ci: worflow dispatch		2024-12-26 01:42:22 +03:00
.github/workflows	ci: worflow dispatch	2024-12-26 01:42:22 +03:00
internal	feat: big update	2024-12-26 01:35:38 +03:00
pkg/whisper	feat: big update	2024-12-26 01:35:38 +03:00
tmp	initial commit	2023-10-04 01:09:38 +03:00
.gitignore	Transcribe using file buffer straight from form	2024-12-25 23:31:16 +03:00
.goreleaser.yaml	goreleaser: disable archives	2023-10-05 22:51:13 +03:00
go.mod	feat: big update	2024-12-26 01:35:38 +03:00
go.sum	feat: big update	2024-12-26 01:35:38 +03:00
LICENSE	initial commit	2023-10-04 01:09:38 +03:00
main.go	feat: big update	2024-12-26 01:35:38 +03:00
README.md	feat: big update	2024-12-26 01:35:38 +03:00

Whisper API Server (Go)

⚠️ This project is a work in progress (WIP).

This API server enables audio transcription using the OpenAI Whisper models.

Setup

Download .exe from Releases
Just run it!

Build from source (Windows)

Prerequisites

GCC compiler installed and available in your PATH (you can get it from here)
Install Go (https://go.dev/doc/install)

Before building, make sure the CGO_ENABLED environment variable is set to 1:

$env:CGO_ENABLED = "1"

You can check the current value with the following command:

go env

You also need a 64-bit (x64) build of GCC installed, e.g. via MSYS2.

Download the sources and use go build. For example, you can build using the following command:

go build -ldflags "-s -w" -o server.exe main.go

Usage example

Make a request to the server using the following command:

curl http://localhost:3000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3"

Receive a response in JSON format:

{
	"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}

Usage with Obsidian

Install the Obsidian voice recognition plugin.
Open the plugin's settings.
Set the following values:
- API KEY: sk-1
- API URL: http://localhost:3000/v1/audio/transcriptions
- Model: whisper-1

Roadmap

Implement automatic model downloading from Hugging Face
Implement automatic Whisper.dll downloading from GitHub releases
Provide prebuilt binaries for Windows
Include instructions for running on Linux with Wine (likely possible)
Use flags to override the model path
Use flags to override the port

Credits

Const-me/Whisper project
goConstmeWhisper for the remarkable Go bindings for Const-me/Whisper
Georgi Gerganov for GGML models