Timofey Gelazoniya
Whisper API Server (Go)
⚠️ This project is a work in progress (WIP).
This API server enables audio transcription using the OpenAI Whisper models.
Setup
- Download the .exe from Releases
- Just run it!
Build from source
Before building, make sure the CGO_ENABLED environment variable is set to 1 (PowerShell syntax):
$env:CGO_ENABLED = "1"
You can check the current value with this command:
go env
You also need a 64-bit (x64) gcc installed, for example via MSYS2.
Download the sources and use go build.
For example, you can build using the following command:
go build -ldflags "-s -w" -o server.exe main.go
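The two prerequisites above can also be checked from a small Go program before building. This is only a sketch: the helper name cgoEnabled is made up for illustration, and the gcc check confirms that some gcc is on PATH, not that it is an x64 build.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// cgoEnabled reports whether the output of `go env CGO_ENABLED` means cgo is on.
// (Hypothetical helper, not part of this repository.)
func cgoEnabled(goEnvOutput string) bool {
	return strings.TrimSpace(goEnvOutput) == "1"
}

func main() {
	// `go env CGO_ENABLED` prints only the value of that one setting.
	out, err := exec.Command("go", "env", "CGO_ENABLED").Output()
	if err != nil {
		fmt.Println("could not run go env:", err)
		return
	}
	if !cgoEnabled(string(out)) {
		fmt.Println("CGO_ENABLED is not 1; set it before building")
	}

	// LookPath only confirms that a gcc executable is on PATH.
	if _, err := exec.LookPath("gcc"); err != nil {
		fmt.Println("gcc not found on PATH")
	}
}
```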
Usage example
Make a request to the server using the following command:
curl http://localhost:3000/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3"
Receive a response in JSON format:
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
Usage with Obsidian
- Install the Obsidian voice recognition plugin
- Open the plugin's settings.
- Set the following values:
  - API KEY: sk-1
  - API URL: http://localhost:3000/v1/audio/transcriptions
  - Model: whisper-1
Roadmap
- Implement automatic model downloading from Hugging Face
- Implement automatic Whisper.dll downloading from GitHub releases
- Provide prebuilt binaries for Windows
- Include instructions for running on Linux with Wine (likely possible).
- Use flags to override the model path
- Use flags to override the model type (when downloading the model)
- Use flags to override the port
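As a rough idea of how the flag items above might look, here is a minimal sketch using Go's standard flag package. The flag names (-model-path, -model-type, -port) and the config struct are hypothetical and not implemented in this project. The default port of 3000 is taken from the usage example, and the empty model defaults are placeholders.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the settings the roadmap proposes to make overridable.
// (Hypothetical type, for illustration only.)
type config struct {
	ModelPath string
	ModelType string
	Port      int
}

// parseFlags reads the hypothetical -model-path, -model-type and -port flags.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.StringVar(&c.ModelPath, "model-path", "", "path to the model file (placeholder default)")
	fs.StringVar(&c.ModelType, "model-type", "", "model type to download (placeholder default)")
	fs.IntVar(&c.Port, "port", 3000, "port to listen on")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		// The flag package has already printed the problem to stderr.
		return
	}
	fmt.Printf("%+v\n", c)
}
```

Parsing into a struct from an explicit argument slice, rather than from the global flag set, keeps the function easy to test.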
Credits
- Const-me/Whisper project
- goConstmeWhisper for the remarkable Go bindings for Const-me/Whisper
- Georgi Gerganov for GGML models