implement decoding charsets (other than ascii and utf-8) while reading textual message parts, and improve search

message.Part now has a ReaderUTF8OrBinary() along with the existing Reader().
the new function returns a reader of decoded content. we now use it in a few
places, including search. we only support the charsets in
golang.org/x/text/encoding/ianaindex.

search has also been changed to not read the entire message in memory. instead,
we make one 8k buffer for reading and search in that, and we keep the buffer
around for all messages. saves quite some allocations when searching large
mailboxes.
This commit is contained in:
Mechiel Lukkien
2023-07-28 22:15:23 +02:00
parent a31dfc573e
commit 01adad62b2
34 changed files with 157887 additions and 64 deletions

View File

@ -26,6 +26,7 @@ import (
"time"
"github.com/mjl-/mox/mlog"
"github.com/mjl-/mox/moxio"
"github.com/mjl-/mox/moxvar"
"github.com/mjl-/mox/smtp"
)
@ -474,6 +475,14 @@ func (p *Part) Reader() io.Reader {
return p.bodyReader(p.RawReader())
}
// ReaderUTF8OrBinary returns a reader for the decode body content, transformed to
// utf-8 for known mime/iana encodings (only if they aren't us-ascii or utf-8
// already). For unknown or missing character sets/encodings, the original reader
// is returned.
func (p *Part) ReaderUTF8OrBinary() io.Reader {
return moxio.DecodeReader(p.ContentTypeParams["charset"], p.Reader())
}
func (p *Part) bodyReader(r io.Reader) io.Reader {
r = newDecoder(p.ContentTransferEncoding, r)
if p.MediaType == "TEXT" {