Batch Appliance Release Notes

6.0.0 [2024-07-09]

New

GPU & CPU
- New Virtual Appliance architecture with support for our latest generation of Ursa GPU models
- New transcription language - Hebrew (he)
- New transcription language - Persian (fa)
- Automatic Usage Reporting is enabled by default
GPU
- GPU Ursa models - all 50 languages are now available on GPU
  - Major transcription accuracy gains
  - Major improvement in Speaker Diarization accuracy
  - Faster transcription
- Bilingual Spanish and English language pack - this enables Spanish and English to be transcribed accurately within the same file
- Audio Events: detection of music, laughter and applause in media files is now supported. Refer to the documentation to get started

Improvements

Improved transcription accuracy for English, Norwegian, Romanian, Basque, Belarusian, Estonian, Mongolian, Thai, Vietnamese, and Welsh
Enhanced transcription of disfluencies in English. The model now more accurately captures common disfluencies like "um" and "uh". This change makes our ASR even more accurate for verbatim transcription, which suits use cases such as audio editing, analytics on hesitations for call centers, and legal transcription. For details on how to identify disfluencies in output, see the documentation
More accurate transcription of short utterances of the word "I" in English
More accurate transcription of acronyms in English
Improvements to capitalization for English transcription
Improved accuracy when transcribing audio with periods of silence
Channel Diarization now supports up to 100 separate input channels

Fixes

Fixes for specific transcription accuracy issues in English, German, Swedish and Norwegian
Fix for issue affecting recognition of English words ending in 'erm'
Fixed an error with custom dictionary when the content is only a "-"
Fix for transcribed words returned during non-speech audio when Custom Dictionary is used
Security fixes

4.3.0

Improvements

Language vocabulary improvements for French (fr), Italian (it), Hindi (hi), and Korean (ko)
Remodelled German (de) language pack to utilize subwords, separating words into smaller segments to reduce Word Error Rate (WER)
Improved numeral formatting in English
- Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
- Alphanumerics now have upper-case letters
- Added regional handling for en-AU and en-US Output Locale to keep 'pounds' as words
- A number of other improvements and fixes for better readability

Fixes

Fix for missing accented characters in Dutch transcription
Security fixes

4.2.0

New

14 new languages: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh
The JSON-v2 output version is now 2.8. Specific changes are:
- Additional language pack information has been added to the metadata section of the transcription results. There is now more detailed information about properties of the language being used, such as writing direction and word delimiter.
- We now also record the correct attachment direction for punctuation (e.g. before or after a space) in a new attaches_to field.
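
As an illustration of how these fields might be consumed, the sketch below rebuilds display text from a JSON-v2 result. The sample payload is hand-written, and its shape is an assumption based on the description above rather than a copy of real appliance output: a `language_pack_info` object under `metadata` holding `word_delimiter` and `writing_direction`, and `results` items carrying `type`, `alternatives` and `attaches_to`. The `attaches_to` values `previous`, `next`, `both` and `none` are likewise assumed. Check the API reference for the exact schema before relying on it.

```python
# Hypothetical JSON-v2 fragment; field names and values are assumptions.
sample = {
    "metadata": {
        "language_pack_info": {
            "word_delimiter": " ",
            "writing_direction": "left-to-right",
        }
    },
    "results": [
        {"type": "word", "alternatives": [{"content": "Hello"}]},
        {"type": "punctuation", "attaches_to": "previous",
         "alternatives": [{"content": ","}]},
        {"type": "word", "alternatives": [{"content": "world"}]},
        {"type": "punctuation", "attaches_to": "previous",
         "alternatives": [{"content": "."}]},
    ],
}


def render_text(transcript):
    """Join result items, honouring the delimiter and attachment direction."""
    info = transcript.get("metadata", {}).get("language_pack_info", {})
    delimiter = info.get("word_delimiter", " ")
    pieces = []
    glue_next = False  # True when the previous item attaches to this one
    for item in transcript.get("results", []):
        content = item["alternatives"][0]["content"]
        attaches = item.get("attaches_to", "none")
        glue_prev = attaches in ("previous", "both")
        # Insert the delimiter only when neither neighbour asks to be glued.
        if pieces and not glue_prev and not glue_next:
            pieces.append(delimiter)
        pieces.append(content)
        glue_next = attaches in ("next", "both")
    return "".join(pieces)


print(render_text(sample))
```

Reading the delimiter from the metadata rather than hard-coding a space keeps the same function usable for language packs that report a different or empty word delimiter.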

Improvements

Improved accuracy for 20 languages: Latvian (lv), Swedish (sv), Hungarian (hu), Portuguese (pt), Polish (pl), Mandarin Chinese (cmn), Arabic (ar), Dutch (nl), Slovak (sk), Bulgarian (bg), Romanian (ro), Slovenian (sl), Lithuanian (lt), Croatian (hr), Malay (ms), Catalan (ca), Czech (cs), Danish (da), Greek (el), Turkish (tr)
Improved formatting of numeric entities such as dates, currencies and large numbers for Swedish (sv), Norwegian (no), and Dutch (nl).

Fixes

Fix for accurately handling "p" as "pence" when transcribing currency in English (en).
Fix for handling small denominator fractions in Italian (it) and not converting to similar English homonyms e.g. "un terzo" being converted to "1/3".

Known Limitations

The following are known issues in this release:

Issue ID	Summary	Detailed Description and Possible Workarounds
REQ-1409	Proteus HCL with `<unk>` causes out of memory error	A Custom Dictionary list that contains the word `<unk>` causes the worker to crash.
REQ-7549	Memory leak affecting gRPC	There is a small memory leak in the gRPC Python server https://github.com/grpc/grpc/issues/5913.
REQ-10160	Advanced punctuation for Spanish (es) does not contain inverted marks.	Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation.
REQ-10627	Double full stops when acronym is at the end of the sentence	If there is an acronym at the end of the sentence, then a double full stop will be output, for example: "team G.B.."
REQ-10634	Putting "-" as an item in `additional_vocab` configuration will cause the container to fail	Do not enter just a "-" on its own in Custom Dictionary, either as an additional vocab item or in the `sounds_like` property. Hyphens are still supported when entered as part of phrases or words.
REQ-14402	Running very large numbers of small jobs (less than 10 seconds) offline may cause some of the jobs to be rejected	If you encounter this issue, ensure licensing is in offline mode when running the appliance offline.
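
Several of the issues above can be guarded against on the client side. The sketch below is one possible approach, not an official workaround: it drops Custom Dictionary entries that consist only of "-" or contain the word `<unk>` (REQ-10634, REQ-1409) before a job is submitted, and collapses the double full stop described in REQ-10627 in transcript text afterwards. The entry shape (a list of objects with `content` and an optional `sounds_like` list) and both function names are assumptions made for illustration.

```python
import re


def _is_unsafe(value):
    """True for empty values, a lone "-", or anything containing the word <unk>."""
    stripped = value.strip()
    return stripped in ("", "-") or "<unk>" in stripped.split()


def clean_additional_vocab(vocab):
    """Return a copy of the vocab list without entries known to cause failures."""
    cleaned = []
    for entry in vocab:
        content = entry.get("content", "")
        if _is_unsafe(content):
            continue  # skip the whole entry
        item = {"content": content.strip()}
        sounds_like = [s for s in entry.get("sounds_like", [])
                       if not _is_unsafe(s)]
        if sounds_like:
            item["sounds_like"] = sounds_like
        cleaned.append(item)
    return cleaned


# Exactly two full stops, not part of a longer run such as an ellipsis.
_DOUBLE_STOP = re.compile(r"(?<!\.)\.\.(?!\.)")


def collapse_double_full_stops(text):
    """Replace an isolated ".." with a single full stop."""
    return _DOUBLE_STOP.sub(".", text)


vocab = [
    {"content": "-"},
    {"content": "<unk>"},
    {"content": "co-operate", "sounds_like": ["-", "co op er ate"]},
]
print(clean_additional_vocab(vocab))
print(collapse_double_full_stops("team G.B.."))
```

Hyphens inside words or phrases are left untouched, in line with the note for REQ-10634, and runs of three or more full stops are preserved so that ellipses are not altered.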

Supported Platforms

Virtual Appliance image (OVA) for installation on:

VMware ESXi 6.5+ or VMware Workstation Player.
VirtualBox 5.2+
Amazon EC2

See the Installation and Admin Guide for details on the minimum specifications for the VM. The maximum number of concurrent jobs (maxworkers) that you can run on a single appliance is 30.

Form Factors

Variant	Image Size	Max. Disk Space	Languages
nano	10GB	40GB	en
mini	15GB	40GB	en, de, es
midi	30GB	60GB	en, de, es, fr, ko, ja, nl, pt
maxi	52GB	80GB	en, de, es, fr, ko, ja, nl, pt, it, da, pl, ca, hi, ru, sv
plus	61GB	80GB	en, cmn, no, ar, bg, cs, el, fi, hu, hr, lt, lv, ro, sk, sl, tr, ms, id, yue, ba, eu, be, eo, et, gl, ia, mr, mn, ta, th, ug, vi, cy
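
Choosing a variant comes down to finding the smallest image whose language list covers everything you need. The sketch below encodes the table above and performs that lookup. The function name is hypothetical, and Esperanto is entered as `eo`, its ISO 639-1 code; treat that as an assumption if your copy of the table differs.

```python
# Variants in ascending image size, with languages taken from the table.
VARIANTS = [
    ("nano", {"en"}),
    ("mini", {"en", "de", "es"}),
    ("midi", {"en", "de", "es", "fr", "ko", "ja", "nl", "pt"}),
    ("maxi", {"en", "de", "es", "fr", "ko", "ja", "nl", "pt", "it", "da",
              "pl", "ca", "hi", "ru", "sv"}),
    ("plus", {"en", "cmn", "no", "ar", "bg", "cs", "el", "fi", "hu", "hr",
              "lt", "lv", "ro", "sk", "sl", "tr", "ms", "id", "yue", "ba",
              "eu", "be", "eo", "et", "gl", "ia", "mr", "mn", "ta", "th",
              "ug", "vi", "cy"}),
]


def smallest_variant(required_languages):
    """Return the first (smallest) variant covering all languages, or None."""
    required = set(required_languages)
    for name, languages in VARIANTS:
        if required <= languages:
            return name
    return None


print(smallest_variant(["en", "de"]))
print(smallest_variant(["de", "cy"]))
```

Note that `plus` is not a superset of `maxi`: a combination such as German and Welsh is not covered by any single variant, in which case the function returns `None`.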

Upgrade Path

Remove the license from your old appliance (see the Admin Guide), then import the new OVA and configure networking as described in the Installation and Admin Guide. Once the OVA has been imported, re-apply your existing license code.

Installation

Upload the OVA to VMware ESXi, VMware Workstation Player, or VirtualBox. See the Installation and Admin Guide for more information.

Performance at Scale

Further notes on IOPS requirements under heavy usage of the appliance are now provided in the System Requirements section of the Installation Guide.
