Whisper Sandbox API

whisper.cpp is a port of OpenAI’s Whisper model to C/C++. Whisper is a “general-purpose speech recognition model” trained on large datasets of audio. Exposing it through a sandbox API would let creators build voice-driven mechanics into their worlds.

Example Use

A creator is making a world based on the popular spelling game Sparkle. To play, a word is picked and the players take turns spelling it, one letter per player. Once the word is fully spelled, the next player says “sparkle”, and the player after them is out. This repeats until only one person is left standing.

Whisper would be a good fit here, since it can transcribe the letter a player says. Once the speech is converted to text, the game can apply its rules from there. This would spare players from having to click one of 27 buttons (A to Z plus a Sparkle button).
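As a rough sketch of what "handling the rules from there" could involve, the Lua below takes transcribed text and advances a round of Sparkle. Every name in it (normalize, newRound, handleUtterance) is hypothetical, players are plain strings rather than Player objects, and it assumes the transcription may arrive with stray spaces, capitals, or punctuation. What happens on a wrong letter is left to the game; here it is only reported.

```lua
-- Hypothetical helper: reduce transcribed text to bare lowercase letters,
-- e.g. " B." becomes "b". The exact text Whisper returns is an assumption.
local function normalize(text)
    return (text:lower():gsub("[^%a]", ""))
end

-- Hypothetical state for one round: the word, the next letter position,
-- the circle of players, and whose turn it is.
local function newRound(word, players)
    return { word = word:lower(), position = 1, players = players, turn = 1 }
end

-- Index of the player after the current one, wrapping around the circle.
local function nextTurn(round)
    return round.turn % #round.players + 1
end

-- Feed one utterance from the current player into the round.
-- Returns "continue", "wrong", or "out" plus the eliminated player.
local function handleUtterance(round, text)
    local said = normalize(text)

    if round.position <= #round.word then
        -- Still spelling: the utterance must be the next letter.
        local expected = round.word:sub(round.position, round.position)
        if said ~= expected then
            return "wrong"
        end
        round.position = round.position + 1
        round.turn = nextTurn(round)
        return "continue"
    end

    -- Word is fully spelled: "sparkle" eliminates the following player.
    if said ~= "sparkle" then
        return "wrong"
    end
    local victim = nextTurn(round)
    local eliminated = table.remove(round.players, victim)
    round.position = 1
    round.turn = victim > #round.players and 1 or victim
    return "out", eliminated
end

local round = newRound("hi", { "Ann", "Bo", "Cy" })
print(handleUtterance(round, " H."))      -- continue
print(handleUtterance(round, "i"))        -- continue
local result, who = handleUtterance(round, "Sparkle!")
print(result, who)                        -- "out" and "Ann"
```

In a real world script, handleUtterance would be called from the text callback of whichever player currently has the turn.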

What the API Would Look Like

Here is a simple outline of what the functions could look like:

-- This gets the instance's handler for using Whisper
local WhisperHandler = instance.GetHandler("whisper")

-- This is the callback for when a segment ends. It receives the player and the text as two parameters.
-- It is defined before StartListening so that it exists when it is passed in.
local function onTextReceived(player, text)
    -- Handle the text from a player
end

-- This will tell Whisper to start listening
-- The first parameter is a Player object. This will be the player to listen to.
-- The second parameter is the interval in seconds. This will tell Whisper how long to listen for before it segments the text.
-- The third parameter is the callback function defined above.
local listenInstance = WhisperHandler.StartListening(targetPlayer, 5, onTextReceived)

-- This will tell Whisper to stop listening, and no further callbacks will fire.
listenInstance.StopListening()

-- This will get all of the words that Whisper has heard while listening
-- Can be used while still listening
local allText = listenInstance.Text
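Since none of this exists yet, one way to reason about the intended behavior is a small stand-in written in plain Lua. The sketch below is only a mock: MockWhisperHandler and SimulateSegment are made up for illustration, the player is a plain string instead of a Player object, and the interval is accepted but unused because no real audio is involved. It shows the two behaviors described above: Text accumulates across segments, and callbacks stop once StopListening is called.

```lua
-- Stand-in for the proposed handler. Everything here is a mock.
local MockWhisperHandler = {}

function MockWhisperHandler.StartListening(player, interval, callback)
    local listener = { Text = "" }
    local active = true

    -- Mirrors the proposed StopListening: no callbacks fire afterwards.
    function listener.StopListening()
        active = false
    end

    -- Hypothetical test hook (not part of the proposal): pretend one
    -- interval elapsed and Whisper produced `text` for that segment.
    function listener.SimulateSegment(text)
        if not active then
            return
        end
        if listener.Text == "" then
            listener.Text = text
        else
            listener.Text = listener.Text .. " " .. text
        end
        callback(player, text)
    end

    return listener
end

local heard = {}
local listener = MockWhisperHandler.StartListening("Player1", 5, function(player, text)
    heard[#heard + 1] = player .. ": " .. text
end)

listener.SimulateSegment("s")
listener.SimulateSegment("p")
listener.StopListening()
listener.SimulateSegment("a") -- ignored: listening has stopped

print(listener.Text) -- s p
print(#heard)        -- 2
```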

Concerns

This API is intended only for use in LocalScripts. It should NEVER be allowed in LocalAvatarScripts (with the possible exception of listening only to the player wearing the avatar), considering how invasive it could be.

Additionally, even though only instances will be able to use Whisper, instances may try to record player phrases externally. For example, an instance could listen to all of the players talking, record it, and then send it off to an external server for storage or processing. Something like this would be strictly prohibited, but if the feature were implemented, there is not much that could be done about it other than moderation. Because all third-party calls happen at the server level, player data would be sent via a Networked Message, and the player would never know whether it reached an external API.

The only way around this is to make every Text value a non-serializable intermediate class that wraps a String and exposes custom functions. This would prevent the text from being sent over the network and still give creators the power to use these features without players having to worry about their data being sent. The trade-off is that creators would be restricted in how they can manipulate the textual data, since the raw Strings would not be accessible.
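To make the non-serializable text idea a little more concrete, here is a minimal sketch in plain Lua. All of the names (newOpaqueText, Equals, Length, serializeForNetwork) are hypothetical, and serializeForNetwork is only a stand-in for whatever the engine actually uses to build Networked Messages. It also assumes the sandbox does not expose the debug library, since that could be used to read the hidden string.

```lua
-- Registry of opaque values, weak-keyed so it does not keep them alive.
local opaqueRegistry = setmetatable({}, { __mode = "k" })

-- Hypothetical wrapper: the raw string lives only inside the closures,
-- so scripts can ask questions about it but never read it directly.
local function newOpaqueText(raw)
    local opaque = {}

    function opaque.Equals(candidate)
        return raw:lower() == candidate:lower()
    end

    function opaque.Length()
        return #raw
    end

    setmetatable(opaque, {
        __tostring = function() return "<OpaqueText>" end,
        __concat = function() error("OpaqueText cannot be concatenated") end,
        __metatable = false, -- hide and lock the metatable
    })
    opaqueRegistry[opaque] = true
    return opaque
end

-- Stand-in for the engine's network serializer (an assumption):
-- it refuses anything registered as opaque text.
local function serializeForNetwork(value)
    if opaqueRegistry[value] then
        error("OpaqueText cannot be sent over the network")
    end
    return tostring(value)
end

local heard = newOpaqueText("Sparkle")
print(heard.Equals("sparkle"))                -- true
print(tostring(heard))                        -- <OpaqueText>
print((pcall(serializeForNetwork, heard)))    -- false
```

Note that even a comparison function such as Equals reveals a little about the text on each call, so an actual design would likely need to be careful about which functions it offers.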