Hugging Face Transformers.js
Finally, I get to talk about the Hugging Face Transformers.js project, which uses ONNX Runtime to run Hugging Face models directly in the browser.
Just calling an API is boring; the React translation tutorial is the best way to get started.
Downloading the model: the tutorial uses a worker.js to download the model, and the `progress_callback` option is used to report download progress.
```js
this.instance ??= pipeline(this.task, this.model, { progress_callback });
```
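The `??=` line above is the heart of the tutorial's lazy-singleton class: the pipeline is loaded once and reused on every later call. A minimal sketch of that pattern, with a stand-in `loadPipeline` function (a hypothetical mock, not the real `pipeline` from `@huggingface/transformers`) so the caching logic runs anywhere:

```js
// Lazy singleton: the expensive load runs only on the first getInstance() call.
class MyTranslationPipeline {
  static task = 'translation';
  static model = 'Xenova/nllb-200-distilled-600M';
  static instance = null;

  static async getInstance(progress_callback = null) {
    // In the tutorial this calls: pipeline(this.task, this.model, { progress_callback })
    this.instance ??= loadPipeline(this.task, this.model, { progress_callback });
    return this.instance;
  }
}

// Stand-in for the real transformers.js `pipeline` loader.
let loads = 0;
async function loadPipeline(task, model, { progress_callback }) {
  loads++; // counts how many times the model is actually loaded
  progress_callback?.({ status: 'ready', task, model });
  return { task, model };
}
```

Because the promise itself is cached, concurrent calls while the model is still downloading also share the same load.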
The `pipeline` function creates a pipeline for the model. You can pass `dtype: "fp32"` for higher precision at the cost of a larger model file, and `device: "webgpu"` to run the model on the GPU.
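Put together, the two options might be passed like this (a sketch only; whether `webgpu` is actually available depends on the browser, and the model name is just an example):

```js
// The two knobs discussed above, collected into the pipeline options object.
const options = {
  dtype: 'fp32',    // full precision: better quality, larger download
  device: 'webgpu', // run inference on the GPU via WebGPU
};

// In the app (sketch):
// import { pipeline } from '@huggingface/transformers';
// const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M', options);
```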
The `progress_callback` receives progress events shaped like this (when the worker forwards them with `postMessage`, they arrive on the main thread under the message event's `data` field):

```js
{
  status: "initiate" | "progress" | "done" | "ready",
  file: string,     // file name
  progress: number, // progress percentage
  ...
}
```
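One way to consume these events is to fold them into a per-file progress map for the UI. A small sketch assuming the `status`/`file`/`progress` fields shown above (the function name is hypothetical):

```js
// Reducer that folds progress events into a { file -> percent } map.
function applyProgressEvent(progressByFile, event) {
  const next = { ...progressByFile };
  switch (event.status) {
    case 'initiate':
      next[event.file] = 0;              // a new file starts downloading
      break;
    case 'progress':
      next[event.file] = event.progress; // percentage, 0-100
      break;
    case 'done':
      next[event.file] = 100;            // this file finished
      break;
    // 'ready' means the whole pipeline is loaded; nothing per-file to record
  }
  return next;
}
```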
The following code shows how to stream the result. If you are using TypeScript, you may get type errors; the type definitions may be incomplete.
```js
import { TextStreamer } from '@huggingface/transformers';

// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
  // Retrieve the translation pipeline. When called for the first time,
  // this will load the pipeline and save it for future use.
  const translator = await MyTranslationPipeline.getInstance((x) => {
    // We also add a progress callback to the pipeline so that we can
    // track model loading.
    self.postMessage(x);
  });

  // Capture partial output as it streams from the pipeline
  const streamer = new TextStreamer(translator.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: true,
    callback_function: function (text) {
      self.postMessage({
        status: 'update',
        output: text,
      });
    },
  });

  // Actually perform the translation
  const output = await translator(event.data.text, {
    tgt_lang: event.data.tgt_lang,
    src_lang: event.data.src_lang,
    // Allows for partial output to be captured
    streamer,
  });

  // Send the output back to the main thread
  self.postMessage({
    status: 'complete',
    output,
  });
});
```
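On the main-thread side, the tutorial pairs this worker with a `Worker` instance and a message listener. A hedged sketch of that half (the React state updates are replaced here by a plain `ui` callback object; `createMessageHandler` and the `ui` method names are hypothetical):

```js
// Main thread: route the worker's status messages to UI callbacks.
// (In the React tutorial this logic lives inside a useEffect hook.)
function createMessageHandler(ui) {
  return (e) => {
    switch (e.data.status) {
      case 'initiate':
      case 'progress':
      case 'done':
        ui.onModelProgress(e.data);          // model files downloading
        break;
      case 'ready':
        ui.onReady();                        // pipeline fully loaded
        break;
      case 'update':
        ui.onPartialOutput(e.data.output);   // streamed partial translation
        break;
      case 'complete':
        ui.onComplete(e.data.output);        // final translation result
        break;
    }
  };
}

// Browser usage (sketch):
// const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
// worker.addEventListener('message', createMessageHandler(ui));
// worker.postMessage({ text, src_lang: 'eng_Latn', tgt_lang: 'fra_Latn' });
```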
You can also download the model ahead of time and load it locally:

```js
import { env } from '@huggingface/transformers';

env.allowRemoteModels = false;
env.localModelPath = '/path/to/local/models/';
```
The most important reference is the list of pipelines: it defines what the library can and cannot do.
| Pipeline | Task | Example model | Input | Output |
|---|---|---|---|---|
| TextClassificationPipeline | sentiment-analysis | Xenova/distilbert-base-uncased-finetuned-sst-2-english | 'I love transformers!' | [{ label: 'POSITIVE', score: 0.999788761138916 }] |
| TextClassificationPipeline | text-classification | Xenova/toxic-bert | 'I hate you!' | [{ label: 'toxic', score: 0.9593140482902527 }, …] |
| TokenClassificationPipeline | token-classification | Xenova/bert-base-NER | 'My name is Sarah and I live in London' | [{ entity: 'B-PER', score: 0.9980202913284302, index: 4, word: 'Sarah' }, { entity: 'B-LOC', score: 0.9994474053382874, index: 9, word: 'London' }] |
| QuestionAnsweringPipeline | question-answering | Xenova/distilbert-base-uncased-distilled-squad | const question = 'Who was Jim Henson?'; const context = 'Jim Henson was a nice puppet.'; | { answer: 'a nice puppet', score: 0.5768911502526741 } |
| FillMaskPipeline | fill-mask | Xenova/bert-base-cased | 'The goal of life is [MASK].' | [{ token_str: 'survival', score: 0.06137419492006302, token: 8115, sequence: 'The goal of life is survival.' }, { token_str: 'love', score: 0.03902450203895569, token: 1567, sequence: 'The goal of life is love.' }, { token_str: 'happiness', score: 0.03253183513879776, token: 9266, sequence: 'The goal of life is happiness.' }, { token_str: 'freedom', score: 0.018736306577920914, token: 4438, sequence: 'The goal of life is freedom.' }, { token_str: 'life', score: 0.01859794743359089, token: 1297, sequence: 'The goal of life is life.' }] |
| Text2TextGenerationPipeline | text2text-generation | Xenova/LaMini-Flan-T5-783M | 'how can I become more healthy?', { max_new_tokens: 100 } | [{ generated_text: "To become more healthy, you can: 1. Eat a balanced diet with plenty of fruits, vegetables, whole grains, lean proteins, and healthy fats. 2. Stay hydrated by drinking plenty of water. 3. Get enough sleep and manage stress levels. 4. Avoid smoking and excessive alcohol consumption. 5. Regularly exercise and maintain a healthy weight. 6. Practice good hygiene and sanitation. 7. Seek medical attention if you experience any health issues." }] |
| SummarizationPipeline | summarization | Xenova/distilbart-cnn-6-6 | text, { max_new_tokens: 100 } | [{ summary_text: 'The Eiffel Tower is about the same height as an 81-storey building and the tallest structure in Paris. It is the second tallest free-standing structure in France after the Millau Viaduct.' }] |
| TranslationPipeline | translation | Xenova/nllb-200-distilled-600M | 'जीवन एक चॉकलेट बॉक्स की तरह है।', { src_lang: 'hin_Deva', tgt_lang: 'fra_Latn' } | [{ translation_text: 'La vie est comme une boîte à chocolat.' }] |
| TextGenerationPipeline | text-generation | Xenova/distilgpt2 | 'I enjoy walking with my cute dog,' | [{ generated_text: "I enjoy walking with my cute dog, and I love to play with the other dogs." }] |
| ZeroShotClassificationPipeline | zero-shot-classification | Xenova/mobilebert-uncased-mnli | const text = 'Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.'; const labels = ['mobile', 'billing', 'website', 'account access']; | { sequence: 'Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.', labels: ['mobile', 'website', 'billing', 'account access'], scores: [0.5562091040482018, 0.1843621307860853, 0.13942646639336376, 0.12000229877234923] } |
| FeatureExtractionPipeline | feature-extraction | Xenova/bert-base-uncased | 'This is a simple test.' | Tensor { type: 'float32', data: Float32Array [0.05939924716949463, 0.021655935794115067, …], dims: [1, 8, 768] } |
| ImageFeatureExtractionPipeline | image-feature-extraction | Xenova/vit-base-patch16-224-in21k | image_url | Tensor { dims: [1, 197, 768], type: 'float32', data: Float32Array(151296) [ … ], size: 151296 } |
| AudioClassificationPipeline | audio-classification | Xenova/wav2vec2-large-xlsr-53-gender-recognition-librispeech | audio_url | [{ label: 'male', score: 0.9981542229652405 }, { label: 'female', score: 0.001845747814513743 }] |
| ZeroShotAudioClassificationPipeline | zero-shot-audio-classification | Xenova/clap-htsat-unfused | audio_url, { candidate_labels: ['dog', 'vacuum cleaner'] } | [{ score: 0.9993992447853088, label: 'dog' }, { score: 0.0006007603369653225, label: 'vacuum cleaner' }] |
| AutomaticSpeechRecognitionPipeline | automatic-speech-recognition | Xenova/whisper-tiny.en | audio_url | { text: 'Hello, how are you?' } |
| ImageToTextPipeline | image-to-text | Xenova/blip2-opt-2.7b | image_url | [{ generated_text: 'A person is sitting on a bench in a park.' }] |
| ImageClassificationPipeline | image-classification | Xenova/vit-base-patch16-224 | image_url | [{ label: 'n02123045 tabby, tabby cat', score: 0.9999998807907104 }] |
| ImageSegmentationPipeline | image-segmentation | Xenova/detr-resnet-50-panoptic | image_url | [{ label: 'cat', score: 0.9999998807907104, mask: RawImage { … } }] |
| BackgroundRemovalPipeline | background-removal | Xenova/modnet | image_url | [RawImage { data: Uint8ClampedArray(648000) [ … ], width: 360, height: 450, channels: 4 }] |
| ZeroShotImageClassificationPipeline | zero-shot-image-classification | Xenova/clip-vit-base-patch32 | url, ['tiger', 'horse', 'dog'] | [{ label: 'tiger', score: 0.9999998807907104 }, { label: 'horse', score: 0.00000011920928955078125 }, { label: 'dog', score: 0.00000011920928955078125 }] |
| ObjectDetectionPipeline | object-detection | Xenova/detr-resnet-50 | img, { threshold: 0.9 } | [{ score: 0.9976370930671692, label: "remote", box: { xmin: 31, ymin: 68, xmax: 190, ymax: 118 } }, …] |
| ZeroShotObjectDetectionPipeline | zero-shot-object-detection | Xenova/owlvit-base-patch32 | img, { candidate_labels: ['remote', 'cat', 'dog'] } | [{ score: 0.9999998807907104, label: 'remote', box: { xmin: 31, ymin: 68, xmax: 190, ymax: 118 } }, …] |
| DocumentQuestionAnsweringPipeline | document-question-answering | | image, 'What is the invoice number?' | [{ answer: 'us-001' }] |
| TextToAudioPipeline | text-to-speech | Xenova/speecht5_tts | 'Hello, my dog is cute', { speaker_embeddings: 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin' } | RawAudio { audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, …], sampling_rate: 16000 } |
- ImageToImagePipeline: upscale an image
- DepthEstimationPipeline: estimate a depth map
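Most classification outputs in the table share the same array-of-`{ label, score }` shape, so they can be post-processed uniformly. A tiny hedged helper (hypothetical, not part of the library) that picks the highest-scoring label:

```js
// Return the highest-scoring label from a classification result,
// assuming the [{ label, score }, ...] shape shown in the table above.
function topLabel(results) {
  return results.reduce((best, r) => (r.score > best.score ? r : best)).label;
}
```

This works unchanged on the outputs of text-classification, audio-classification, image-classification, and the zero-shot variants listed above.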