Recently we set out to see how easily we could automate a video-generation pipeline, and how an AI-generated video stacks up against one made by members of our team.

As part of this, I (Dan Cranney, a Developer Advocate here at WeAreDevelopers, and the person you may have seen in our Dev Digest roundup videos) started by cloning my voice.

Though we'd usually opt for open-source software, we wanted to get started quickly and with as few steps as possible, so we signed up for ElevenLabs' Starter plan and used its JavaScript SDK.

It should be said that ElevenLabs offers both Instant Voice Cloning (IVC) and Professional Voice Cloning (PVC). We'll certainly revisit PVC when we come to make the final video, but for now we spun up an instant clone with the steps below to get a feel for the setup.

Set up the project

To get started, let's create a directory for our project, initialise it as a Git repo, create a package.json and install two dependencies: the ElevenLabs SDK and dotenv.

mkdir voice-cloning && cd voice-cloning
git init
npm init -y
npm install @elevenlabs/elevenlabs-js dotenv

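At this point, open package.json and add "type": "module" so Node treats your .js files as ES modules and the import syntax works. (Our script uses the .mjs extension, which Node always treats as an ES module, so this is a safeguard rather than a strict requirement.)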

Then, create a .gitignore that lists .env (so your API key never gets committed) and node_modules/, plus audio/ and output/ if you don't want your recordings and generated files in your repo.

Finally, let’s create two folders: one named audio and the other named output.
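If you'd rather script this setup than do it by hand, here's a minimal sketch of a hypothetical scaffold() helper (our own name, not part of any library) that creates both folders and appends any missing entries to .gitignore. It only uses Node's built-in fs and path modules and is safe to run more than once.

```javascript
import fs from "node:fs";
import path from "node:path";

// Entries we want ignored: the API key file, dependencies,
// and (optionally) the voice recordings and generated audio.
const IGNORE_ENTRIES = [".env", "node_modules/", "audio/", "output/"];

// Hypothetical helper: creates audio/ and output/ under rootDir and appends
// any IGNORE_ENTRIES not already in .gitignore. Returns the entries it added.
export function scaffold(rootDir = ".") {
  for (const dir of ["audio", "output"]) {
    fs.mkdirSync(path.join(rootDir, dir), { recursive: true });
  }

  const gitignorePath = path.join(rootDir, ".gitignore");
  const current = fs.existsSync(gitignorePath)
    ? fs.readFileSync(gitignorePath, "utf8")
    : "";
  const existing = current.split(/\r?\n/).map((line) => line.trim());
  const missing = IGNORE_ENTRIES.filter((entry) => !existing.includes(entry));

  if (missing.length > 0) {
    // Start on a fresh line if the file doesn't already end with a newline
    const prefix = current.length > 0 && !current.endsWith("\n") ? "\n" : "";
    fs.appendFileSync(gitignorePath, prefix + missing.join("\n") + "\n");
  }
  return missing;
}
```

You could save this as, say, scaffold.mjs and call scaffold() once from a one-off script in the project root; drop "audio/" and "output/" from IGNORE_ENTRIES if you do want those folders tracked.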

Get an API Key

Once you’ve upgraded to the Starter plan, create an API key in the ElevenLabs dashboard.

Use the scope toggles to set Text to Speech to Access and Voices to Write.

In case you're wondering, it's "Voices" that lets you create a clone, not "Voice Generation", which is for designing new, original voices.

Create your key, and drop it into your .env:

ELEVENLABS_API_KEY=sk_...
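The SDK client reads ELEVENLABS_API_KEY from the environment by default, so if the key is missing or misnamed in .env, you may not find out until the first API call. If you'd prefer to fail fast, here's a minimal sketch of a hypothetical requireEnv() helper (our name, not part of the SDK or dotenv):

```javascript
// Hypothetical helper: returns the named environment variable, or throws
// a readable error if it is missing or blank. Call it after
// `import "dotenv/config"` so values from .env are already loaded.
function requireEnv(name) {
  const value = process.env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set. Add it to your .env file, e.g. ${name}=sk_...`);
  }
  return value;
}
```

You could then pass the key explicitly, for example new ElevenLabsClient({ apiKey: requireEnv("ELEVENLABS_API_KEY") }), instead of relying on the default lookup.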

Prepare Your Voice Samples

Before we start writing any real code, drop 2–3 clean voiceover clips into your audio folder. Don't worry too much about silences, but note that each clip needs to be under 11 MB, or you'll run into errors later on.

Interestingly, in my tests, feeding ElevenLabs 2 or 3 clips produced significantly more realistic results than feeding it 8 clips of the same quality.
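Since both the file size and the number of clips matter, it can help to check the audio folder before spending an API call. Below is a minimal sketch of a hypothetical checkClips() helper that reuses the same extension list as run.mjs, also skips hidden files, and returns warnings rather than calling ElevenLabs. Two assumptions to flag: we treat "11 MB" as 11 × 1024 × 1024 bytes, and the 2–3 clip range comes from our own tests, not from official ElevenLabs guidance.

```javascript
import fs from "node:fs";
import path from "node:path";

const ALLOWED = [".mp3", ".wav", ".m4a", ".flac"];
// Assumption: the ~11 MB per-file limit interpreted as 11 × 1024 × 1024 bytes
const MAX_BYTES = 11 * 1024 * 1024;
// Based on our own tests, not an official recommendation
const RECOMMENDED_CLIPS = { min: 2, max: 3 };

// Hypothetical pre-flight check: lists usable clips in `dir` and collects
// human-readable warnings about oversized files or an unusual clip count.
function checkClips(dir) {
  const clips = fs.readdirSync(dir)
    // skip hidden files such as .DS_Store or macOS "._clip.mp3" metadata files
    .filter((f) => !f.startsWith("."))
    .filter((f) => ALLOWED.includes(path.extname(f).toLowerCase()));

  const warnings = [];
  for (const f of clips) {
    const { size } = fs.statSync(path.join(dir, f));
    if (size >= MAX_BYTES) {
      warnings.push(`${f} is ${(size / 1024 / 1024).toFixed(1)} MB; trim it below 11 MB.`);
    }
  }
  if (clips.length < RECOMMENDED_CLIPS.min || clips.length > RECOMMENDED_CLIPS.max) {
    warnings.push(`Found ${clips.length} clips; 2–3 clean clips worked best in our tests.`);
  }
  return { clips, warnings };
}
```

For example, you could call checkClips("./audio") at the top of run.mjs, print any warnings, and stop before creating the clone if an oversized file turns up.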

Write the Script

Next up, let's create a run.mjs file, which we'll run each time we want to generate new audio from a text script.

Let’s look at the code then walk through it…

import "dotenv/config"; // loads ELEVENLABS_API_KEY from .env into process.env
import fs from "node:fs";
import path from "node:path";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// The client reads ELEVENLABS_API_KEY from the environment by default
const elevenlabs = new ElevenLabsClient();
const ALLOWED = [".mp3", ".wav", ".m4a", ".flac"];

// only process real audio files; skips files like .DS_Store
const files = fs.readdirSync("./audio")
  .filter((f) => ALLOWED.includes(path.extname(f).toLowerCase()));

if (files.length === 0) {
  throw new Error("No audio clips found in ./audio");
}

// 1. Create an Instant Voice Clone from the sample clips
const voice = await elevenlabs.voices.ivc.create({
  name: "My voice (IVC test)",
  files: files.map((f) => fs.createReadStream(path.join("./audio", f))),
});

// 2. Generate speech in the cloned voice
const audio = await elevenlabs.textToSpeech.convert(voice.voiceId, {
  text: "Let's see how this sounds.",
  modelId: "eleven_v3",
  outputFormat: "mp3_44100_128", // MP3, 44.1 kHz, 128 kbps
});

// 3. The response is a stream of chunks: collect them into one MP3 file
fs.mkdirSync("./output", { recursive: true });
const chunks = [];
for await (const chunk of audio) chunks.push(chunk);
fs.writeFileSync("./output/test.mp3", Buffer.concat(chunks));
console.log("done. generated output/test.mp3");

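After initialising the ElevenLabsClient (which picks up ELEVENLABS_API_KEY from the environment, loaded from .env by dotenv), we optionally define an ALLOWED list of file extensions. This avoids odd errors caused by system or hidden files, such as .DS_Store, sitting in the audio folder.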

Next, we create an Instant Voice Clone from the clips in ./audio/ and pass its voiceId to textToSpeech.convert(), where text is what we want our cloned voice to say. Here we're using the eleven_v3 model and asking for a 44.1 kHz, 128 kbps MP3.
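Note that, as written, run.mjs calls voices.ivc.create() on every run, so each run adds another cloned voice to your ElevenLabs account. One way around that is to create the clone once and cache its voiceId locally. Here's a minimal sketch, assuming a hypothetical voice.json cache file and a helper we've named getOrCreateVoiceId(); the creation step is passed in as a function, so the caching logic can be tried without touching the API.

```javascript
import fs from "node:fs";

// Hypothetical helper: returns the voice ID cached in `cachePath` if present;
// otherwise awaits createVoice() once and saves its result for next time.
async function getOrCreateVoiceId(cachePath, createVoice) {
  if (fs.existsSync(cachePath)) {
    const { voiceId } = JSON.parse(fs.readFileSync(cachePath, "utf8"));
    if (voiceId) return voiceId;
  }
  const voiceId = await createVoice();
  fs.writeFileSync(cachePath, JSON.stringify({ voiceId }, null, 2));
  return voiceId;
}
```

In run.mjs, createVoice would wrap the existing elevenlabs.voices.ivc.create(...) call and return voice.voiceId. Delete voice.json whenever you want to re-clone from a fresh set of clips.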

Finally, because convert() returns the audio as a stream of chunks, we collect them into a single Buffer and write it to ./output/test.mp3.
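Because the script always writes to output/test.mp3, each run overwrites the previous take. If you'd like to keep every version, here's a sketch of a hypothetical saveAudio() helper (our name) that does the same chunk collecting but writes to a timestamped filename instead.

```javascript
import fs from "node:fs";
import path from "node:path";

// Hypothetical helper: drains an async-iterable audio stream (such as the one
// textToSpeech.convert() returns) and writes it to a timestamped MP3 file.
async function saveAudio(stream, outDir = "./output") {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);

  fs.mkdirSync(outDir, { recursive: true });
  // e.g. 2025-01-31T09-15-42-123Z.mp3 (colons and dots replaced so the name is filesystem-safe)
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  const file = path.join(outDir, `${stamp}.mp3`);
  fs.writeFileSync(file, Buffer.concat(chunks));
  return file;
}
```

In run.mjs, the last four lines could then be replaced with a single call such as const file = await saveAudio(audio), followed by logging the returned path.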

So, let’s run it with:

node run.mjs
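Because the text is hard-coded, generating a new line means editing run.mjs. A small tweak is to read it from the command line or from a text file instead. Here's a minimal sketch; getScriptText() and the --file flag are our own hypothetical names, not part of Node or the SDK.

```javascript
import fs from "node:fs";

// Hypothetical helper: picks the text to speak from CLI arguments.
//   node run.mjs "Some text"        -> "Some text"
//   node run.mjs --file script.txt  -> contents of script.txt
//   node run.mjs                    -> the fallback line
function getScriptText(args, fallback = "Let's see how this sounds.") {
  const fileFlag = args.indexOf("--file");
  if (fileFlag !== -1) {
    const filePath = args[fileFlag + 1];
    if (!filePath) throw new Error("--file needs a path, e.g. --file script.txt");
    return fs.readFileSync(filePath, "utf8").trim();
  }
  const inline = args.join(" ").trim();
  return inline || fallback;
}
```

In run.mjs you would then pass text: getScriptText(process.argv.slice(2)) to textToSpeech.convert(), and run something like node run.mjs --file script.txt.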

Cloning Complete!

If your script runs as it should, open output/test.mp3 and there you are — your own voice, generated from a script!

This is the first step in a series documenting how I build an agentic content pipeline around my cloned voice. In future pieces I'll look at Professional Voice Cloning for higher-fidelity results, as well as open-source alternatives, so be sure to check back soon.