Lip synch and speech

Hello again,

I realize the usual disclaimers hold (synfig is still early in development, lacks sound support, etc.), but I do have a question about sound and specifically speech:

What’s the intended approach for animating speech (that is, for actually moving the character’s mouth)? Is it shape tweening? Switch layers? What? And I also realize that there may be more than one way to accomplish the task, but what’s the intended method?

I’d like to take a crack at animating a little cartoon with some spoken parts. I have other software I can use to time out the lip synch before animating and then knit sound and video (or an image sequence) together into a final product, so I’d like to try doing the actual mouth animation the way the developers envision, just to see how it goes. So what is that vision?

Or, if nobody has a good answer for that question, how would you animate mouth movements in synfig?



Okay, I just found this tutorial:

Sounds like this may answer my question. I’ll see if I can follow this and make it work.


I’ve now read through the above tutorial while inspecting the corresponding synfig file. The tutorial says at the end:

I can’t find where/how that last bit (starting at frame 6) is accomplished. If it’s something I read in an earlier tutorial, then I’ve forgotten it and can’t find the reference again. :blush: I’m sure it’s something basic, but if somebody could point me in the right direction, I’d appreciate it.


When you create a new file with the New button, the New Canvas dialog pops up (the wiki version of that dialog is out of date; it’s now a little better organized). You can change the start time there.

Thanks for the reply, but… …hmmm…

Both canvases seem to have start times of zero in the example file I downloaded. That’s why I didn’t think this was where it was set. Or am I still not looking in the right place?


Back to the original question, it looks like you could do mouth animation this way:

  1. Create each mouth shape/drawing (composed of perhaps multiple shapes) and encapsulate each mouth drawing.

  2. Encapsulate the entire collection of encapsulated mouth drawings and export the resulting canvas.

  3. Within the exported canvas, set up keyframes so that only one mouth shape at a time is visible in each keyframe (by setting the “Amount” of the unwanted mouth shapes to zero).

  4. Duplicate these keyframes as needed to match the lip synch timing.

Does that seem about right? It’s late here, so I plan to give this a try during the day tomorrow.
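For what it’s worth, the visibility-switching recipe above can be sketched in plain Python (this is just a model of the logic, not Synfig’s file format; the phoneme names and frame numbers are invented for illustration). Each encapsulated mouth drawing gets an “Amount” track with constant (step) waypoints, so exactly one mouth is visible at any frame:

```python
# Invented lip-synch breakdown: (start_frame, mouth drawing to show).
lip_sync = [
    (0, "closed"),
    (6, "AI"),
    (10, "O"),
    (14, "MBP"),
]

def amount_waypoints(mouth):
    """Constant-interpolation waypoints for one mouth's Amount parameter:
    1.0 at keyframes where this mouth should show, 0.0 everywhere else."""
    return [(frame, 1.0 if shape == mouth else 0.0)
            for frame, shape in lip_sync]

def visible_mouth(frame):
    """Which mouth is on screen at a given frame, holding each value
    until the next waypoint (i.e. step interpolation)."""
    current = lip_sync[0][1]
    for start, shape in lip_sync:
        if frame >= start:
            current = shape
    return current

print(amount_waypoints("AI"))  # -> [(0, 0.0), (6, 1.0), (10, 0.0), (14, 0.0)]
print(visible_mouth(8))        # -> "AI"
```

Duplicating keyframes to match the timing (step 4) then just means appending more `(frame, mouth)` pairs to the breakdown.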


That’s correct. The sample file’s start time is not set properly; I meant to set it on the final file, so that the stored keyframes don’t show up in the real animation. You should set the main canvas’s start time later than the last stored keyframe of the child canvas. That way the stored keyframes are shown only when you want them.

That recipe can work, but you won’t get smooth interpolation between keyframes, because there is a canvas change between each one.

Instead, I recommend using a single exported canvas and setting the keyframes by morphing the ducks (or parameters such as Z Depth, or whatever) of all the layers that make up the exported canvas. That way you get a smooth mouth change between phoneme keyframes.
If you need a Z-order trick (for example, the tongue is over the lower lip when the “FV” phoneme is pronounced), you can modify the Z Depth too.
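The morphing approach above can also be sketched in plain Python (again just a model of the idea, not Synfig itself; the phoneme poses and vertex coordinates are invented). Instead of swapping drawings, one set of layers is kept and its vertex positions (“ducks”) are keyframed per phoneme, letting interpolation blend between poses:

```python
# Invented vertex (x, y) positions for one mouth outline, per phoneme.
phoneme_poses = {
    "AI": [(0.0, 0.2), (1.0, 0.2), (0.5, -0.3)],
    "O":  [(0.2, 0.1), (0.8, 0.1), (0.5, -0.5)],
}

def lerp_pose(pose_a, pose_b, t):
    """Linearly interpolate every vertex between two mouth poses,
    t = 0.0 giving pose_a and t = 1.0 giving pose_b."""
    return [(ax + (bx - ax) * t, ay + (by - ay) * t)
            for (ax, ay), (bx, by) in zip(pose_a, pose_b)]

# Halfway between "AI" and "O" -- the in-between frame the switch-layer
# recipe can't produce:
mid = lerp_pose(phoneme_poses["AI"], phoneme_poses["O"], 0.5)
```

With constant (step) interpolation instead of linear, this collapses to the same result as the visibility-switching recipe, which is the point made at the end of the thread.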


Actually, I’m trying for the most part to avoid shape morphing for mouth animation (which to some extent cuts against synfig’s strength, I realize). I’m looking to replicate the more traditional approach of drawing each phoneme - mainly because, for me at least, it’s much faster than mucking about with tweening vertices. I also prefer the way it looks, but maybe that’s my old age showing. :smiley:

I’ve done Flash animation and I’m used to thinking in terms of movie clips and ActionScript. There, you’d make a movie clip with one frame for each phoneme, then from, say, the main timeline you’d use ActionScript to tell the mouth movie clip which phoneme to display at any given instant. I don’t suppose there’s any analog to that approach in synfig, is there?

Thanks again,


You can do your recipe or mine with constant (step) interpolation. It would give the same results.