Managing Multiple Players with Kinect in C#

I’m no C# expert. In fact, I had never touched the language before making Moto. Even so, I had the task of building a Kinect project that handled multiple players.

The Kinect SDK gives me all the information I really need, but how do you keep track of your players on screen? Who’s who, exactly? It all gets a bit confusing, so I’ll share how I handled it in Moto. This is by no means a definitive example, just the approach that worked for me. Got a better solution? Feel free to leave a comment and let me know. I’m always looking for ways to improve!

Two Players of Moto

What’s needed?

Let’s look at it this way: what do we need to know for the program to function? Well, we need to know who’s in the Kinect’s view at any one time, where they are, where their limbs are and, in the case of Moto, which instrument they’re playing.

The Kinect SDK provides the information you need about the player. It all hinges on one thing: skeleton.TrackingId. This is a number the Kinect assigns to a skeleton for as long as it keeps tracking that person. If someone goes out of shot (they leave the frame, or another player steps in front of them), that skeleton is lost, and when the person reappears they are given a new tracking ID. So it’s not flawless, but it’s the best we’re going to get.

Tracking a full skeleton is obviously hard work, so the Kinect only tracks two full skeletons at a time, even though it can be aware of the position of up to six people. The SDK also always returns a Skeleton array with a length of 6 within a SkeletonFrame, so you need to pick out only the skeletons whose TrackingState is SkeletonTrackingState.Tracked.

When we’ve found the skeletons that are tracked, it’s a simple case of getting their limb positions. We iterate through the array of skeletons and go from there.
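To make that filtering step concrete, here is a minimal sketch that runs without a sensor. FakeSkeleton, FakeTrackingState and SkeletonFilter are stand-ins I’ve made up for illustration; with the real SDK you would check Skeleton.TrackingState against SkeletonTrackingState.Tracked in the same way.

```csharp
using System.Collections.Generic;

// Stand-in types (hypothetical) so the sketch runs without the Kinect SDK.
public enum FakeTrackingState { NotTracked, PositionOnly, Tracked }

public class FakeSkeleton
{
    public int TrackingId { get; set; }
    public FakeTrackingState State { get; set; }
}

public static class SkeletonFilter
{
    // Returns only the fully tracked skeletons from the fixed-size array.
    public static List<FakeSkeleton> TrackedOnly(FakeSkeleton[] allSkeletons)
    {
        var tracked = new List<FakeSkeleton>();
        foreach (FakeSkeleton s in allSkeletons)
        {
            // Skip empty slots and skeletons that are not fully tracked.
            if (s != null && s.State == FakeTrackingState.Tracked)
            {
                tracked.Add(s);
            }
        }
        return tracked;
    }
}
```

Passing in a six-slot array where only two entries are marked Tracked gives back a list of exactly those two, which is the set the rest of the program should care about.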

Building a catalogue

So now we’ve got all that information, how do we store it? Well, in Moto it’s set up as a publicly accessible Dictionary called activeSkeletons, keyed by tracking ID.

public static Dictionary<int, Player> activeSkeletons = new Dictionary<int, Player>();

Just to confuse things, I’ve got a separate class set up called Player. Inside there:

public class Player
{
    public Player()
    {
        skeleton = null;
        instrument = Moto.instrument.instrumentList.None;
        mode = PlayerMode.None;
        instrumentImage = null;
        instrumentOverlay = new Dictionary<int, Image>();
    }

    public Skeleton skeleton { get; set; }
    public instrument.instrumentList instrument { get; set; }
    public PlayerMode mode { get; set; }
    public Image instrumentImage { get; set; }
    public Dictionary<int, Image> instrumentOverlay { get; set; }
}

So there we have the skeleton information, as well as the Moto-specific bits that we associate with the player: which instrument they’re playing, which mode it’s in (mostly relevant to the Wall of Sound and the Guitar), a reference to that instrument’s on-screen image, and the overlays we’re showing on them.

We get new information on the state of our skeletons every time the SkeletonFrameReady event fires. In the handler, we copy the frame’s data to our allSkeletons array. From there we loop through this array and do our business.

Gaining and Losing skeletons

It’s all well and good on our first frame. We’re going from zero skeletons to zero, one or two. Either way we are only going to be adding skeletons, so there is nothing to remove from our running Dictionary, activeSkeletons.

As a quick recap: on every frame-ready event we copy _all_ the skeleton data, tracked or not, into allSkeletons. We then loop through that array and record every tracked skeleton in activeSkeletons. The program-specific code works from activeSkeletons, so we only ever process skeletons that are on screen, and the data we process later is the data we stored when we recorded the player’s presence in the frame.

[...]
using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
{
    [...]
    skeletonFrame.CopySkeletonDataTo(MainWindow.allSkeletons);

    Skeleton aSkeleton;
    List<int> skeletonList = new List<int>();

    for (int i = 0; i < MainWindow.allSkeletons.Length; i++)
    {
        aSkeleton = MainWindow.allSkeletons[i];

        if (aSkeleton.TrackingState == SkeletonTrackingState.Tracked)
        {
            skeletonList.Add(aSkeleton.TrackingId);

            //A new skeleton?
            if (!MainWindow.activeSkeletons.ContainsKey(aSkeleton.TrackingId))
            {
                MainWindow.playerAdded(aSkeleton);
            }
        }
    }
}
[...]

On our first frame, activeSkeletons will be empty, so it will not contain any of the tracked skeleton IDs. For however many players are visible, playerAdded() is run. playerAdded() creates blank definitions for a new skeleton and, more importantly, copies the skeleton data into a newly made Player object.
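The body of playerAdded() isn’t shown here, so the following is only a guess at its general shape, not Moto’s actual code. SketchSkeleton, SketchPlayer and PlayerRegistry are made-up stand-ins so the sketch runs without the Kinect SDK.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the SDK's Skeleton and Moto's Player class.
public class SketchSkeleton
{
    public int TrackingId { get; set; }
}

public class SketchPlayer
{
    public SketchSkeleton Skeleton { get; set; }
    public string Instrument { get; set; }

    public SketchPlayer()
    {
        // Blank defaults, mirroring the idea of the Player constructor.
        Skeleton = null;
        Instrument = "None";
    }
}

public static class PlayerRegistry
{
    // Keyed by tracking ID, like activeSkeletons in the article.
    public static readonly Dictionary<int, SketchPlayer> ActiveSkeletons =
        new Dictionary<int, SketchPlayer>();

    // Registers a newly seen skeleton; returns false if it was already known.
    public static bool PlayerAdded(SketchSkeleton skeleton)
    {
        if (ActiveSkeletons.ContainsKey(skeleton.TrackingId))
        {
            return false;
        }

        var player = new SketchPlayer();
        player.Skeleton = skeleton; // keep the skeleton data with the player
        ActiveSkeletons[skeleton.TrackingId] = player;
        return true;
    }
}
```

Returning a bool is my own addition; it just makes it easy to tell whether a call actually created a new player.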

But surely, if we did that all the time, activeSkeletons would get huge? True. So further down we do some number crunching to see whether we’ve lost a skeleton. skeletonList is, well, a list of all the tracking IDs we have seen this frame. Once we’ve finished adding new skeletons, we take a copy of the keys in activeSkeletons and cross off every ID that also appears in skeletonList. When we’ve reached the end of skeletonList, any IDs left in the copy belong to players who are no longer with us, so we can run our removal functions on them.

[...]
if (skeletonList.Count < MainWindow.activeSkeletons.Count)
{
    List<int> activeList = new List<int>(MainWindow.activeSkeletons.Keys);
    //We've lost at least one skeleton
    //find which one(s) it/they are
    for (int i = 0; i < skeletonList.Count; i++)
    {
        if (activeList.Contains(skeletonList[i]))
        {
            activeList.Remove(skeletonList[i]);
        }
    }

    //Remove them
    for (int i = 0; i < activeList.Count; i++)
    {
        MainWindow.playerRemoved(activeList[i]);
    }
}
[...]

We create a separate list called activeList, holding a copy of the dictionary’s keys, so that we can remove IDs from it without affecting activeSkeletons itself. That makes our lives easier. playerRemoved() simply clears the dictionaries of all references to the skeleton with the supplied tracking ID.
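The same “who went missing?” check can also be written as a set difference using LINQ. This is an alternative sketch rather than what Moto does; LostPlayers, FindLost and RemoveLost are names I’ve made up, and the dictionary value type is left generic so the example doesn’t depend on the Player class.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LostPlayers
{
    // Returns the tracking IDs that are in 'active' but were not seen this frame.
    public static List<int> FindLost<TPlayer>(
        Dictionary<int, TPlayer> active, IEnumerable<int> seenThisFrame)
    {
        // ToList() takes a snapshot, so 'active' can be modified afterwards.
        return active.Keys.Except(seenThisFrame).ToList();
    }

    // Removes every player whose tracking ID was not seen this frame.
    public static void RemoveLost<TPlayer>(
        Dictionary<int, TPlayer> active, IEnumerable<int> seenThisFrame)
    {
        foreach (int id in FindLost(active, seenThisFrame))
        {
            active.Remove(id);
        }
    }
}
```

The snapshot matters for the same reason activeList does: removing entries from a Dictionary while enumerating it directly throws an InvalidOperationException on the .NET Framework.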

After we’re done processing all that, we can get on with processing everything else we need for our project. For Moto, that’s listening for gestures, detecting hits on drums and whatnot. That’s for another time, though. Sorry.

Conclusion

So, now we’ve got a record of all the skeletons, we can crunch some data without worrying about whether what we’re doing is for a player who’s not there any more, which would be a) pointless, and b) a delightful recipe for crashes.

To reference any player, we can do so through activeSkeletons. As it is keyed by the skeleton’s tracking ID, we can address a player by saying activeSkeletons[23].skeleton.Position.Z to get that player’s distance from the Kinect, for example (assuming 23 is a tracking ID currently in the dictionary). That, or loop through it and update the position of on-screen instruments. The possibilities are endless, really.
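One caveat with indexing by a fixed ID: if that player has already been removed, the Dictionary indexer throws a KeyNotFoundException. A safer lookup uses TryGetValue. This is a minimal sketch; DepthPlayer and PlayerLookup are hypothetical stand-ins, with Z standing in for skeleton.Position.Z.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a Player; Z plays the role of skeleton.Position.Z.
public class DepthPlayer
{
    public float Z { get; set; }
}

public static class PlayerLookup
{
    // Returns the player's distance, or null if that tracking ID is not active.
    public static float? DistanceOf(Dictionary<int, DepthPlayer> active, int trackingId)
    {
        DepthPlayer player;
        if (active.TryGetValue(trackingId, out player))
        {
            return player.Z;
        }
        return null; // the player left, or the ID was never seen
    }
}
```

Callers can then check HasValue before using the distance, rather than wrapping every lookup in a try/catch.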