Charles Petzold: Windows 8 Touch Events Interactions

February 1, 2012
New York, N.Y.

During the week in September that I was attending the Microsoft Build conference and learning all about Windows 8, only one topic made me gloomy: the Windows 8 implementation of the Manipulation events. These events provide a high-level application programming interface to touch input, so obviously they are very important.

At first glance, these Windows 8 Manipulation events were somewhat enhanced versions of the Manipulation events in Windows Phone 7 (which themselves were stripped-down versions of the Manipulation events in the Windows Presentation Foundation), but these enhancements did not fix a nasty flaw in the Windows Phone implementation, and that didn't make me happy at all.

A Little History

I first became acquainted with the Manipulation events when they were implemented in the Windows Presentation Foundation 4.0. I wrote about them in not one, not two, but three articles for MSDN Magazine. I liked the WPF versions very much because they consolidated the activity of multiple fingers into matrix transforms, which seemed to me both conceptually proper and technically straightforward.

But when the Manipulation events were moved to Windows Phone 7, they were stripped down quite significantly, and in the process they lost these matrix transforms. At first glance it seemed as if the transforms could be recreated from the information provided by these events, but it turned out to be rather awkward, and surely not what was intended.

Here's the basic problem: Often you want a two-finger manipulation to perform scaling and/or rotation of a visual object. Mathematically (and programmatically), scaling and rotation are always relative to a center point. You need that center point to define transforms to perform the scaling and rotation, but that point is not directly available in the Windows Phone Manipulation events, and it's still not available in the Windows 8 Manipulation events. When I was working with Windows Phone, I eventually wrote a blog entry describing how I derived that center point, but in the process these Manipulation events acquired an aura of inhospitability for me. The Manipulation events in Windows Phone had flipped their own bozo bit, and I found it hard to treat them with any respect.

I am not the only Windows Phone programmer who instead glommed onto the Gesture interface to multi-touch, which isn't even part of the Windows Phone API. This interface became available in the Windows Phone Toolkit downloadable from CodePlex, and internally it was implemented using the Windows Phone XNA touch interface. I found these Gesture events to be much easier to use than the Manipulation events and I wrote about them with much enthusiasm for MSDN Magazine.

But I am by nature an optimist, and perhaps something will change before the beta release of Windows 8, or perhaps I'll come up with a good fix that I can easily plug into any Windows 8 program. Meanwhile, let's take this opportunity to explore the Windows 8 Manipulation events in the context of the other touch events. Despite this flaw, these events are important for working with touch input, and touch is an increasingly important part of modern computing.

The Windows 8 Touch Events

A Windows 8 Metro-style application written in C# uses a programming interface called the Windows Runtime (WinRT). This API does not define mouse or stylus events. Instead, mouse and stylus input has been consolidated with touch input into composite events. You can work with mouse and touch input without knowing the origin of the input, or you can determine where the input came from if you want.

All the consolidated input events are defined by the UIElement class, and they can be roughly divided into three sets of interfaces:

The Pointer interface consists of low-level events named PointerPressed, PointerMoved, PointerReleased, PointerEntered, PointerExited, and others. These are very similar to mouse events, except that with touch input they allow tracking multiple fingers. Each finger touching the screen has a unique integer ID.

Higher-level events exist named Tapped, DoubleTapped, RightTapped, and Holding that are fairly self-explanatory. These are intended to relieve the application programmer of implementing timing and movement criteria to determine if a gesture qualifies as a tap or hold.

The high-level Manipulation events are intended to relieve the application programmer from the task of tracking multiple fingers. The events instead consolidate the movement of one or more fingers into translation, scaling, and rotation information. If you need to implement two-finger pinch or zoom operations, the Manipulation events help out a great deal.

The Control class defines virtual On methods (such as OnPointerPressed) for all these events.

To get a grasp on the various touch interfaces in Windows and how they interact, I wrote a little program that attempts to display some (but not all) of the information available from these events. You can download the ManipulationTracker project and compile and run it on your own Windows 8 machine.

The opening screen looks like this (at 1/3 size):

At the left are check boxes corresponding to members of the ManipulationModes enumeration. These determine the types of Manipulation events your element will receive. The ManipulationMode property of UIElement defaults to ManipulationModes.None, which means you'll get no manipulation information. At the bottom is a table that displays information from the ManipulationDelta event, which I'll discuss shortly.

The Pointer Events

The ManipulationTracker program overrides the OnPointerPressed, OnPointerMoved, and OnPointerReleased methods. It ignores the PointerEntered, PointerExited, PointerCanceled, PointerCaptureLost, and PointerWheelChanged events.

The PointerEventArgs that accompanies these events has a GetCurrentPoint method that returns a Point relative to a particular element, and a GetIntermediatePoints method I haven't played with yet. There's also a Pointer property that supplies PointerDeviceType information (Touch, Pen, or Mouse) and a PointerId of type uint. This latter property lets a program track multiple fingers, or fingers in combination with mouse and stylus input. There is a unique ID for each pointer currently touching the screen, be it human or rodent.

Almost always when using the Pointer events a program will define a Dictionary with a key of type uint corresponding to this ID. This Dictionary maintains information about each individual finger currently touching the screen. The Dictionary entry is added during the PointerPressed event, modified during the PointerMoved event, and removed during the PointerReleased event.
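The bookkeeping described above can be sketched language-neutrally. This Python is not WinRT code: the PointerTracker class and its on_pointer_* methods are hypothetical stand-ins for the C# overrides, and a plain integer stands in for the uint PointerId.

```python
from dataclasses import dataclass

@dataclass
class FingerInfo:
    pressed_at: tuple  # position at PointerPressed
    current: tuple     # latest position from PointerMoved

class PointerTracker:
    """Hypothetical tracker: one dictionary entry per active pointer ID."""

    def __init__(self):
        self.fingers = {}  # pointer ID -> FingerInfo

    def on_pointer_pressed(self, pointer_id, x, y):
        # Add an entry when a finger touches down (or a mouse button is pressed).
        self.fingers[pointer_id] = FingerInfo((x, y), (x, y))

    def on_pointer_moved(self, pointer_id, x, y):
        # A mouse moves even with no button down, so ignore unknown IDs.
        info = self.fingers.get(pointer_id)
        if info is not None:
            info.current = (x, y)

    def on_pointer_released(self, pointer_id):
        # Remove the entry; the default tolerates an unmatched release.
        self.fingers.pop(pointer_id, None)
```

For example, pressing pointers 1 and 2, moving pointer 1, and releasing pointer 2 leaves a single entry for pointer 1 that still remembers where it first touched down.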

The ManipulationTracker program displays a blue circle during OnPointerPressed at the point where the finger first touches the screen. During OnPointerMoved the program displays a red circle at the current finger position. In OnPointerReleased the two circles are removed. (Windows 8 itself also displays some visual objects to indicate screen input.) The program captures the pointer by calling CapturePointer during OnPointerPressed so it can continue tracking the pointer outside the element, and it makes a call to ReleasePointerCapture during OnPointerReleased.

The circles are semi-transparent so you can still see them when they overlap:

Note the gray halos around the red circles: Those are provided by Windows 8.

In one sense, your program doesn't need to go beyond the Pointer events. They provide all the information your program needs about mouse, stylus, and touch input, and you can use them to derive taps, holds, and pinches on your own. But it's certainly more convenient to use the other events defined by UIElement.

The Tap/Hold Events

By default, the IsTapEnabled, IsDoubleTapEnabled, IsRightTapEnabled, and IsHoldingEnabled properties defined by UIElement are all set to true. These correspond to the Tapped, DoubleTapped, RightTapped, and Holding events. (No, the RightTapped event isn't generated by a finger on your right hand: It's mostly used to register right-button clicks on the mouse, but you'll see shortly how it can be generated by touch.)

The TappedEventArgs, DoubleTappedEventArgs, RightTappedEventArgs, and HoldingEventArgs all have GetPosition methods and PointerDeviceType properties.

On receipt of a Tapped, DoubleTapped, or RightTapped event, the ManipulationTracker program flashes the name of the event at the right side of the screen with a line pointing to the location where the event occurred. (It's an animation that fades away after half a second.) As you might expect, a DoubleTapped event is always preceded by a Tapped event:

Notice the "Tapped!" text already fading away.

With the Tapped, DoubleTapped, and RightTapped events, the finger is always lifted from the screen (or the mouse button released) when the event occurs. The Holding event is a little different from the others, because this event occurs when the user is still touching the screen. The finger must remain in contact with the screen without moving (or not moving much) for a particular period of time.

(These tap and hold events and the Manipulation events are generated by the GestureRecognizer class and related classes defined in the Windows.UI.Input namespace. I suspect that somewhere in one of these classes, time and movement thresholds are defined to generate these events, but I have not been able to find public definitions or determine whether they can be changed on a per-application basis.)

The HoldingEventArgs class defines an additional property named HoldingState that's an enumeration with three members:

Started
Completed
Cancelled

(Although most of the classes and enumerations I'm discussing here are defined in the Windows 8 Windows.UI.Xaml.Input namespace, HoldingState is defined in the Windows.UI.Input namespace. That namespace duplicates some classes defined in Windows.UI.Xaml.Input, so referencing classes from both namespaces is rather awkward.)

The ManipulationTracker program displays the word "Holding" for Started, fades it out for Completed, but fades it out in red for Cancelled. You can experiment with the program to see how these work.

Touch your finger to the screen and hold it there. After a certain period of time, Windows displays a gray rectangle and a Holding event is generated with HoldingState.Started. Now lift your finger from the screen. That's another Holding event with HoldingState.Completed. In addition, a RightTapped event is generated. That's how you generate right-taps with your fingers.

Now try it again. Touch the screen and hold. A Holding event with HoldingState.Started is generated. Now with your finger still touching the screen, move it a bit. By my reading of the documentation, I think we should see a Holding event with HoldingState.Cancelled, but instead I get HoldingState.Completed. Regardless, no RightTapped event occurs in this case. If the ManipulationMode property is set to enable Manipulation events, these events will then commence.
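The tap and hold behavior just described can be modeled as a small replay function. This is a Python sketch, not the GestureRecognizer implementation: HOLD_SECONDS and MOVE_TOLERANCE are invented values (the real thresholds aren't publicly defined), the hold is detected at the next input sample rather than on a timer, and moving during a hold reports Completed because that's what the program actually shows.

```python
import math

# Hypothetical thresholds: Windows does not publish its real values.
HOLD_SECONDS = 1.0
MOVE_TOLERANCE = 10.0  # pixels

def replay_hold(samples):
    """Return the tap/hold events for one finger's input samples.

    samples: list of (time, x, y, kind) tuples, kind being "down",
    "move", or "up"; the first sample must be "down".
    """
    events = []
    start_t, start_x, start_y, _ = samples[0]
    holding = False  # between Holding(Started) and Holding(Completed)
    moved = False    # finger has strayed beyond MOVE_TOLERANCE
    for t, x, y, kind in samples[1:]:
        # Simplification: the hold is detected at the next sample, not on a timer.
        if not holding and not moved and t - start_t >= HOLD_SECONDS:
            holding = True
            events.append("Holding(Started)")
        if kind == "move" and math.hypot(x - start_x, y - start_y) > MOVE_TOLERANCE:
            if holding:
                # What the program shows: Completed rather than Cancelled.
                events.append("Holding(Completed)")
                holding = False
            moved = True
        elif kind == "up":
            if holding:
                # Lifting the finger after a hold also produces a RightTapped.
                events += ["Holding(Completed)", "RightTapped"]
            elif not moved:
                events.append("Tapped")
    return events
```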

Holding gestures are generally used in a couple of different ways. Sometimes the hold evokes a popup context menu when the finger is released. In this case, you can probably ignore the Holding event entirely and just handle the RightTapped event. Alternatively, you might want the hold to put your program into a special mode, but in those cases you'd probably do better to focus on the Manipulation events.

The Manipulation Events

The Manipulation events are intended to consolidate the activity of one or more fingers into the common graphical operations of translation, isotropic scaling, and rotation. One-finger rotation around a pivot point is supported. Inertia is supported. "Rails" are supported where an initial movement indicates whether the user is moving a finger horizontally or vertically, and then future input is restricted to that axis.

How to actually use the Manipulation events and their myriad features are topics that I'll cover in future blog entries and/or articles. For now, I just want to point out the major forms of information you can obtain, and what's missing from the mix.

To be sure, the Windows 8 Manipulation events have some enhancements over the Windows Phone versions that bring them closer to the WPF versions. Windows Phone has three events: ManipulationStarted, ManipulationDelta, and ManipulationCompleted. Windows 8 adds two more: ManipulationStarting and ManipulationInertiaStarting, both of which also exist in WPF. Windows 8 does not have the WPF ManipulationBoundaryFeedback event, which is generally used for bouncing effects when an inertia-driven object hits the side of a container.

When only the mouse or one finger is involved, it's crucial to understand that a manipulation operation begins only if the mouse button has been pressed or the finger has been on the screen for an appreciable period of time (so it doesn't count as a tap) and has moved (so it doesn't count as a hold). Thus, the manipulation operation has a built-in delay from the time the user first presses the mouse button or touches the screen.

When the user touches the screen with two fingers, however, it's obviously not a tap and it's obviously not a hold, so the manipulation operation can begin right away without further delay.
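The start rule in the last two paragraphs can be expressed as a predicate. Again, this is a hypothetical Python sketch with made-up thresholds (TAP_SECONDS and MOVE_TOLERANCE), not the logic Windows actually uses.

```python
# Hypothetical thresholds; the real values are internal to Windows.
TAP_SECONDS = 0.3
MOVE_TOLERANCE = 10.0  # pixels

def manipulation_can_start(finger_count, seconds_down, distance_moved):
    """Return True if, under the rules described above, a manipulation
    operation should begin. finger_count is assumed to be at least 1."""
    if finger_count >= 2:
        # Two fingers can't be a tap or a hold, so start right away.
        return True
    # One finger (or the mouse): down long enough not to be a tap,
    # and moved far enough not to be a hold.
    return seconds_down >= TAP_SECONDS and distance_moved > MOVE_TOLERANCE
```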

But you won't get any of these events unless the ManipulationMode property of the element is set to something other than ManipulationModes.None.

If that is the case, then a ManipulationStarting (not "Started") event is generated after every PointerPressed event. The purpose of ManipulationStarting is mostly to allow the application to set two properties defined by ManipulationStartingEventArgs: Pivot (which configures a single-finger rotation) and ManipulationContainer, which is the element used as the reference for subsequent finger-location information. In my experience, setting ManipulationContainer currently raises an exception, so it can't be set. Consequently, all finger positions are relative to the application's window.

The event that really signals the commencement of a manipulation operation is ManipulationStarted (not "Starting"). When ManipulationStarted occurs, at least one finger is on the screen (or the mouse button is pressed). If it's only one finger, then it's been there for a little while and has moved a bit. During the manipulation operation, the element receives ManipulationDelta events. During the manipulation, other fingers can be pressed to the screen, and then released.

When all fingers have been removed from the screen (or the mouse button is released), one of two things can happen: If no inertia has been requested, a ManipulationCompleted event signals the end of the manipulation operation. Otherwise, the program gets more ManipulationDelta events, now preceded by ManipulationInertiaStarting events, and a ManipulationCompleted event then indicates when the inertia has "run out."

Here's the sequence:

ManipulationStarted event
A series of ManipulationDelta events
If requested, a series of ManipulationInertiaStarting and ManipulationDelta events in pairs
ManipulationCompleted event
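The sequence above can be sketched as a Python generator. The constant-deceleration model, the time step, and the single-axis speed are assumptions for illustration only; pairing ManipulationInertiaStarting with each inertial ManipulationDelta simply mirrors the sequence as listed.

```python
def manipulation_events(drag_deltas, release_speed, deceleration, dt=0.1):
    """Yield (event name, translation delta) pairs in the order listed above.

    drag_deltas: one-axis translation deltas while fingers are down.
    release_speed: speed in pixels/second at release; 0 means no inertia.
    deceleration: hypothetical constant deceleration in pixels/second**2.
    """
    if release_speed > 0 and deceleration <= 0:
        raise ValueError("deceleration must be positive when inertia is used")
    yield ("ManipulationStarted", 0.0)
    for d in drag_deltas:
        yield ("ManipulationDelta", d)
    speed = release_speed
    while speed > 0:
        # Inertia phase: pairs of events until the speed runs out.
        yield ("ManipulationInertiaStarting", 0.0)
        yield ("ManipulationDelta", speed * dt)
        speed = max(0.0, speed - deceleration * dt)
    yield ("ManipulationCompleted", 0.0)
```

With no inertia, the generator produces only Started, the drag deltas, and Completed; with a release speed it inserts the inertial pairs before Completed.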

The ManipulationStarted, ManipulationDelta, and ManipulationCompleted events all have event arguments that include a get-only property named ManipulationOrigin of type Point. This point should be relative to the ManipulationContainer property set in the ManipulationStartingEventArgs, but since that property cannot be set in the current Windows 8 pre-release, the points are relative to the application's window.

Regardless of how many fingers are touching the screen, there is only one current ManipulationOrigin. If one finger is touching the screen, it's the location of that finger. If multiple fingers are touching the screen, it's the average of those fingers' positions.
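As described, the origin is just the average of the current finger positions. A minimal Python sketch (the function name is hypothetical):

```python
def manipulation_origin(points):
    """Average of the current positions of all touching fingers."""
    n = len(points)
    if n == 0:
        raise ValueError("no fingers on the screen")
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
```

With one finger at (10, 20) the origin is that point; with fingers at (0, 0) and (100, 50) it's (50, 25).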

Depending on which ManipulationModes members are set, how many fingers are touching the screen, and what these fingers are doing, the ManipulationDelta event delivers composite translation, scaling, and rotation information. ManipulationDeltaEventArgs defines properties named DeltaManipulation and CumulativeManipulation, both of which are ManipulationDelta objects (yes, ManipulationDelta is a class as well as an event) with three properties:

Translation of type Point (but which is actually a vector)
Scale of type double (where 1 means no scaling)
Rotation of type double

(ManipulationCompletedEventArgs has a TotalManipulation property also of type ManipulationDelta.)
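The text doesn't spell out how DeltaManipulation relates to CumulativeManipulation. Assuming the cumulative values are simply the deltas composed together (translations added, scale factors multiplied, rotation angles added), the relationship can be sketched in Python; Manip and accumulate are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Manip:
    """Python stand-in for the three ManipulationDelta properties."""
    translation: tuple = (0.0, 0.0)  # a vector, though WinRT types it as Point
    scale: float = 1.0               # 1 means no scaling
    rotation: float = 0.0            # rotation angle

def accumulate(total, delta):
    """Compose one delta into a running total (assumed relationship)."""
    tx, ty = total.translation
    dx, dy = delta.translation
    return Manip((tx + dx, ty + dy),
                 total.scale * delta.scale,
                 total.rotation + delta.rotation)
```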

The ManipulationTracker program displays these properties in the table at the bottom of the screen. In addition, ManipulationTracker displays a static yellow square at the ManipulationOrigin point from the ManipulationStarted event, and a moving cyan square at the ManipulationOrigin point in the ManipulationDelta events. This moving cyan square is scaled and rotated based on the Scale and Rotation properties of the CumulativeManipulation, and a line is drawn from the center of the square back to an origin based on the Translation property.

Touch one finger to the display and move it.

Notice that the PointerMoved position and the ManipulationDelta position (at the lower-right) coincide, but the PointerPressed and the ManipulationStarted positions (at the upper-left) do not. That's because the finger had to move from its original position (indicated by PointerPressed) for the manipulation to begin. The Translation vector is obviously calculated from the original position at PointerPressed and not the position of the finger at ManipulationStarted.

Now put two fingers down on the screen simultaneously (or nearly so), and move them:

As far as PointerPressed events go, two fingers cannot hit the screen at the same time. One has to be first, and the other has to be second. But that second finger indicates that we're no longer dealing with a possible tap or hold. The manipulation can begin right away. In the ManipulationStarted event, the ManipulationOrigin point is the average of those two finger positions, and that's now the origin of the Translation vector.

Now put one finger down, start moving it, and then drop another finger to the screen and continue moving both:

Here the initial ManipulationOrigin is slightly offset from the first finger, as was the case with the single finger. The ManipulationOrigin during the later ManipulationDelta events is the average of the two fingers, so the Translation vector seems to be unanchored, but that's OK because it's just a vector and has no real position in space.

The Problem

I've written this program to scale and rotate that moving cyan square to illustrate the cumulative Scale and Rotation factors, but the only reason this seems to work reasonably well is because that square is created in XAML like this:

The program transfers the Translation, Scale, and Rotation factors from the CumulativeManipulation property to that CompositeTransform. The CenterX and CenterY properties of that CompositeTransform are left at their default values of 0, which means that scaling and rotation are relative to the rectangle's origin. But notice how the rectangle is defined: The origin of that rectangle (the point (0, 0)) is the center of the rectangle, so the rectangle is scaled and rotated relative to its own center.
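To see why the zero-valued CenterX and CenterY work here, consider a Python sketch of scaling, then rotation, about a center point, followed by translation. This ordering is an assumption modeled on CompositeTransform, not its actual implementation. The center of scaling and rotation moves only by the translation, so a shape centered at (0, 0) keeps its center fixed, while a shape drawn with its upper-left corner at (0, 0) drifts.

```python
import math

def transform(point, scale, rotation_deg, translation, center=(0.0, 0.0)):
    """Scale, then rotate (degrees), about `center`, then translate."""
    x, y = point[0] - center[0], point[1] - center[1]
    x, y = x * scale, y * scale
    a = math.radians(rotation_deg)
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    return (x + center[0] + translation[0], y + center[1] + translation[1])
```

For example, with a 48-unit square drawn from (0, 0) to (48, 48), doubling the scale around the default center moves the square's center from (24, 24) to (48, 48); passing center=(24, 24) keeps it in place.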

In the general case, that's not what you want. If you are using these Manipulation events to implement a standard pinch-zoom interface for scaling photographs, you want a scaling and rotation center that's based on the user's actual finger movements, and that is not directly available in these Manipulation events. These events have the same flaw as the Manipulation events implemented in Windows Phone, and you'll need a fix similar to the one I described in my blog entry "Manipulation Events Update for the WP7 Beta".
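One plausible way to get a finger-based center is to treat each ManipulationDelta as scaling and rotating about the ManipulationOrigin of the preceding event, then translating by the delta Translation, and to accumulate the result in a matrix. This Python sketch assumes all origins are reported in one fixed coordinate system (the window, as noted earlier); it is not necessarily the technique in the WP7 blog entry and hasn't been checked against the Windows 8 pre-release. All names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 affine matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def delta_matrix(prev_origin, delta_translation, scale, rotation_deg):
    """Hypothetical per-event update: p' = o + T + s*R*(p - o),
    where o is the previous ManipulationOrigin and T the delta Translation."""
    ox, oy = prev_origin
    tx, ty = delta_translation
    a = math.radians(rotation_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    # Column-vector convention: [x', y', 1] = M times [x, y, 1]
    return [[c, -s, ox + tx - (c * ox - s * oy)],
            [s, c, oy + ty - (s * ox + c * oy)],
            [0.0, 0.0, 1.0]]

def transform_point(m, p):
    """Transform point p by the affine matrix m."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

The point under the fingers' previous average position lands on the new average position, which is what a pinch-zoom should feel like. Each new delta matrix multiplies the accumulated one on the left, for example total = matmul(delta_matrix(...), total).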

This and other topics I'll discuss in future installments.