Thursday, 18 March 2010

Physics on a Separate Thread

After some profiling we found that two main areas use the CPU:

  • Graphics (Ogre + D3D + Drivers)
  • Physics – mostly the calculation of constraints

Each uses roughly 30% of the CPU when the frame-rate is capped at 60fps; with the frame-rate uncapped, the graphics simply gobble up whatever CPU is left.

This is pretty much what we expected - physics is the key part of the game / simulator.
As there are so many dual-core (and above) processors out there (82% of Steam users, and increasing), we really need to support multi-core processors.

I think the accepted approach to multi-threading code is to break everything down into smaller “tasks” and have some kind of task manager pick each one up and execute it on a thread acquired from a thread-pool.
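As a toy illustration of that task-based approach (this sketch and its names are mine, using modern C++'s std::async as a stand-in for a real task manager and thread-pool - it is not the approach we ended up taking):

```cpp
#include <future>
#include <vector>

// Toy version of "break the work into tasks": each task runs on some
// thread (std::async here stands in for a proper thread-pool) and the
// results are gathered afterwards.
int runTasks()
{
    std::vector<std::future<int>> tasks;
    for (int i = 0; i < 4; ++i)
        tasks.push_back(std::async(std::launch::async, [i] { return i * i; }));

    int total = 0;
    for (auto &t : tasks)
        total += t.get();
    return total;  // 0 + 1 + 4 + 9
}
```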

As physics is such a big chunk, we thought a quick win would be to put physics on a separate thread. I will try to describe the method I took to achieve this. As I've said above, this is my first threading adventure, so your mileage may vary.

Overview

This is the general method I used for running physics in parallel.

  • Cache the physics variables for use with the non-physics code
  • Queue the changes to the physics variables when called by non-physics code
  • Use a Future (don't worry this will be explained later) to run physics updates in parallel with the non-physics code
  • Synchronise the physics variables when the physics updates are finished executing for this frame


Breaking it apart!

Our first goal is to logically separate the data to be used on the physics thread from the data to be used on the main (mostly-graphics) thread.

In our game we calculate the forces generated on the car / tyres / etc., and these are resolved using constraints; this eventually leads to positions / orientations / velocities and accelerations for each rigid-body in our game – mostly cars and tyres.

To logically separate the physics we first need to make sure that none of the non-physics code accesses these variables directly. If one thread reads a variable while another is writing to it, or both write to it at once, bad things happen, usually resulting in a crash.
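To make the separation concrete, here is a minimal single-threaded sketch of the idea (the names are mine, not the game's): the rest of the code only ever reads a snapshot, which is refreshed at a point when the physics thread is known to be idle.

```cpp
// Sketch only: `live` would be touched solely by the physics thread,
// `cache` solely by the main thread, and syncCache() called only at
// the per-frame synchronisation point when physics is not running.
struct BodyState { double x; };

struct Body
{
    BodyState live  = {0.0};  // physics-side data
    BodyState cache = {0.0};  // main-thread snapshot

    void stepPhysics() { live.x += 1.0; }    // physics-thread work
    void syncCache()   { cache = live; }     // safe point between iterations
    double getX() const { return cache.x; }  // what graphics / logic sees
};
```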


Getters
The approach I took was to cache all the variables that are needed by the rest of the code (a surprisingly small number in my case). This caching function is called between physics iterations by the main (graphics) thread:

void Vehicle::cachePhysicsData()
{
    _cache.pos = _chassisFront->pRigidBody()->GetPosition();
    _cache.ori = _chassisFront->pRigidBody()->GetOrientation();
    _cache.vel = _chassisFront->pRigidBody()->GetCOGVelocity();

    for (int i = 0; i < 4; ++i)
    {
        _cache.tyres[i].ori = _wheels[i]->pRigidBody()->GetOrientation();
        _cache.tyres[i].pos = _wheels[i]->pRigidBody()->GetPosition();
        ...
    }
}

I have removed most of the variables for brevity, but the same concept applies.
For all the “getters” of the vehicle I return the cached version. For example:

Vector3D & Vehicle::getPosition()
{
    return _cache.pos;
}

This means that the graphics / logic thread can merrily work away while the physics thread calculates the next set of physics variables.
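For reference, the cache can be as simple as a plain struct of value copies. A sketch (the field and type shapes are my guesses from the getters above, not the game's real declarations):

```cpp
// Hypothetical shapes for the cached data -- plain value types, so the
// main thread never follows a pointer back into physics-owned memory.
struct Vector3D   { double x = 0, y = 0, z = 0; };
struct Quaternion { double w = 1, x = 0, y = 0, z = 0; };

struct TyreCache
{
    Vector3D   pos;
    Quaternion ori;
};

struct VehicleCache
{
    Vector3D   pos;
    Quaternion ori;
    Vector3D   vel;
    TyreCache  tyres[4];
};
```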


Setters
So that is all the “getters” working – what about the “setters”? What if we want to change the physics parameters (e.g. change the tyre pressure, or the steering / brake / throttle position)? We cannot do this directly either, as it would mean writing to a variable in the physics area while the physics thread is potentially reading from it.
The approach I took is to queue up these “setter” calls and then process them between physics iterations. Here is an example showing how I set the brake. setBrake is called from the main thread and the setter's data is queued with a type (VPE_SET_BRAKE):

void Vehicle::setBrake( float brakeZeroToOne )
{
    EventFloatMessage *ev = new EventFloatMessage(VPE_SET_BRAKE, brakeZeroToOne);
    _cachedEvents.push_back(ev);
}
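The message types implied here might look something like this (a sketch; the names beyond EventFloatMessage and VPE_SET_BRAKE are my guesses, and the real game has more event kinds and wraps the messages in shared_ptrs):

```cpp
// Hypothetical event classes -- eventId tells the physics side which
// internal setter to invoke, and the payload rides along in `data`.
enum VehiclePhysicsEvent { VPE_SET_BRAKE /* , ... more kinds ... */ };

struct EventMessage
{
    explicit EventMessage(VehiclePhysicsEvent id) : eventId(id) {}
    virtual ~EventMessage() {}
    VehiclePhysicsEvent eventId;
};

struct EventFloatMessage : EventMessage
{
    EventFloatMessage(VehiclePhysicsEvent id, float v)
        : EventMessage(id), data(v) {}
    float data;
};
```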

When we have finished the current physics iteration(s), we call the following function to invoke the “real” setBrakeInt function that actually modifies the brake constraint:
void Vehicle::updateSettingsForNextPhysicsIteration()
{
    vector<EventMessagePtr>::iterator ev = _cachedEvents.begin();
    for (; ev != _cachedEvents.end(); ++ev)
    {
        switch ((*ev)->eventId)
        {
            case VPE_SET_BRAKE:
            {
                EventFloatMessage *fl = (EventFloatMessage*)(*ev);
                setBrakeInt(fl->data);
            }
            break;
        }
    }

    _cachedEvents.clear();
}

(*note* I have edited the code to remove a few complexities (mostly boost::shared_ptrs), so it should be treated as pseudo-code)
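An alternative that avoids the switch (and the casts) entirely is to queue the whole call as a closure. A sketch of that variation – not what the game actually does, and the class name here is mine:

```cpp
#include <functional>
#include <vector>

// Sketch: setters capture their arguments in a closure; the physics
// side simply replays the queued calls between iterations.
class VehicleSketch
{
public:
    void setBrake(float b)  // called from the main thread
    {
        _pending.push_back([this, b] { _brake = b; });
    }

    void updateSettingsForNextPhysicsIteration()  // called at the safe point
    {
        for (auto &call : _pending)
            call();
        _pending.clear();
    }

    float brake() const { return _brake; }

private:
    std::vector<std::function<void()>> _pending;
    float _brake = 0.0f;
};
```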



Putting it in the main loop

With our cachePhysicsData and updateSettingsForNextPhysicsIteration functions we have gathered all our physics data interactions in one ...err... two places. This allows us to do these interactions quickly while our physics thread isn't writing to physics data.

Here is an outline of my main loop

_physThread = PhysicsThreadPtr(new PhysicsThread(_clock, _updateTimeMicroSecs, _physicsFunction));

while (!_done)
{
    Poco::ActiveResult<SiroccoTime> physToken = _physThread->doPhysicsUpdates(_physicsT);

    while (t < (currentTime = _clock->getTimeMicroseconds()))
    {
        runFixedHzTasks();
        t += _updateTimeMicroSecs;
    }

    runFrameRateHandlers();       // Includes rendering code
    physToken.wait();             // if the physics work is still going we wait
    _physicsT = physToken.data(); // physicsT is a time var so we know we have done
                                  // the correct number of physics steps per sec
    _physicsPrepFunction();       // Calls cachePhysicsData and updateSettings....
}



Active Methods are Poco's implementation of Futures. They allow the programmer to call a method and have it automatically run on a thread from a thread-pool (by default). When the programmer wants the result of the call, they ask for it by calling ActiveResult::wait and then ActiveResult::data.


When we call wait it blocks until the thread has finished executing, so it is then safe to call cachePhysicsData and updateSettingsForNextPhysicsIteration.
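The same hand-off can be sketched with std::future from modern C++ (a stand-in for Poco's ActiveMethod / ActiveResult, not the game's code; the function names here are mine):

```cpp
#include <future>

// Fixed-step catch-up, as in doPhysicsUpdatesImpl: advance `t` in
// dt-sized increments until it passes `now`.
long long doPhysicsSteps(long long t, long long dt, long long now)
{
    while (t < now)
        t += dt;
    return t;
}

// One frame: kick physics off on another thread, do the frame's other
// work, then block for the result (the wait() + data() equivalent).
long long frame(long long physicsT, long long dt, long long now)
{
    std::future<long long> token =
        std::async(std::launch::async, doPhysicsSteps, physicsT, dt, now);
    // ... graphics / logic work would run here, in parallel ...
    return token.get();
}
```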

Here is my Physics thread implementation that uses the Active Method (doPhysicsUpdates)

class PhysicsThread
{
public:
    PhysicsThread(IClockPtr clock, long updateMicroSecs, boost::function< void (void) > callBackFunction)
        : doPhysicsUpdates(this, &PhysicsThread::doPhysicsUpdatesImpl),
          _clock(clock),
          _updatePhysicsFunction(callBackFunction),
          _updateTimeMicroSecs(updateMicroSecs)
    {
    }

    Poco::ActiveMethod<__int64, __int64, PhysicsThread> doPhysicsUpdates;

protected:
    __int64 doPhysicsUpdatesImpl(const __int64 &tIn)
    {
        __int64 t = tIn;
        while (t < (_currentTime = _clock->getTimeMicroseconds()))
        {
            _updatePhysicsFunction();
            t += _updateTimeMicroSecs;
        }
        return t;
    }

private:
    __int64 _currentTime;
    IClockPtr _clock;
    boost::function< void (void) > _updatePhysicsFunction;
    long _updateTimeMicroSecs;
};



In this method we run however many physics updates it takes to catch up with “real time” (hence the while loop).
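For example, with a 1000 µs physics step and a frame that leaves the simulation 16 667 µs (one 60fps frame) behind real time, the loop runs 17 iterations before returning. A tiny sketch of that arithmetic (the function name is mine):

```cpp
// Counts how many fixed-size steps the catch-up loop performs before
// the simulated time `t` passes the wall-clock time `now`.
int stepsToCatchUp(long long t, long long now, long long dt)
{
    int steps = 0;
    while (t < now)
    {
        t += dt;
        ++steps;
    }
    return steps;
}
```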


Summary

To reiterate, this is the method we used:

  • Cache the physics variables for use with the non-physics code
  • Queue the changes to the physics variables when called by non-physics code
  • Use a Future to run physics updates in parallel with the non-physics code
  • Synchronise the physics variables when the physics updates are finished executing for this frame

The results so far are quite good. The overall FPS didn't increase much (perhaps 5-10%) on my quad-core 6600, but when the FPS is around 300 that doesn't matter so much. I think the results will be more interesting on a lower-powered dual-core machine – more testing and tweaking needed! I think there is more to come. The game feels a bit smoother, which makes a nice difference.

As I've stated before, this is my first real expedition into multi-core coding, so if I have been an idiot and there is an obviously better way of approaching the problem, please let me know!

*Update*

The results I obtained above were evidently a placebo: there was a problem with the timer (based on QueryPerformanceCounter), which needed SetThreadAffinityMask to pin it to the first processor. The results are now much more interesting. On my slow PC the FPS (not the best measure – but a measure none-the-less) went from 130 to 180. On my development machine it went from 320 to 370. So IMO a significant improvement and well worth the effort.