Why DVCS should change your workflow

June 12th, 2013

I’ve used Mercurial for a while, simply for the ease of building repositories and setting up servers. It’s a wonderful thing to no longer worry about extracting sources and creating patch files. Other than building temporary working repositories, though, I’ve used DVCS almost exactly like centralized solutions. Indeed, one of the major gripes I hear when people adopt Mercurial or Git is that “it works just like svn, only there are extra steps”. It’s the extra steps that add value.

We’ve all been there: someone makes a commit that lands on the build server with a dull thud, resetting the ‘days-since-last-broken’ timer. Taking a look at the commit, everyone immediately asks, “Did you even bother to compile it?” Then commences whatever form of hazing the team uses to punish such poor behavior.

Obviously broken commits come down to a handful of faults: sloppy “small” changes, missing files, and bad merges. There’s not much we can do about sloppy work, but the last two hit even the best software developers hard.

With Subversion and friends, a good programmer follows a standard workflow of:

CODE -> UPDATE -> COMMIT

Generally, the “update” step is where things go badly. Either the programmer doesn’t bother to update and test, checking only that the code compiles, or the update leads down a rabbit hole of work, especially when multiple people are working on the same code.

Many teams carry the same workflow over when moving to Mercurial or Git, on the grounds that they don’t want to see large numbers of ‘branches’ in the final history. I understand the validity of that point of view, especially for code review and auditing history. Even then, however, the DVCS workflow SHOULD change.

Let’s look at the DVCS ‘tweak’ of the old workflow, for those who care about minimalist history:

CODE -> COMMIT -> PULL -> REBASE -> PUSH

Both Mercurial and Git offer rebase options. I believe Mercurial outshines Git here, though: the mq extension provides an amazing amount of flexibility to walk up and down commit chains, with built-in ‘workflows’ to make sure you don’t get out of sync with the external repository. Importantly, note that we are doing our work, saving a snapshot, and only then pulling and updating.

For those that don’t care about the minimal history:

CODE -> COMMIT -> PULL -> MERGE -> PUSH

Here, the original ‘working’ state that the developer tested and worked against is frozen in history forever. This is the core advantage of using DVCS: we have a reproducible state of the repository in which the developer performed their work.

Now, this depends on us loosening the reins a little on our centralized workflows. The critical element is realizing that in either Git or Mercurial, you are working with full repository snapshots, as opposed to changes.

With the MERGE model, we show the world the state of our local machine, as exactly as possible, at the time we ran through and tested our code. Further, we can see the interaction between our changes and someone else’s, even when theirs didn’t produce any merge conflicts.

It’s very subtle, but the difference between ‘snapshot’ management and ‘change’ management is huge. Leveraging that difference will keep the “oops, bad merge” and “oops, I missed a file” build breakages to a minimum.

New Server…

May 28th, 2013

Should be moved over now, but we’ll see how this goes. :D

Here, There, std::shared_ptr Everywhere!

February 8th, 2013

In general, code bases using Boost or C++11 seem to gravitate either toward shared_ptr “everywhere” or toward avoiding shared pointers altogether. When developers come from the Java/C# world, shared_ptr often stands in for garbage collection. Other developers bash the performance hit of shared_ptr and rail against the use-everywhere crowd. Neither extreme is a good thing. Let’s tackle the shared_ptr issues one at a time.

Performance

Many old-school C/C++ developers see a shared_ptr and immediately think: extra clock cycles! Compared to a stack-based variable, this is true; dynamic memory allocation can easily be an order of magnitude slower than simply reserving space on the stack. However, if the object’s lifespan already justifies placing it in dynamic memory, things get more interesting.

shared_ptr costs you in two ways: the pointer operations themselves (construction, copying, moving, destruction) and the overhead of dereferencing.

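When dereferencing, a shared_ptr costs the same indirection any pointer does compared to a stack object. Mainstream implementations store the raw pointer directly inside the shared_ptr, so with optimizations enabled there is no extra hop through the reference-count block. With optimizations disabled (debug builds), an extra function call may be added as well. However, there are no memory or locking barriers to be concerned with.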

shared_ptr mainly penalizes performance during copying, assignment, and destruction. Copy construction, copy assignment, and destruction generally result in an interlocked (atomic) increment or decrement of the reference count; moves simply transfer ownership without touching the count. Atomic operations are substantially less expensive than heavyweight locks, but they are not free: each is an atomic read-modify-write and, depending on the platform, a memory barrier. Still, with careful stewardship, these operations can be minimized and limited to areas with significant preexisting costs. (The overhead of a shared_ptr destructor is substantially less than the delete operation it might invoke.)
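A quick way to see where the reference-count traffic actually happens is to watch use_count(). The sketch below is a single-threaded illustration with a hypothetical Widget type and made-up helper functions: passing a shared_ptr by value copies it (one increment in, one decrement out), passing it by const reference does not, and a move hands over ownership without touching the count.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type, just something to point at.
struct Widget {
    int value = 0;
};

// By value: the argument is copied into the parameter, which costs an atomic
// increment on the way in and an atomic decrement on the way out.
long count_inside_by_value(std::shared_ptr<Widget> w) {
    return w.use_count();
}

// By const reference: no copy, so the reference count is never touched.
// Appropriate when the callee only uses the object and does not keep it.
long count_inside_by_ref(const std::shared_ptr<Widget>& w) {
    return w.use_count();
}

// Copying bumps the count; moving transfers ownership and leaves it alone.
// Returns {count after the copy, count after the move}.
std::pair<long, long> copy_then_move() {
    auto original = std::make_shared<Widget>();
    auto copy = original;            // atomic increment: count becomes 2
    long after_copy = original.use_count();
    auto moved = std::move(copy);    // no increment: copy is now empty
    long after_move = original.use_count();
    return {after_copy, after_move};
}
```

Passing by const reference (or passing a plain Widget& when the callee doesn’t care about ownership at all) is one of the simplest forms of careful stewardship. Treat use_count() as a debugging aid only; it is not meant for synchronization in multithreaded code.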

TL;DR: std::shared_ptr costs little on dereference (at most a small hit in debug builds), but is more expensive to copy and destroy than a raw pointer; moves are comparatively cheap.

Ownership

The most common mistake in shared_ptr use is not realizing that pointer ownership remains important. For example, let’s say we have “car” and “garage” objects. The “car” object has a concept of “location”, and the “garage” object has a concept of “contents”. If each holds a shared_ptr to the other, the two objects keep each other’s reference counts above zero, so neither is ever destroyed: a direct memory leak.

We can resolve the circular link by defining a clear ownership rule. In this case, perhaps the location owns the car: the garage keeps a shared_ptr to the car in its contents, while the car’s reference to its location becomes a std::weak_ptr, breaking the ownership cycle.
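Here is a minimal sketch of the car/garage cycle and the weak_ptr fix. The Car and Garage types, their members, and the live_objects counter are hypothetical, made up for illustration; for brevity, one Car type carries both an owning and a non-owning back-pointer so the two cases can be compared side by side.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counts live Car and Garage objects so a leak is observable (illustration only).
static int live_objects = 0;

struct Garage;

struct Car {
    // Illustration only: a real Car would have exactly one of these members.
    std::shared_ptr<Garage> strong_location; // owning back-pointer: forms a cycle
    std::weak_ptr<Garage> weak_location;     // non-owning back-pointer: no cycle
    Car() { ++live_objects; }
    ~Car() { --live_objects; }
};

struct Garage {
    std::vector<std::shared_ptr<Car>> contents; // the garage owns its cars
    Garage() { ++live_objects; }
    ~Garage() { --live_objects; }
};

// Owning pointers in both directions: when the locals go out of scope, each
// object is still held by the other, so neither destructor runs.
std::weak_ptr<Car> park_with_shared_back_pointer() {
    auto garage = std::make_shared<Garage>();
    auto car = std::make_shared<Car>();
    garage->contents.push_back(car);
    car->strong_location = garage;
    return car; // a weak observer, so the caller can check whether it survived
}

// Weak back-pointer: only the garage owns anything, so releasing the garage
// releases the car as well.
std::weak_ptr<Car> park_with_weak_back_pointer() {
    auto garage = std::make_shared<Garage>();
    auto car = std::make_shared<Car>();
    garage->contents.push_back(car);
    car->weak_location = garage;              // does not bump the garage's count
    if (auto g = car->weak_location.lock()) { // temporary owner while in use
        assert(g->contents.size() == 1);
    }
    return car;
}
```

In the owning version both objects survive after every outside reference is gone, and the only way to reclaim them is to break the cycle by hand (for example, resetting strong_location), which is exactly the bookkeeping a clear ownership rule avoids. Code that needs the garage from a car calls weak_location.lock() and checks the result, since the garage may already be gone.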

However, now that we have a clear ownership chain, we can ask another pertinent question: does a shared_ptr make sense here at all? If the garage owns the car, a naked pointer (or reference) from the car back to its garage might be a better solution.
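If the garage is the sole owner, the same relationship can be written without shared_ptr at all. This sketch assumes the garage holds its cars through std::unique_ptr and each car keeps a naked, non-owning pointer back to its garage; the park() helper and the member names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Garage; // forward declaration for the back-pointer

struct Car {
    std::string plate;
    Garage* location = nullptr; // non-owning: valid only while the garage lives
};

struct Garage {
    Garage() = default;
    // Cars point back at `this`, so copying or moving a Garage would leave
    // those pointers stale. Deleting the copy operations also suppresses moves.
    Garage(const Garage&) = delete;
    Garage& operator=(const Garage&) = delete;

    std::vector<std::unique_ptr<Car>> contents; // sole owner of each car

    // Creates a car inside this garage and returns a non-owning reference.
    // The reference stays valid if the vector reallocates, because each Car
    // lives in its own heap allocation.
    Car& park(std::string plate) {
        auto car = std::make_unique<Car>();
        car->plate = std::move(plate);
        car->location = this;
        contents.push_back(std::move(car));
        return *contents.back();
    }
};
```

The trade-off is that the naked pointer carries no safety net: it is only valid while the garage is alive, which is exactly what the ownership rule promises. In exchange, there is no reference counting anywhere.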

Exception Safety

Wrapping objects in a shared_ptr does help improve exception safety. However, other wrappers (std::unique_ptr, for one) accomplish the same goal, not to mention the safest route of all: simply using the stack for variable storage.
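A minimal sketch of that point, using a hypothetical Buffer type that counts its live instances: when an exception escapes, std::unique_ptr’s destructor frees the heap object during stack unwinding, and a plain stack object gets the same guarantee with no allocation. Neither needs shared_ptr’s reference count.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live Buffer objects so a leak would be observable (illustration only).
static int live_buffers = 0;

struct Buffer {
    Buffer() { ++live_buffers; }
    ~Buffer() { --live_buffers; }
};

// With a naked `new Buffer`, the throw below would skip the matching delete
// and leak. std::unique_ptr's destructor runs during stack unwinding instead.
void process_with_unique_ptr(bool fail) {
    auto buf = std::make_unique<Buffer>();
    if (fail) {
        throw std::runtime_error("processing failed");
    }
}

// The stack-based version gives the same guarantee with no allocation at all.
void process_on_stack(bool fail) {
    Buffer buf;
    if (fail) {
        throw std::runtime_error("processing failed");
    }
}
```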

Conclusion

When does a shared_ptr make sense? For me, this has come down to a single, simple question: “Does the object need multiple owners?” A follow-up question: “Does this object have any life cycle needs that make multiple owners sensible?” Especially when dealing with asynchronous IO and callbacks, ownership of an object and life cycle guarantees often resolve to multiple ownership.
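Asynchronous callbacks are the classic case where multiple owners are the honest answer: the caller may be finished with an object before the operation it started completes. This sketch uses a hypothetical EventLoop that queues callbacks and runs them later, standing in for a real async IO library; the hypothetical Connection captures shared_from_this() in its callback so the pending operation co-owns it. It mirrors the shared_from_this idiom common in Boost.Asio code, but none of these names are Asio’s API.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for an async IO event loop: callbacks are queued
// now and run later, possibly after the caller has moved on.
struct EventLoop {
    std::queue<std::function<void()>> pending;
    void post(std::function<void()> cb) { pending.push(std::move(cb)); }
    void run() {
        while (!pending.empty()) {
            auto cb = std::move(pending.front());
            pending.pop();
            cb(); // cb (and anything it captured) is destroyed after this call
        }
    }
};

// Must be created through std::make_shared: shared_from_this() requires the
// object to already be owned by a shared_ptr.
class Connection : public std::enable_shared_from_this<Connection> {
public:
    void start_read(EventLoop& loop) {
        auto self = shared_from_this(); // the callback becomes a co-owner
        loop.post([self] { self->on_read("hello"); });
    }
    const std::string& last_read() const { return last_; }

private:
    void on_read(const std::string& data) { last_ = data; }
    std::string last_;
};
```

The connection lives exactly as long as someone (the caller or a pending callback) still needs it, with no manual bookkeeping. Once the last callback has run and been destroyed, the connection goes away on its own.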

In cases with dynamic and shared ownership, the shared_ptr is difficult to beat: it is well tested, has well-known thread-safety guarantees, and is nicely optimized.

Back at TextUx.com

February 6th, 2013

Sadly, I let this domain lapse many years ago, only to see it taken by a squatter wanting several thousand dollars. I’d written it off as lost. After a fairly concentrated attempt to purge from the internet, the domain is now back. So, welcome back to “textux”. Kick your feet up and relax.

And F*#*# squatters.

Reposting Tutorials

April 1st, 2012

Many years ago, I wrote a series of tutorials for SDL and SCons. Sadly, I let my previous domain lapse due to a serious lack of cash while finishing up college. Adding insult to injury, some vulture picked up the domain to resell it to me for a couple thousand dollars. I’ve noticed that several people have mirrored the tutorials since then. In an effort to legitimize their posting, I’ll be attempting to pull them into WordPress here. It’s a bit tedious, as they were originally written in hand-edited HTML. Hopefully I’ll have the motivation to go through and fix them up after I get them back online.

For those that did repost them: thanks.  It’s nice to see something you worked on floating around, especially as the reposts I’ve seen kept the credits alive.  Going forward, all the tutorials will be explicitly CC Attribution licensed. :)

Yet Another Personal Blog?

April 1st, 2012

Ugh, I promised I wouldn’t do this to myself.

Still, this site isn’t intended to explore my day-to-day frustrations and politics. No, this is intended to detail my professional and technological developments, hopefully in a way people find interesting enough to follow.

So, who is this person you’ve discovered online? In general, I won’t talk too much about my professional career, but if you’re curious about my CV, you’ll find that here. While I care about privacy, the internet is an excellent means to do research and share ideas without paying thousands of dollars to a large university. (Been there, done that.) Hopefully others will find some use for the “code toys” I work on and publish here.

This blog also links fairly directly to my “real world” identity (though not casually so). As a technology professional, being “googled” is a fact of life. With a rare name, I hope to at least ‘manage’ the content others first see about my professional life, and limit exposure of the things I do hold private.