Analyze the Data not the Drivel.

SmugMug Duplicate Image Hunting

February 5, 2010 · Leave a Comment

One of the many things the developers of Thumbsplus got right was a proper normalized database schema.  When I first inspected the layout of a Thumbsplus database I knew I was in good hands.  In Thumbsplus image files get unique keys and image galleries are simply lists of image keys.  Images can appear in any number of galleries, without duplication,  just the way the gods of database design intended.

Assigning unique keys and grouping by key lists is so correct that it was a shock to discover that SmugMug, until recently, eschewed this principle.  Prior to a recent upgrade, if you wanted to display an image in more than one gallery you had to … shudder with horror … make copies!  Whenever I made an image copy I felt like I was masturbating in an art museum.

This outrage is now fixed and you can place an image in as many galleries as you want without copying.  Unfortunately there is a residual problem.  How do you hunt down and exterminate all your bogus copies?  In an acronym:  MD5.  SmugMug assigns MD5s to all images.  If two MD5s are the same there is an extremely high likelihood you are dealing with copies.  So all you have to do is find images with identical MD5s and delete the extra copies.  The following J verb uses image tables created from the XML captured by my SmugMug metadata dumper to do just this.

SmugDupsFrMD5=:3 : 0

NB.*SmugDupsFrMD5 v-- duplicate SmugMug images from MD5.
NB.
NB. monad:  btct =. SmugDupsFrMD5 clDirectory
NB.
NB.   SmugDupsFrMD5 'c:\pd\docs\smugmug\data\'

NB. read table files
path=.tslash y
albums=. readtd2 path,SMUGALBUMTABLE
images=. readtd2 path,SMUGIMAGETABLE
images=. }. images [ imhead=. 0 { images

NB. all duplicate MD5's
pos=. imhead i. <'MD5'
md5=. pos {"1 images
dup=. md5 #~ -. ~:md5
images=. (md5 e. dup)#images
images=. (/: pos {"1 images) { images

NB. remove images with matching smugmug pids
NB. these are proper virtual images and not copies
pos=. imhead i. <'PID'
pid=. pos {"1 images
dup=. pid #~ -. ~:pid
if. #images=. (-.pid e. dup)#images do.

  NB. retain selected columns and insert album names
  images=. (imhead i. ;:'FILENAME GID PID MD5 ALBUMURL') {"1 images
  albums=. ((0 {"1 albums) i. 1 {"1 images){ 1 {"1 albums
  images=. albums (<a:;1)} images

  NB. group by MD5
  images=. (~:3 {"1 images) <;.1 images
  images=. >&.>@:(<"1@|:) &> images

  NB. order MD5 groups by galleries in groups
  NB. this results in a good order for editing
  NB. out the duplicates on SmugMug
  images=. (\:&.> 1 {"1 images) {&.> images
  (\: 0 {&> 1 {"1 images){images
else.
  NB. no duplicates
  0 5$''
end.
)

J source is not supported by the WordPress source code plugin so no syntax coloring for now.
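For readers who do not speak J, here is a rough Python sketch of the same idea. It is not a translation of the verb above: it assumes the image table has already been loaded as a list of dictionaries with `FILENAME`, `GID`, `PID` and `MD5` keys (the column names come from the J verb; the function name and sample rows are made up for illustration). Like the verb, it keeps rows whose MD5 repeats and then throws out rows that share a PID, since those are proper virtual images and not copies.

```python
from collections import Counter, defaultdict


def find_md5_duplicates(images):
    """Group likely copies by MD5, skipping rows that share a PID."""
    # keep only rows whose MD5 occurs more than once
    md5_counts = Counter(row["MD5"] for row in images)
    suspects = [row for row in images if md5_counts[row["MD5"]] > 1]

    # rows sharing a PID are one image shown in several galleries, not copies
    pid_counts = Counter(row["PID"] for row in suspects)
    copies = [row for row in suspects if pid_counts[row["PID"]] == 1]

    # group what is left by MD5
    groups = defaultdict(list)
    for row in copies:
        groups[row["MD5"]].append(row)
    return dict(groups)


# hypothetical sample rows, not real SmugMug data
sample = [
    {"FILENAME": "a.jpg", "GID": 10, "PID": 1, "MD5": "x"},
    {"FILENAME": "a_copy.jpg", "GID": 11, "PID": 2, "MD5": "x"},
    {"FILENAME": "b.jpg", "GID": 10, "PID": 3, "MD5": "y"},
    {"FILENAME": "b.jpg", "GID": 12, "PID": 3, "MD5": "y"},
    {"FILENAME": "c.jpg", "GID": 10, "PID": 4, "MD5": "z"},
]

duplicates = find_md5_duplicates(sample)
for md5, rows in duplicates.items():
    print(md5, [row["FILENAME"] for row in rows])
```

In the sample, `a.jpg` and `a_copy.jpg` are reported as copies, while `b.jpg` is left alone because both of its rows carry the same PID.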

Categories: Photography · Programming · Software

Command Line C# SmugMug API Metadata Download

February 3, 2010 · 1 Comment

I have a skeleton in my photographic closet!  I enjoy hacking pictures as much as I enjoy shooting them.  Before digital photography I got my jollies the old-fashioned way with chemicals:  darkroom chemicals.  I still get all emotional when I remember the scent of a fixer.  Ahhh, those were the days.

Now, instead of inhaling fumes in the dark, I hang out on picture sites:  SmugMug is my current favorite.  Over the last year I have uploaded thousands of carefully cataloged images:  you can view them here.  I may not be much of a photographer but when it comes to image metadata my anal analytic side shines.  I can EXIF, IPTC and GEOTAG with the best of them.

Because I tweak metadata online, and I suffer from a retentive character flaw,  it’s only natural that I would seek to download my sacred metadata.  This is what SmugMug’s API is for!  When I started experimenting with the SmugMug API I made the mistake of reading the documentation.  SmugMug documentation is,  at best,  a “work in progress.”  It may help but probably not!  I found trolling the web looking for code examples more productive.  

To help the next SmugMug API geek I am posting a fragment of a simple command line C# metadata dump utility I put together.  The core of the program is shown below and all the C# source is available here.  This program is too trivial to license so help yourself.

using System;
using System.Data;
using System.Xml;

namespace SmugMugMDDumper
{
    class Program
    {
        private const string xmlHeader = @"<?xml version=""1.0"" encoding=""UTF-8""?>";

        // defaults - insert your own SmugMug apikey, password, email here
        // defaults are used if corresponding command line arguments are missing
        private const string apiKey = "<YOUR SMUGMUG APIKEY>";
        private const string passWord = "<YOUR SMUGMUG PASSWORD>";
        private const string emailAddress = "<YOUR SMUGMUG EMAIL>";        
        private const string outFile = @"c:\temp\smugmugdata.xml";        

        static void Main(string[] args)
        {
            try
            {
                DataSet ds = new DataSet();
                XmlDocument doc = new XmlDocument();
                Arguments comline = new Arguments(args);
                SmugmugMetaData smugmd = new SmugmugMetaData();

                // parse and set any command line arguments
                if (comline["help"] != null)
                {
                    string __helpMsg = @"
Typical command line calls:

  SmugMugMDDumper.exe -apikey:""xQDzWwLp2I1GUGli88g999VrQWN4Xz56"" -email:""youremail"" -password:""nimcompoop"" -output:""c:\test\smugdata.xml""
  SmugMugMDDumper.exe -output:""d:\mystuff\smuggy.xml""
  SmugMugMDDumper.exe -password:""newpassword"" -output:""c:\temp\out.xml""
  SmugMugMDDumper.exe -help

";
                    Console.Write(__helpMsg);
                    return;
                }

                string __apiKey;
                if (comline["apikey"] != null) __apiKey = comline["apikey"];
                else __apiKey = apiKey;

                string __emailAddress;
                if (comline["email"] != null) __emailAddress = comline["email"];
                else __emailAddress = emailAddress;

                string __passWord;
                if (comline["password"] != null) __passWord = comline["password"];
                else __passWord = passWord;

                string __outputFile;
                if (comline["output"] != null) __outputFile = comline["output"];
                else __outputFile = outFile;

                // start output file
                smugmd.WriteToFile(xmlHeader + "<SmugMugData>", __outputFile);

                // open SmugMug session - uses https
                string __sessionID = smugmd.StartSMSession(__apiKey, __emailAddress, __passWord);

                // collect all galleries
                ds = smugmd.GetGalleries(__sessionID, __apiKey, __outputFile);
                DataTable myTable = ds.Tables[0];
                DataRow myRow;

                // image metadata for each gallery
                smugmd.AppendToFile("<GalleryImages>", __outputFile);
                int rowcnt = myTable.Rows.Count;
                string rowstr = "/" + rowcnt.ToString() + "]: ";
                for (int i = 0; i < rowcnt; i++)
                {
                    myRow = myTable.Rows[i];
                    Console.WriteLine("gallery [" + (i + 1).ToString() + rowstr + (string)myRow["Title"]);
                    doc = smugmd.GetGalleryImages(__sessionID, __apiKey, (int)myRow["id"], __outputFile);
                }
                smugmd.AppendToFile("</GalleryImages>", __outputFile);

                // complete output file - end SmugMug session
                smugmd.AppendToFile("</SmugMugData>", __outputFile);
                smugmd.EndSMSession(__sessionID, __apiKey);

                Console.WriteLine("[Complete] output file: " + __outputFile);
            }
            catch (Exception ex)
            {
                Console.WriteLine("[Fail] SmugMug Metadata Dumper Failure - error message: " + ex.Message);
            }
        }
    }
}

Categories: Photography · Software

Why Code when you can Steal

January 28, 2010 · 1 Comment

I am learning C#.

Two years ago I swore a blood oath not to learn any more programming languages.  It’s been obvious for decades that you seldom find any new and important ideas in programming languages.  What you typically find are old ideas renamed and wrapped in a new syntax.  Virtually all key concepts in programming are over twenty years old and many are far older.  My disgust with new languages started with a single word:  refactoring!

When I met refactoring it seduced me with its sleek geeky’ness.  What could this wonderful word mean and what thrilling concept did it clothe? Well it basically means cleaning up your abhorrent code so that you can make some freaking sense of it!  All competent programmers, dating back to Ada Lovelace (1815-1852),  have been refactoring all their goddamn coding lives.  Refactoring is geek marketing: the same old shit in a glistening new package.

C# is as free of new concepts as I expected but the language has its strengths.  C# has managed to inherit most of its predecessors’ gifts without introducing untested features.  C#’s designers restrained themselves and it shows.  The language is clean, easy to learn, and integrates elegantly with .Net libraries.

This is all good but what makes it better is that you can steal tons of C# code.  Google and Bing are my accomplices.  When I want to find out what a DataSet does I just pop a query and dredge up nuggets like:  Creating A Data Set From Scratch in C#.   In the old days you had to read  dense language documents like the J Dictionary and think for yourself.

Thinking for yourself is so 20th century;  why code when you can steal!

Categories: Software

Let’s Hang Congress

December 7, 2009 · Leave a Comment

As Mark Twain once rudely noted:

“It could probably be shown by facts and figures that there is no distinctly native American criminal class except Congress.”

Regardless of your political or sexual orientation it’s hard to disagree.  The current Congress is the largest collection of brain-dead fuckwits since the previous Congress and no doubt the next Congress will be even worse.  What’s a bereaved citizen to do?

How about the simple obvious solution: let’s hang everyone in Congress!  It’s simple, direct and effective.  Left wingers will be delighted because a lot of right wingers will hang.  Right wingers will be delighted because even more left wingers will hang.  Independents will be thrilled because left and right wingers will hang. 

Now I know what you are thinking,  “If we hang Congress won’t it be more difficult to find people to run for office?”  I have to agree; hanging doesn’t seem like much of an incentive.  We have to strike a balance between hanging and the grotesque perks of Congress.

Currently there are 535 members of Congress.  Let’s get a big jar of jelly beans and number the beans from 0 to 534.

0, 1, 2, 3, ..., 534

Then every year, instead of listening to boring State of the Union speeches, we hold a public drawing.  Some lucky blindfolded child,  (because we are doing it for the children),  will reach in the jar and grab a bean.  If the bean is numbered 0 we hang everyone in Congress.  If another number comes up it’s back to the trough for another year. 

The typical member of Congress serves about ten years:  thank you Google.  What are the chances the average member will swing?  This is easy to compute. Generate 10 random integers between 0 and 534 and see if 0 pops up.  In the J programming language:

   ? 10 $ 535

195 467 514 498 79 345 306 344 450 530

This lucky Congress critter survived.  If we run this experiment ten million times we get:

  10000000 %~ +/ 0 e."0 1 ? 10000000 10 $ 535

0.0185448

or a 2% chance of hanging.  Considering the prerogatives, pork, perks, and pensions of Congress a measly 2% chance of hanging seems about right.  Most of the time it’s business as usual and every now and then the public enjoys the spectacle of terminal term limits.
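You don’t actually need ten million trials to get this number. A member swings unless bean 0 stays in the jar for all ten draws, so the chance is 1 − (534/535)^10, or about 0.0185. The Python sketch below computes that exact value and checks it against a small simulation. The function names, trial count and seed are my own choices for illustration, not part of the J session above.

```python
import random

SEATS = 535   # beans numbered 0..534
YEARS = 10    # typical length of service, one drawing per year


def exact_hanging_probability(seats=SEATS, years=YEARS):
    """Chance that bean 0 is drawn at least once in `years` drawings."""
    return 1 - (1 - 1 / seats) ** years


def simulated_hanging_probability(trials, seats=SEATS, years=YEARS, seed=0):
    """Monte Carlo estimate of the same chance."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randrange(seats) == 0 for _ in range(years))
        for _ in range(trials)
    )
    return hits / trials


print(exact_hanging_probability())
print(simulated_hanging_probability(200_000))
```

The simulated figure wobbles with the seed and the number of trials; the exact figure does not.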

Categories: Rants

2012 Business Opportunity

November 11, 2009 · Leave a Comment

Are you an unscrupulous deranged crank with the track record of a government budget forecaster?  Do you have no shame and assume everyone around you is an inbred moron?  Can you say one thing one day and the opposite the next and see no contradiction?  Do you consider “evidence” to be shit that you make up whenever you feel like separating fools from their money?  If you answered yes to any of these questions then you might have a bright future as a 2012 entrepreneur.

But you have to act fast.  End of the World deals only come around every few years.  Seize the moment or you may have to wait months before the next opportunity to scam credulous dolts.  My advice: strike while the iron is hot. After 2012 the only people making any money on 2012 will be sour skeptics telling us I told you so and we all know how much that pays.

Since we have no qualms our only problem is marketing. We have to find the right 2012 product.  The field is getting crowded and we must exercise caution. 2012 entrepreneurs welcome competition like drugged out bike gangs welcome competition.  With this in mind we have to rule out:

  1. Movies:  2012 opens this weekend and if the critics can be believed we now have a viable substitute for water boarding and Itsy, bitsy, teenie, weenie, yellow polka dot bikini looping endlessly.
  2. TV specials: The History Channel has locked up this market segment with one mind draining special after another.  If I even see the number 2012 on the History Channel again I will climb up the nearest Mayan pyramid and cut my own heart out! 
  3. Survival gear: What’s a girl to wear when the world ends?  Get your end of the world gear now before the panic, or intelligence, sets in.
  4. T-Shirts: Every disaster needs one.
  5. End of the World Party Catering: Annihilation on an empty stomach is so 2011.
  6. Mayan Resort Holidays: This one actually makes some sense. The prospect of heaping scorn on scores of new and old age nitwits on December 22, 2012 would warm my cold skeptical core.

Let’s face it, breaking into the 2012 market is hard.  It takes an imaginative opportunistic whore to come up with a market-worthy 2012 scam.  I’m out of ideas;  maybe you can do better.

Categories: Rants

Hard Ass Skeptic Rules

October 29, 2009 · Leave a Comment

I am a hard ass skeptic.  A hard ass skeptic is somebody who holds:

  1. Ghosts do not exist
  2. Gods do not exist
  3. UFOs are not alien spaceships
  4. There are no Yetis
  5. There are no psychics
  6. 9/11 was not an inside job
  7. Life after death is nonsense
  8. Reincarnation is more nonsense
  9. Somebody called Christ did not rise from the grave
  10. Mohammed did not ascend to heaven
  11. The world will not end in 2012
  12. Heaven and Hell do not exist
  13. You cannot build a Good and Evil meter

Hard ass skeptics are often accused of being sour, bitter, judgmental know-it-alls by the usual crew of credulous buffoons.  Cry me a river people!  No matter how many idiotic ghost hunting shows you put on the History Channel you are never going to convince a hard ass skeptic that ghosts exist. Just so we can all get along I am going to disclose how you can win an argument with a hard ass skeptic.

The key to marketing ideas to hard ass skeptics is to understand what hard ass skeptics consider valid arguments.  Hard ass skeptics will consider only two types of arguments:

  1. Mathematical proof
  2. Hard science

If you cannot couch your argument in these terms just shut the fuck up.  You are wasting your breath and the hard ass skeptic’s time.  

I know this is harsh but it gets worse.  Mathematicians and scientists make mistakes: lots and lots of mistakes.  How does the hard ass skeptic deal with this inconvenient truth? The typical hard ass skeptic simply doesn’t have enough time, skill or expertise to filter out all the rubbish from the rubies.  So the hard ass skeptic assumes it’s all crap until there is verified overwhelming evidence to the contrary.  

Finally, hard ass skeptics put the burden of proof on a supposition’s supporters. It’s not the hard ass skeptic’s job to prove you are wrong.  It’s your job to prove,  beyond any reasonable mathematical or scientific doubt, you are correct.   So please stop whining about how we haven’t proved that  psychics, Yetis or Heaven do not exist.  It’s not our job and it never will be.

Categories: Rants

Counting Gods

October 26, 2009 · Leave a Comment

How many fingers do you need to count gods?  If you are like most people you simply assume the set of gods is enumerable and then you start fighting over which value in the set of non-negative integers $\mathbb{Z}_{\geq 0} = \{0, 1, 2, 3, \ldots\}$ is correct.  This process is elegantly explained by this flowchart.

I have a simple question. Is the set of gods enumerable?  For all you innumerates out there a set is enumerable if-and-only-if its members can be counted.  It’s a rather astonishing fact that many humdrum sets cannot be counted.  Do you remember line segments $\overline{AB}$ from school geometry?

The number of points in any nonzero length Euclidean line segment $\overline{AB}$ cannot be counted.  There are more points in $\overline{AB}$ than there are counting numbers.  If you don’t believe me check out Cantor’s diagonal argument and savor a classical mathematical treat.

Once you get your head around the idea that some things cannot be counted the notion of counting gods looks ludicrous.  Why do you expect to tally your supreme beings with ordinary counting numbers when they cannot deal with line segments?  Perhaps god counting requires more than positive integers!

Perhaps the bloody disputes between atheists ($\mathit{GodCount} = 0$) and deists ($\mathit{GodCount} \geq 1$) can be resolved with rational numbers $\mathbb{Q}$.  How about fractional $1/2$ gods?  Let’s just split the difference and all get along.  Unfortunately rational numbers will not satisfy ultrapolytheists, believers in more gods than you can count, because you can count rational numbers.  Ultrapolytheists need to get real: the real numbers $\mathbb{R}$, anyway.
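If “you can count rational numbers” sounds like a bluff, here is a small Python sketch that does the counting. It walks the Calkin-Wilf sequence, one well-known way to list every positive rational exactly once; it is just one enumeration among many, and the function name is mine. Each term comes from the previous one by the rule $q \mapsto 1 / (2\lfloor q \rfloor - q + 1)$.

```python
from fractions import Fraction
from itertools import islice
from math import floor


def positive_rationals():
    """Yield the positive rationals in Calkin-Wilf order, without repeats."""
    q = Fraction(1)
    while True:
        yield q
        # next term: 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * floor(q) - q + 1)


first_eight = list(islice(positive_rationals(), 8))
print([str(q) for q in first_eight])
```

Pair each fraction with its position in the list and you have counted them. No such list exists for the points of a line segment, which is what the diagonal argument shows.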

Counting gods is a complex problem.  Electrical engineers cannot deal with the wiring in your fridge without complex numbers $\mathbb{C}$.  Maybe a god count of $3 + 7i$ is the ticket.

If this sounds silly, condescending and sarcastic, congratulations: your bullshit detector is working.  However, ridiculous as this little diatribe is, it is as sound as any mainstream religious doctrine.  So if you find this absurd, and you have a fixed god count in your little head, who is really being silly?

Categories: Rants

F# Sirens are Seducing

October 21, 2009 · Leave a Comment

For the last few days I have been playing around with an early release of F#. F# looks like it might be a .Net language that I can stand!  The alpha-geeks at Microsoft Research have convinced their corporate masters that the world needs a functional programming language that can be compiled and executed as efficiently as traditional imperative and object oriented languages.

As a long time user of interpreted functional programming languages I can only say: it’s about freaking time!

F# appeals to me for a number of reasons:

  1. I am familiar with many of the concepts of functional programming.  Coming from the APL world I have been using many of these ideas for decades.
  2. F# can use .Net facilities.  It can call, and be called by, any of the .Net languages.  Lack of deep supported integration with host facilities has always hurt languages like J.  You pretty much have to roll your own for everything.
  3. F# can be compiled to virtual machine code just like C#.  APL,  J,  K and others do not compile well.  Heroic attempts have been made to compile APL and similar languages but sometimes heroes fail.
  4. F# supports asynchronous and parallel programming. The typical computer these days has two, four, eight or more processors.  In the very near future your grandmother’s PC will have eight or more 64 bit processors and perhaps hundreds of gigabytes of real, (not virtual), memory.  Programming such machines with conventional languages and tools will be,  as one blogger noted,   insanely complicated.  F# may provide a means to focus the power of such machines without sweating blood.

I have  ordered Programming F# and now I actually have something to look forward to in Visual Studio 2010.

Categories: Software

Nobel Peace Prize Idol

October 9, 2009 · 1 Comment

My doubts about the Nobel Peace Prize started when Henry Kissinger and Le Duc Tho won during the Vietnam war.  In my naive youth I thought a peace prize might have something to do with peace!  I was so cute and unsophisticated in those days.  As I aged I developed a robust sense of irony but when Yasser Arafat won the peace prize I was again dumbfounded.  How could an outright unrepentant murderer qualify for a peace prize?  Clearly my sense of the absurd needed work.  I labored long and hard and when Jimmy Carter won for not being George Bush I took it in stride.  Then Al Gore won for saving the Earth from global warming.  Again my cynical detachment required tuning.  What did global warming have to do with peace?  Did I miss all the brutal carbon dioxide wars?  Finally I understood.  The Nobel Peace Prize has nothing to do with Peace! Of course I still foolishly harbored some illusions that you had to do stuff. This morning, upon learning that Barack Obama had won, I divested my final illusions about the peace prize.  The Nobel Peace Prize is now worth less than a Canadian Idol win.  At least some Canadian Idols can sing and dance.

Categories: Rants

Google Earth Image Touring

October 4, 2009 · Leave a Comment

In a scientific poll of one, (I sampled myself),  Google Earth was voted the greatest free program on Earth. The brilliant developers at Google have managed to turn the most unlikely subject, geography,  into a video game.  And what a game it is!  Google Earth doesn’t troll around in an adolescent make believe world.  Google Earth serves up the real thing.   This would be amazing enough but with Google Earth you can explore the Moon, Mars and the night sky as well.  If you have any geographic  or astronomical needs get Google Earth.

I got into Google Earth a few years ago when I started geotagging old pictures.  Locating old, pre-GPS pictures is one of the better Google Earth games.  My mother snapped this Instamatic picture of me in 1964.  I am standing in front of the Uinta Mountains in northeastern Utah.

1964 was long before GPS was even a dream.  To find exactly where I was standing I turned on Google Earth’s terrain feature and slowly followed the adjacent road until all the landscape features matched up.

For famous sites it’s not this hard.  Here I am at Stonehenge.  Stonehenge is a Google Earth no-brainer.

Over the last few years I have geotagged hundreds of images.  It’s something I will continue to do until I drop dead. Online picture sites like SmugMug provide automatic mapping of geotagged images but Google Earth does a much better job if you do a little KML programming.  KML is an XML-based markup language used by Google Earth and Google Maps.  It can be used to tour geotagged images.  When KML files are compressed, (any ZIP utility will do), they are given a KMZ file extension.  KMZ files can be loaded by Google Earth.

The attached KMZ file geotagged.kmz was generated by a J script I put together to query my geotagged SmugMug pictures.  When loaded into Google Earth it spins around the world pausing to display all the pictures I have geotagged.  I realize few will be interested in my pictures but it’s not hard to extract the KML embedded in geotagged.kmz and adapt it to your pictures.  Go forth and hack my friends.
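If you would rather start from scratch than pick apart my file, the Python sketch below shows the bare mechanics: write one KML placemark per picture, then zip the KML into a KMZ. It is not the J script mentioned above and it does not build a tour, only pins on the map. The function names are made up, the Stonehenge coordinates are approximate, and `doc.kml` is simply the conventional name for the main file inside a KMZ.

```python
import io
import zipfile
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark(name, lon, lat):
    """One KML placemark; KML writes coordinates as longitude,latitude,altitude."""
    return (
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
        "  </Placemark>\n"
    )


def build_kml(photos):
    """photos is a list of (name, longitude, latitude) tuples."""
    body = "".join(placemark(name, lon, lat) for name, lon, lat in photos)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<kml xmlns="{KML_NS}">\n<Document>\n{body}</Document>\n</kml>\n'
    )


def build_kmz(kml_text):
    """Zip the KML text into KMZ bytes with a single doc.kml entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("doc.kml", kml_text)
    return buf.getvalue()


# approximate coordinates, for illustration only
kmz_bytes = build_kmz(build_kml([("Stonehenge", -1.8262, 51.1789)]))
print(len(kmz_bytes), "bytes of KMZ")
```

Write `kmz_bytes` to a file ending in `.kmz` and Google Earth should open it like any other KMZ.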

Categories: Photography · Software