renwix

Company after company, I keep coming up against the same problem... The rate of development is slow. The bug rates are high. Deployment through IT is ridiculously slow. There are countless books on the subject, and many of them get read. So, what is going on?

And then there is the issue of code reuse... There seem to be two types in common practice:

Download software/library off the internet and use it

Copy and paste

Of course, copy and paste has a terrible maintenance model. And third-party software usually has a ceiling: sooner or later you need something it won't do. There has to be a better way...

I used to think that the answers to these questions would be the backbone of a company started on a shoestring. The goal was a business where all development, QA, IT and infrastructure could be easily managed by a team of two or fewer. Drive the cost of development and maintenance into the ground. Allow custom and changing requirements, and still deliver on time and under budget. If I could abstract and shrink the technical requirements into something manageable via a handful of web forms and reports, then I would be free to focus on sales and custom work. After all, without sales there is no business, no matter how tight the infrastructure is.

I approached most of the jobs I have held from this perspective. They hired me because I could cut costs to the point that they didn't need the team they had. Technology moves fast, and if a team couldn't innovate, the job got shipped overseas, the company got beaten in the marketplace and closed its doors, or the whole team was replaced with a product that did it better, faster and cheaper. If we weren't writing ourselves out of our jobs every two years, we were doing something wrong.

It turns out that the kernel of the tool that solves this problem (or comes close) is more or less the same at every company. It also turns out that it has a lot in common with what marketers are looking for in closed loop marketing. Unfortunately, this next part might leave a bad taste in the mouth...

Closed-loop marketing has been around for a while, but the internet bubble brought it close enough to reality that marketers clamored for it. With direct mail and TV advertising, there is a lot of research indicating that, given enough advertisements, a person is inclined to behave a certain way. So throw enough money into mailings and television commercials, and sales are bound to go up. No one could ever measure how much "Where's the Beef?" contributed to Wendy's sales in the '80s, but they knew it was mainstream and in everyone's face. There are plenty of other examples.

But with the internet came the ability to track users. Now you knew all kinds of information about them without dragging them into a lab to study them. Assign each user a unique ID (which had always been done anyway) and use it to correlate their purchase information across websites. Or, in the case of banner ads, track what they view and click, on the assumption that a click signals interest or intent. Or, in email, put a web bug in the message, and now you know when they opened it and how many opened it, and you can tie that data to purchase information.
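To make the tracking idea concrete, here is a minimal sketch, assuming events arrive as simple (user_id, kind, item) tuples; the event kinds, item names and data are all hypothetical. It groups everything under the unique user ID and computes a click-through rate, treating a click as a signal of interest as described above.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, kind, item). Invented for illustration.
EVENTS = [
    ("u1", "view", "banner-soap"),
    ("u1", "click", "banner-soap"),
    ("u1", "purchase", "soap"),
    ("u2", "view", "banner-soap"),
    ("u3", "view", "banner-soap"),
    ("u3", "click", "banner-soap"),
]


def profile_by_user(events):
    """Correlate every event with the unique id that produced it."""
    profiles = defaultdict(lambda: defaultdict(list))
    for user_id, kind, item in events:
        profiles[user_id][kind].append(item)
    return profiles


def click_through_rate(events, ad):
    """Fraction of users who viewed the ad and also clicked it."""
    viewers = {u for u, kind, item in events if kind == "view" and item == ad}
    clickers = {u for u, kind, item in events if kind == "click" and item == ad}
    return len(viewers & clickers) / len(viewers) if viewers else 0.0


print(dict(profile_by_user(EVENTS)["u1"]))
print(round(click_through_rate(EVENTS, "banner-soap"), 2))  # 0.67
```

Joining the purchase list in each profile back to the ad clicks is what lets a marketer tie an impression to a sale.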

Where the cycle used to involve product registration cards, the USPS, data entry clerks, mailers, and crossing your fingers that the purchaser would reference the mailing code on their next purchase, now you knew in near real time that user X was engaging in some behavior. Wouldn't it be great to:

Customize their advertisements based on past purchase history

Customize their view of the page/email based on history

Cross-sell BMWs to Banana Republic purchasers as part of their transaction

In other words, close the loop and close it quickly.

As the last decade or so has played out, this has actually been happening, but in ways we don't think about. The best example is Google Search. It is difficult for a developer to get the job done without it. Back in the day there were tons of search engines; AltaVista usually had interesting hits, but Yahoo became the de facto standard. Then Google blew them all out of the water. As more information is collected on the internet, search becomes more important. So think about this in terms of your own browsing behavior...

Take the example "I want to see a movie tonight." Movies.com... Need a map? MapQuest, Google Maps... Who wants to come along? Twitter... After the movie, post my thoughts to Twitter, MySpace, Facebook, Blogger... Let people comment on my comments... Rotten Tomatoes? Maybe, but a Google or blog search will turn up the opinions of people on the street. The NYTimes? Only if I trust the critic and he isn't having a bad day.

Now let's do it the old way. See a commercial or get a newspaper, check the listings, read a review, call a friend, buy a ticket at the theater, and talk with friends after the show. The average Joe had no free, digital way to publish his thoughts, so they were shared over a cup of coffee.

People used to pick a critic they more or less agreed with, since different people like different movies. Now you can just read the averages across a handful of internet searches. The scale is completely different. And rather than call a friend or two, I will just blast my Twitter followers or text my cell phone distribution list. We act like marketers much more than we used to, but we have also closed the loop without realizing it. We blast our intentions to a group of friends rather than calling each one individually. By putting our movie review back onto the internet, we affect someone else's decision. We are also keeping a record of our intentions and our conclusions.

Now compare this process to a marketer's. "I want to sell 10 bars of soap." Look in my database for people who have purchased from us in the past, were happy with their purchase and would likely buy again. If I know that the database tends to have a 5% response rate, then send 200 people an advertisement with a 10%-off coupon (200 × 5% = 10 expected sales). Or buy ten 30-second spots during the morning news (when people are likely to be thinking about soap). Put a tracking code on the 10%-off coupon, and encourage purchasers to give me their ZIP code (or better yet, their phone number) when they buy at the store.
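The mailing count is just the sales target divided by the expected response rate, rounded up. A small sketch with a hypothetical function name; it keeps the rate in whole percent so integer arithmetic gives an exact ceiling with no floating-point surprises.

```python
def mailings_needed(target_sales: int, response_rate_percent: int) -> int:
    """Smallest number of mailings whose expected responses meet the target."""
    if response_rate_percent <= 0:
        raise ValueError("response rate must be positive")
    # ceil(target * 100 / rate) done with integer floor division on negatives
    return -(-target_sales * 100 // response_rate_percent)


print(mailings_needed(10, 5))  # 200
print(mailings_needed(10, 3))  # 334
```

Keep in mind this only gives the expected number of responses; with a 5% rate, 200 mailings will sometimes yield fewer than 10 sales.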

Abstracting it up a level (or two): I have some source data that will help me meet a goal. Consult that source data to derive a list of actions to take. Do each of the actions, and try to measure their results. Use those measurements to search the source data better next time.
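The four steps can be sketched as a single function. This is a minimal illustration under stated assumptions, not a description of any particular tool: the "source data" is a dict of candidate actions with estimated scores, `measure` is a hypothetical stand-in for whatever real measurement you have, and each result is folded back in as a running average so the next pass searches the data better.

```python
from typing import Callable, Dict, List


def close_the_loop(
    estimates: Dict[str, float],
    counts: Dict[str, int],
    measure: Callable[[str], float],
    actions_per_round: int = 2,
) -> List[str]:
    """One pass of: consult the data, act, measure, update the data."""
    # 1. Consult the source data: pick the actions that currently look best.
    chosen = sorted(estimates, key=estimates.get, reverse=True)[:actions_per_round]
    for action in chosen:
        # 2 and 3. Do the action and measure the result.
        result = measure(action)
        # 4. Fold the result back in; after n measurements the estimate is
        #    their mean (the initial guess only matters until the first one).
        counts[action] = counts.get(action, 0) + 1
        estimates[action] += (result - estimates[action]) / counts[action]
    return chosen


# Hypothetical channels: initial guesses vs. what measurement reveals.
estimates = {"mail": 0.5, "tv": 0.4, "email": 0.3}
true_response = {"mail": 0.1, "tv": 0.2, "email": 0.6}
counts: Dict[str, int] = {}
for _ in range(3):
    close_the_loop(estimates, counts, true_response.__getitem__)
print(max(estimates, key=estimates.get))  # email
```

The initial guesses favored mail, but after one round the measurements promote email. This sketch is purely greedy: once an action looks bad it is never retried, so a real system would usually reserve some budget for exploring.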

All of this matters because it lets you make your next decision better than the current one. More data means more samples and (presumably) better averages. In the old movie case, the source data was friends and critics' reviews. In the new movie case, the list is vastly larger, since I can find out what random strangers think as well. In the marketer's case, the source pool is only the people I know about in my database (if I want accurate measurements), or, if I join a co-op, I also get access to people who might cross-sell.

So it turns out that, as a dev, I want almost the same thing. I have a big pool of input code choices: my own, open source, the company's code... I need to make an informed guess as to which is the best path, implement it, and (if I am smart) measure the effect of that decision. How many bugs, lines of code, devs, check-ins? What is the general feel? Do I like using the code I decided on? And so on. It turns out that this abstracts fairly well into the same model, as do many other things.
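Applied to code, the measurement step could start as nothing more than a few numbers recorded per decision and normalized so the choices can be compared. A minimal sketch; the field names and figures are hypothetical, and bugs per thousand lines is only one possible yardstick.

```python
from dataclasses import dataclass


@dataclass
class DecisionOutcome:
    """Measurements collected after committing to one implementation choice."""
    choice: str
    bugs: int
    lines_of_code: int
    devs: int
    checkins: int

    def bugs_per_kloc(self) -> float:
        return 1000 * self.bugs / self.lines_of_code


# Hypothetical numbers for the three kinds of input named above.
outcomes = [
    DecisionOutcome("my own", bugs=42, lines_of_code=12000, devs=3, checkins=310),
    DecisionOutcome("open source", bugs=9, lines_of_code=4500, devs=2, checkins=95),
    DecisionOutcome("company code", bugs=15, lines_of_code=5000, devs=4, checkins=140),
]

# Rank the choices so the next decision can start from the best-measured one.
for o in sorted(outcomes, key=DecisionOutcome.bugs_per_kloc):
    print(f"{o.choice}: {o.bugs_per_kloc():.1f} bugs/KLOC")
```

The softer questions ("do I like using it?") don't fit a number this neatly, but they can be stored alongside as free-form notes.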

This leads to a discussion of data, because that alone can be extremely complicated :). The number of inputs to the code decision is large (to say the least). Sometimes my company helps trim the choices by locking me into a particular deployment platform or picking the implementation language for me. Having some walls actually helps here, because it speeds up decision making. But the thing about walls is that they are structure. Without them, you might have to create your own walls just so there is some foundation to build on. Without some basis or standard, it is difficult to get started. The "wall" that gets created here is semi-structured data, or imprecise data: we know there is a wall there, but we don't know all of its characteristics.

In general I contrast this with non-structured and highly structured data. Non-structured data (which isn't really non-structured; there is no such thing) is in some format that makes it difficult to compare related datasets. HTML on the internet is a good example. Several authors might be writing about the same thing, but the HTML structure itself is the first barrier to comparing those texts. A good example of highly structured data is anything in an RDBMS: the schema and constraints restrict (and enforce) the data associations. Comparing the general characteristics: non-structured data is quick and easy to produce but slow to query and analyze; highly structured data is (usually) slow to produce but fast to query. Neither one wins in general; which fits depends on timing.
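A small illustration of the trade-off, using Python's built-in `re` and `sqlite3` modules. The review snippets, table and movie name are made up. Pulling ratings out of free-form HTML means guessing at each author's format; the same facts in a relational table are one query away, but someone first had to define the schema and load the rows.

```python
import re
import sqlite3

# Non-structured: quick to write, but every author formats ratings differently.
html_reviews = [
    "<p>Loved it. <b>4/5</b></p>",
    "<div>Rating: 3 out of 5</div>",
    "<p>Meh, two stars.</p>",  # no digits at all: the regex below misses it
]
pattern = re.compile(r"(\d)\s*(?:/|out of)\s*5")
scraped = [int(m.group(1)) for text in html_reviews if (m := pattern.search(text))]

# Highly structured: slower to produce (schema + inserts), fast to query.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE review (movie TEXT NOT NULL, "
    "stars INTEGER CHECK (stars BETWEEN 1 AND 5))"
)
db.executemany(
    "INSERT INTO review VALUES (?, ?)",
    [("some-movie", 4), ("some-movie", 3), ("some-movie", 2)],
)
(avg_stars,) = db.execute(
    "SELECT AVG(stars) FROM review WHERE movie = ?", ("some-movie",)
).fetchone()

print(scraped, avg_stars)  # [4, 3] 3.0
```

The scraped list silently lost a review, while the `CHECK` constraint would have rejected a bad rating at insert time: that is the speed-vs-structure trade in miniature. The walrus operator requires Python 3.8 or later.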

There is a middle ground between these two points that is interesting because it lets us trade speed against structure. When knowledge is partial, the relationships between pieces of knowledge are as important as (or more important than) the pieces themselves. In practice, that means defining schemas and enforcement on the general relationships between pieces of knowledge, and leaving the schemas for the knowledge itself to modular systems, or defining it as hash structures. This maps nicely onto the "walls" we have during the dev/maintenance cycle. Take UML as an example: we know our code needs to be structured a certain way, but we can't see all the code yet. The structure of the code gets defined in the relationships.
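One way to read "enforce the relationships, leave the knowledge loose" is sketched below. This is an assumption about how it might look, not a description of an existing tool: relationships are checked against a small schema of allowed (subject kind, relation, object kind) triples, while each node's attributes are an unconstrained dict, the hash structure mentioned above. The kinds, relations and node names are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

# The schema covers relationships only: (subject kind, relation, object kind).
ALLOWED: Set[Tuple[str, str, str]] = {
    ("class", "inherits", "class"),
    ("class", "uses", "library"),
    ("module", "contains", "class"),
}


class SemiStructuredModel:
    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: List[Tuple[str, str, str]] = []

    def add_node(self, name: str, kind: str, **attrs: Any) -> None:
        # Node contents are a free-form hash; only "kind" is required.
        self.nodes[name] = {"kind": kind, **attrs}

    def relate(self, subject: str, relation: str, obj: str) -> None:
        key = (self.nodes[subject]["kind"], relation, self.nodes[obj]["kind"])
        if key not in ALLOWED:
            raise ValueError(f"relationship not allowed by schema: {key}")
        self.edges.append((subject, relation, obj))

    def related(self, subject: str, relation: str) -> List[str]:
        return [o for s, r, o in self.edges if s == subject and r == relation]


model = SemiStructuredModel()
model.add_node("billing", "module")
model.add_node("Invoice", "class", status="draft, not reviewed")
model.add_node("sqlite", "library", version="3.x")
model.relate("billing", "contains", "Invoice")
model.relate("Invoice", "uses", "sqlite")
print(model.related("billing", "contains"))  # ['Invoice']
```

Much like a UML class diagram drawn before the code exists, the allowed edges act as the wall, while what we know about each node can stay partial and change freely.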

Now things start to get interesting, because the implication is that as my knowledge changes, so do my code and structures. If my structures are kept external to the code in semi-structured data formats, it becomes much easier to query the data related to the code. I don't (necessarily) have to parse the code to derive the knowledge contained in it, but it isn't Intentional Programming either, where the code itself is mapped to data. This answers the first part of closing the loop: how to query the code and, more importantly, what in the code is worth querying.
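If the structure lives outside the code in a semi-structured format, a question about the code becomes a data query rather than a parse. A minimal sketch, assuming a hypothetical JSON metadata document kept alongside the source (file names and fields are invented for illustration), read with Python's standard `json` module:

```python
import json

# Hypothetical metadata kept next to the code, not derived by parsing it.
METADATA = """
[
  {"file": "billing/invoice.py", "depends_on": ["db", "tax"], "owner": "ana"},
  {"file": "billing/refund.py",  "depends_on": ["db"],        "owner": "raj"},
  {"file": "reports/monthly.py", "depends_on": ["tax"],       "notes": "c&p from invoice"}
]
"""

records = json.loads(METADATA)


def files_depending_on(component: str) -> list:
    """Which files would a change to `component` touch?"""
    return [r["file"] for r in records if component in r.get("depends_on", [])]


def files_with(field: str) -> list:
    """Records are loose hashes, so optional fields can simply be probed."""
    return [r["file"] for r in records if field in r]


print(files_depending_on("tax"))  # ['billing/invoice.py', 'reports/monthly.py']
print(files_with("notes"))        # ['reports/monthly.py']
```

The catch is that the metadata must be kept in step with the code; that upkeep is part of the cost of having the wall.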

Copy and paste is probably the simplest example of closing the loop. If I have a maintenance task that requires an update, and the code is set up so that I have to copy and paste the change into more than one place, then I need to know that. While we are all looking for 100% code reuse, I find that 9 times out of 10, something that should get pushed up the stack and generalized is not. The reasoning is usually one or more of the following:

There isn't time for that now

Well, I am just doing it this way once. Or: it is only copied and pasted like this in one other place...

I don't know all the places in the code that are like this, so I can't generalize it up effectively

I don't know what it will break
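The third reason, not knowing every place the code repeats, is one a small tool can attack directly. A rough sketch, not a production clone detector: collapse whitespace, hash every window of N consecutive non-blank lines, and report any window that occurs in more than one location. The function name, window size and sample files are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def find_copy_paste(
    files: Dict[str, str], window: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Map a hash of each repeated `window`-line block to where it occurs.

    Line numbers are 1-based and point at the first line of the block.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep original line numbers, skip blank lines, collapse whitespace.
        lines = [
            (no, " ".join(line.split()))
            for no, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(code for _, code in lines[i : i + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((name, lines[i][0]))
    return {h: locs for h, locs in seen.items() if len(locs) > 1}


sources = {
    "a.py": "x = load()\nx = clean(x)\nsave(x)\nprint('a')\n",
    "b.py": "print('b')\nx = load()\nx  =  clean(x)\nsave(x)\n",
}
for locations in find_copy_paste(sources).values():
    print(locations)  # [('a.py', 1), ('b.py', 2)]
```

Because whitespace is normalized, the reformatted copy in `b.py` is still caught; renamed variables would not be, which is where real clone detectors go further.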

So now I have knowledge that needs to get abstracted up, but I am afraid to do it for one reason or another. But I know the structure of the code... A quick solution might be a pre-processor macro, or AOP if I know how to change it. Both are shaky solutions, so we record the fact that they are in place (via our data mechanisms) and manage them.
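Recording the decision can be as lightweight as an append-only log that can be queried before the next change. A minimal sketch; the record fields, the `kind` values and the example entries are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Workaround:
    """A shaky fix (macro, aspect, c&p) we chose to keep, recorded so it can be managed."""
    kind: str              # e.g. "macro", "aop", "copy-paste"
    locations: List[str]
    reason: str
    recorded: date = field(default_factory=date.today)


class DecisionLog:
    def __init__(self) -> None:
        self.entries: List[Workaround] = []

    def record(self, entry: Workaround) -> None:
        self.entries.append(entry)

    def touching(self, path: str) -> List[Workaround]:
        """Before editing `path`, list the workarounds that already reach into it."""
        return [e for e in self.entries if path in e.locations]

    def by_kind(self, kind: str) -> List[Workaround]:
        return [e for e in self.entries if e.kind == kind]


log = DecisionLog()
log.record(Workaround("macro", ["billing/invoice.py", "billing/refund.py"],
                      "no time to generalize the rounding logic"))
log.record(Workaround("copy-paste", ["reports/monthly.py"], "only one other place"))
print(len(log.touching("billing/refund.py")))  # 1
```

Querying `by_kind("copy-paste")` periodically gives a running list of candidates to push up the stack once the fear has passed.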

If we take an approach like this, we are recording our decisions in data. We are also building a wall, because in MANY of today's languages, pre-processor/macro functionality requires a hook. Making a decision like this means we will be making more decisions like it in the future.

The last step is closing the loop: using the decisions to speed up and tighten the overall code implementation. Hopefully this is done with an eye on bloat, but that is up to the implementor. "Closing the loop" is the topic of this blog: giving myself the tools to produce more reliable code, faster. As you might expect, there is a lot in the details of this approach.


Saturday, December 22, 2007

We can do better...

Friday, December 21, 2007

Software Dev Visualization

Thursday, December 20, 2007

OWL vs OOP
