A Complex SVN-to-Git Migration, Part 3: Forging a History

This is the final article in this three-part series on a complex Subversion-to-Git migration of VICE, a software project so old that it predates version control and has already gone through two VCSes. Last time we’d interrogated our partially-converted repository, and broken down each branch and tag into a series of source trees and the commits that they came from. Our tasks today are fivefold:

  • Take the disjoint commits that are the pre-1.22.9 releases and arrange them into a new branch named legacy_releases.
  • Take the sequences of source trees that we deduced for all the non-release branches and turn them into proper branches in their own right. Attach each branch to trunk at the commit where it was copied out. Unlike the legacy release branch, these branches need to retain the authorship, timestamps, and commit comments from the Subversion originals.
  • As part of the previous step, any branches that were copied from a subdirectory of trunk/ instead of trunk/ itself need to have parent directories fabricated so that they may slot more neatly into the overall structure, work properly with git diff and related commands, and so on.
  • Find the places where we need manual overrides to carry out the previous steps properly, and override as needed. We found one of these last time (release 2.4.2 was miscopied); we expect to find more along the way.
  • Clean up the remnants of the conversion; remove any extra material created by the Subversion import and replace branches with tags where appropriate.

The primary theme of this week’s work is editing and fabricating history. We will be creating commits and even new source trees out of source material that the Git repository already knows about but doesn’t think is organized in this way, and we’ll be creating it alongside records that say that all this work was done in the distant past. Nothing about what we are doing today is normal, nor is it part of any sane workflow—but when you’re doing something unusual, sanity is a hindrance.

The Git Book has a page, the Git Internals tutorial, that discusses creating Git objects like blobs, trees, and commits out of whole cloth, and while I found it very useful, it was not the whole story for what I needed to do here: the tutorial creates its material from nothing, while we are obliged to coexist with and re-use material that we already have. Usually this makes our lives easier; sometimes it makes them harder. We’ll see both as we go along.
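
To make “fabricating history” concrete in mechanical terms: Git’s plumbing will happily wrap a tree object the repository already has in a brand-new commit that claims whatever authorship and dates we care to assert. Here is a minimal sketch of that operation in Python; the helper names are mine, and this illustrates the plumbing involved rather than the actual migration script.

import os
import subprocess

def git(*args, extra_env=None):
    """Run a git command and return its stdout, stripped."""
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run(("git",) + args, check=True,
                            capture_output=True, text=True, env=env)
    return result.stdout.strip()

def fabricate_commit(tree_sha, parent_sha, name, email, date, message):
    """Wrap an existing tree object in a new commit, backdated to date."""
    # git commit-tree builds a commit out of a tree the repository already
    # knows about; the *_DATE variables are how we claim the work happened
    # in the distant past.
    meta = {
        "GIT_AUTHOR_NAME": name, "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name, "GIT_COMMITTER_EMAIL": email,
        "GIT_AUTHOR_DATE": date, "GIT_COMMITTER_DATE": date,
    }
    args = ["commit-tree", "-m", message]
    if parent_sha:
        args += ["-p", parent_sha]
    args.append(tree_sha)
    return git(*args, extra_env=meta)

A branch is then nothing more than a ref pointing at the newest of these commits, which git update-ref (or git branch) can create.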

Let’s begin our descent.

Continue reading

A Complex SVN-to-Git Migration, Part 2: Interrogating Git Internals

This is part two of a three-part series where I dig into Subversion administration and Git internals as part of a complex Subversion-to-Git migration. Last time, we handled locally mirroring a Subversion repository, and then using git-svn to convert that repository into a Git equivalent. We concluded by observing that the repository we were converting was beyond git-svn’s ability to automatically convert, and acknowledged that we ended with the repository in a bad state. This time, we will identify the exact things that went wrong and collect all the information that we need to fix it. This will require us to get fairly deep into Git’s internals, so we’ll start with a look at Git itself and the computer science that informs its design.

Git’s High Concept

Git is so amazingly simple to use that APress, a single publisher, needs to have three different books on how to use it. It’s so simple that Atlassian and GitHub both felt a need to write their own online tutorials to try to clarify the main Git tutorial on the actual Git website. It’s so transparent that developers routinely tell me that the easiest way to learn Git is to start with its file formats and work up to the commands. And yet, when someone dares to say that Git is harder than other SCMs, they inevitably get yelled at, in what I can only assume is a combination of Stockholm syndrome and groupthink run amok by overdosing on five-hour energy buckets.

Benjamin Pollack, Unorthodocs

Git as an application is a collaborative version control system: it allows multiple users to create and edit a collection of files organized in directories, and it tracks and credits the history of each file. But that’s what it does. What it is, as a program, is a content-addressable file storage system. The short description of systems like this sounds like nonsense: it’s a file system, but instead of using a filename to access the contents of a file, you use the contents of the file itself. Why even bother accessing it at that point? For that matter, how did you get the contents to begin with? The second question boils down to “somebody else gave it to us”, and the answer to the first can vary. In the simplest case the system represents a collection of things, and can tell us whether our data is in that collection. In the more common case, the data we have is associated with some other piece of data and that’s the data we want. In this case, the data we have is a sort of filename, but that filename is based on the content of the data itself somehow.
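
Here is a toy sketch of that idea in Python, assuming nothing about Git’s actual on-disk formats: the “filename” of every stored object is simply a hash of its contents.

import hashlib

class ContentStore:
    """A toy content-addressable store: objects are named by their contents."""
    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # the content names itself
        self.objects[key] = data
        return key  # hand this key to whoever will want the data later

    def get(self, key: str) -> bytes:
        return self.objects[key]

store = ContentStore()
key = store.put(b"hello, world\n")
assert store.get(key) == b"hello, world\n"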

If you poke at Git even casually, you’ll find yourself regularly confronted by incomprehensible 40-digit hex numbers that, if they’re ever named anything consistent at all, are called “hashes”. As a software artifact, Git assumes that any user that’s getting deep enough to have to care about hashes has already received the necessary training to know what they’re dealing with and why. Given the quote I opened this section with, you may suspect that I don’t think that this assumption is warranted.
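
Those 40-digit numbers are SHA-1 hashes, and for a file’s contents (a “blob”) the hash is computed over the raw bytes plus a short header. The few lines below reproduce what git hash-object prints for the same input; the example string is mine.

import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes a blob as SHA-1 over "blob <length>\0" plus the raw bytes.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a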

So we’ll start by solving a completely unrelated toy problem, and in solving it build up the technological infrastructure that underlies everything Git does to manage its content.

Continue reading

A Complex SVN-to-Git Migration, Part 1: Cooperating With Subversion

VICE is the premier emulator for the Commodore 8-bit machines, and it has been for some time. The project is old enough to predate the modern Internet, and it’s still under active development. Late last year, I helped out with a project to see what it would take to migrate VICE and its history to Git.

Doing this properly is a taller order than it first appears. While many automated tools exist to bridge the gap between Subversion (VICE’s most recent solution) and the popular version control systems that precede and follow it, these tools do not always match up neatly in their conventions, and chaining them can lead to disaster. That could have happened to The Ur-Quan Masters project—which, while younger than VICE, did still need to migrate from CVS to SVN to Git over the years—but development was sporadic enough that we could meaningfully “freeze” history at a certain point, migrate only the linear “trunk” history, and manually annotate the rest.

VICE is not so fortunate. VICE’s development history has branched widely, includes dozens if not hundreds of changes that are part of the repository but not part of the main development lines, and over the decades has accumulated over a thousand release tags. We want to convert all of this, and we want to get it into a form where Git’s tooling can offer convenient and accurate access to all of it.

The git svn command, formally part of Git but often shipped in a separate package, provides a good chunk of what we want here. However, its ideas of how Subversion repositories lay out branches and tags do not precisely match what VICE did all across its history, and as a result the converted branches do not work quite as we’d wish. This project, therefore, ultimately has three parts:

  • Convert the Subversion repository into a roughly-correct Git one, which we may analyze and edit into its final form later.
  • Extract the correct histories of the main repository and its branches from the roughly-correct version. Identify which branches were actually merged back into the main line, and where.
  • For branches whose histories do not map to anything in the main trunk—and thus were either never merged or predate SVN development entirely—fabricate a proper history for them so that they may exist as ordinary Git branches.
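
To give a flavor of the first step, here is roughly what the conversion invocation looks like when driven from Python. The URL, layout paths, and authors file below are placeholders rather than VICE’s actual values; a repository with a non-standard layout means spelling out where trunk, branches, and tags live instead of trusting git svn’s --stdlayout.

import subprocess

subprocess.run([
    "git", "svn", "clone",
    "--trunk=trunk", "--branches=branches", "--tags=tags",
    "--authors-file=authors.txt",   # maps svn usernames onto Git identities
    "file:///path/to/local-mirror", "vice-git",
], check=True)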

This was a big project, and I’ll be breaking it up into three posts. This introductory post will talk about how to efficiently mirror and convert Subversion and Git repositories. Part 2 will dig into Git repository interrogation as I collect the information necessary to dig the VICE repository out of its decades of tech debt, and in Part 3 we will practice the subtle art of fabricating history.

Continue reading

The Many Faces of Retrocomputing

Ten years ago this week, I made my first post to the Bumbershoot Software blog. It started out as a place to publish the various notes I’d collected over the years as a hobbyist, and eventually it turned into a kind of travelogue of my hobby work. This isn’t really the place for a retrospective on the blog itself—I basically did that last Christmas—but one thing that I’ve been thinking about since my PICO-8 article is the way my approach to “retrocomputing” is really only one of a variety of approaches people take to the topic. I’m doing all this for fun, after all, but it’s fun aimed at something serious—I’m interested in seeing what the past can teach the present, and vice versa. Older techniques from an era of slower and more constrained machines should still have lessons in the modern world when we wish to make best use of power or memory resources, and our fifty-odd years of progress in computing as an engineering discipline ought to pay dividends in the older domains, too. Our endless grousing as a profession to the contrary, we have learned some things over the decades.

Continue reading

Noodling With Nonograms

After a fun couple of months exploring the capabilities of the Color Computer, I’m going to turn to something else—I’m going to use a programmable computer to actually investigate a question, instead of having said computer be the subject of the investigation. In particular, I will be poking around at some properties of nonogram puzzles.

These puzzles have a variety of names, with “Picross” being, I think, the most famous. Regardless of the name, the rules are very consistent:

  • The puzzle is presented as an empty grid, and the solution forms an image by filling in cells as if they were pixels.
  • Each row and column is annotated with a list giving the number and lengths of the runs of filled cells in that row or column. By convention, an entirely empty row or column has no clue attached instead of being clued as “0”.

These clues are more constraining than they might first appear. Because runs must be contiguous, on a board of width 9, the clue “8” only admits two possibilities: either every cell but the first is filled, or every cell but the last is. Similarly, runs may not be reordered, so the clue “3 4” must mean one of these three possibilities:

|XXX.XXXX.|
|XXX..XXXX|
|.XXX.XXXX|
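
To make that counting concrete, here is a short recursive sketch in Python that enumerates every legal placement of an ordered list of runs in a row; it reproduces both examples above. The function name and representation are mine, purely for illustration.

def placements(runs, width):
    """Every legal layout of the given runs, in order, in a row this wide."""
    if not runs:
        return ["." * width]
    first, rest = runs[0], runs[1:]
    # Runs after the first each need at least one separating gap.
    tail_min = sum(rest) + len(rest)
    results = []
    for start in range(width - tail_min - first + 1):
        block = "." * start + "X" * first
        gap = 1 if rest else 0
        for tail in placements(rest, width - len(block) - gap):
            results.append(block + "." * gap + tail)
    return results

print(placements([3, 4], 9))    # the three layouts diagrammed above
print(len(placements([8], 9)))  # 2, as claimed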

I’m not interested here in writing a solver. This has already been done, and I’m more interested in the overall shape of the puzzle as a whole.

Ambiguities and Contradictions

It’s very easy to produce impossible puzzles. Here’s an impossible 3×3 puzzle:

       1
  +-+-+-+
  | | | |
  +-+-+-+
3 | | | |
  +-+-+-+
  | | | |
  +-+-+-+

That’s pretty trivial, of course; even on casual inspection we can see that the row clues and the column clues specify different numbers of cells. Ensuring equality isn’t enough, though:

       2
  +-+-+-+
1 | | | |
  +-+-+-+
  | | | |
  +-+-+-+
1 | | | |
  +-+-+-+

Here, there’s no way to place a vertical run of 2 such that the correct rows show a horizontal run of 1. Contradictions aren’t our only problem, either. Here’s a puzzle that’s not contradictory at all, but it can be solved two different ways:

    1   1         1   1
   +-+-+-+       +-+-+-+
 1 |X| | |     1 | | |X|
   +-+-+-+       +-+-+-+
   | | | |       | | | |
   +-+-+-+       +-+-+-+
 1 | | |X|     1 |X| | |
   +-+-+-+       +-+-+-+

This isn’t really suitable for nonogram puzzles, where part of the reward for solving the puzzle is that a work of pixel art is revealed—we want our solution to be unique.

The Questions

I’ll be attacking the following questions in this article:

  1. How many possible puzzles—solvable or not—can be posed?
  2. How many survive the initial filter of “row clues and column clues must agree on the total number of cells”?
  3. How many puzzles have a unique solution?

All of these questions depend crucially on the dimensions of the grid, of course. I will be starting at 5×5, the smallest puzzle size that is interesting enough to pose, and will scale up along the way.

I’m not terribly interested in producing pristine, optimal implementations here; the goal is not to get the answer in the fanciest or most efficient way, but simply to get answers at all. Optimization is for when I actually need it in order to proceed.
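
As a flavor of what “just get answers” can look like, here is a brute-force sketch in Python: enumerate every 5×5 grid, bucket the grids by the clue set they produce, and read the counts off. This is my own illustration of the questions rather than necessarily the article’s approach, and at 2^25 grids it is slow but tractable.

from collections import Counter
from itertools import product

def clue(cells):
    """Run lengths of filled cells; an empty line yields the empty clue ()."""
    runs, run = [], 0
    for c in cells:
        if c:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return tuple(runs)

n = 5
solutions = Counter()
for bits in product((0, 1), repeat=n * n):
    rows = [bits[i * n:(i + 1) * n] for i in range(n)]
    cols = zip(*rows)
    puzzle = (tuple(clue(r) for r in rows), tuple(clue(c) for c in cols))
    solutions[puzzle] += 1

print("clue sets with at least one solution:", len(solutions))
print("clue sets with a unique solution:",
      sum(1 for v in solutions.values() if v == 1))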

Continue reading

Softsoniq: Cycle-counting Out Polyphonic Music

Last time, I managed the graphical work required to draw the image in the Portrait of Liberty program I’ve used a few times as a system demonstration. We’ll add the musical component today.

This is a tall order. Our previous incarnations relied on the tone generators in their hardware to provide 3-voice harmony. The CoCo and the Dragon, in comparison, have no tone generators at all and only a digital audio output port. If we want multipart harmony, we will need to write our own tone generators and sound synthesizers in software. Happily, our adventures last year offer us a very promising inspiration for a design: the Ensoniq 5503 used by the Apple IIgs. If we take its operation as inspiration, stripping it down and simplifying a bit, we get a plausible first draft of a design:

  • Each voice has three 16-bit values associated with it: a pattern, an index, and a frequency.
  • Each pattern is a pointer to a 32-byte wavetable specification with 8-bit signed samples.
  • For each sample to be output, add the frequency of each voice to its index, then use the top five bits of each voice’s index value to index into the pattern for the waveform’s current value.
  • Sum these values together, normalize them to the 6-bit unsigned PCM output the CoCo’s chip demands, and actually output it.
  • Alter the frequencies as needed over time to produce an actual song instead of just tones.
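
Here is that loop as a minimal Python model, purely to pin down the arithmetic; the real version is carefully cycle-counted 6809 assembly, and the wavetable and constants below are made up for illustration.

def mix(voices, n_samples):
    # Each voice: 16-bit "index" and "freq" values, plus a "pattern" of
    # 32 signed 8-bit samples.
    out = []
    for _ in range(n_samples):
        total = 0
        for v in voices:
            v["index"] = (v["index"] + v["freq"]) & 0xFFFF  # phase accumulator
            total += v["pattern"][v["index"] >> 11]         # top 5 bits pick a sample
        # Scale and bias the signed mix into the 6-bit unsigned range
        # (0..63) that the output port wants.
        out.append(((total // len(voices)) + 128) >> 2)
    return out

square = [127] * 16 + [-128] * 16                  # a crude square-wave table
voice = {"index": 0, "freq": 0x0400, "pattern": square}
samples = mix([voice], 64)                         # one full 64-sample cycle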

This looks pretty plausible, and it also has the very nice property that it doesn’t really care what the output rate is; the faster we make the loop, the more fine-grained our frequency codes can be, and because our codes are actually frequencies and not wavelengths like we had to count out on the Apple II, we won’t get major gaps in the frequencies we can produce until we start hitting far more distant extremes. Our only real limit will be the Nyquist-Shannon sampling theorem; we won’t be able to produce a frequency higher than half our final output rate, since even that limiting case degenerates into a square wave that toggles on every output sample.

Before we start diving directly into code and timing, though, there are two bits of unfinished business to take care of in terms of the surrounding environment.

Continue reading

Flood Fills on the CoCo and the Dragon

Last time, we designed and implemented most of a line-drawing bitmap library. The overall goal was to match the system I’d used to port John Jainschigg’s Portrait of Liberty program into a more sophisticated Commodore 64 (and, later, Atari ST) incarnation. This week, we’ll continue in that direction by implementing a different flood fill algorithm than the ones used in the previous implementations, and then putting together a version of the display suitable for the Color Computer.

Choosing an Algorithm

There are quite a lot of algorithms out there for carrying out flood-fill operations. For my Commodore 64 port I used the “lozenge fill” approach:

from collections import deque

def lozenge_fill(seed, is_set, set_pixel, neighbors):
    # A runnable sketch of the approach; is_set, set_pixel, and neighbors
    # stand in for the application's pixel primitives.
    worklist = deque([seed])              # put the initial seed in the worklist
    while worklist:
        x, y = worklist.popleft()         # consume the front entry
        if not is_set(x, y):
            set_pixel(x, y)
            for n in neighbors(x, y):     # adjacent pixels: up, down, left, right
                if not is_set(*n):
                    worklist.append(n)    # onto the back of the worklist

This got the job done, but it also required me to set aside a fairly large chunk of memory for use as the worklist, and it meant that my flood-fill implementation wound up being tied pretty closely to the rest of the application so that memory could be usefully shared. Now that we’ve got a CPU that can handle local variables, we can write recursive functions without trouble. Recursive versions of the lozenge fill, however, are unpleasantly expensive, and with only 16KB or so of usable memory in our program space, there’s a real risk of stack overflow. If recursion is available, the “span fill” approach works much better:

  1. If the point (x, y) is filled, return to the caller.
  2. Draw as far left and right as you can without colliding with a pixel of your color or going beyond the screen.
  3. Mark the leftmost and rightmost points as x1 and x2.
  4. Loop from x1 to x2, recursively flood-filling the pixels above and below each point (a sketch in code follows).
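
In Python, a sketch of that recursion looks like the following; is_set and set_pixel are hypothetical stand-ins for the bitmap library’s primitives.

def span_fill(x, y, is_set, set_pixel, width, height):
    if is_set(x, y):                                 # step 1
        return
    x1 = x
    while x1 > 0 and not is_set(x1 - 1, y):          # march left
        x1 -= 1
    x2 = x
    while x2 < width - 1 and not is_set(x2 + 1, y):  # march right
        x2 += 1
    for px in range(x1, x2 + 1):                     # steps 2 and 3: fill the span
        set_pixel(px, y)
    for px in range(x1, x2 + 1):                     # step 4: recurse above and below
        if y > 0:
            span_fill(px, y - 1, is_set, set_pixel, width, height)
        if y < height - 1:
            span_fill(px, y + 1, is_set, set_pixel, width, height)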

This can still blow out your stack—if you were filling in a narrow, snaking figure with lots of vertical runs, you’d end up recursing very deeply—but it will serve us here and it’s generic enough that I can put it in the bitmap library with a straight face.

Continue reading

Building a Bitmap Graphics Library for the Dragon

We’ve worked through the basics of changing the graphics mode and writing bitmap data, and we’ve taken a closer look at Bresenham’s Line Algorithm and simplified the implementation a bit from our previous attempts. Today we’ll extend that with some routines that will let us draw points and lines. Beyond that, we’ll look again at flood-fill operations, because the architecture of the 6809 will let us do a simpler implementation, one that demands less of the surrounding program and its memory layout than my old implementation for the Commodore 64 did.

We’ll be walking through quite a bit of code here, but we’ll get some nice pictures at the end.

General API Design

As part of our exploration of graphics modes in the first place, we needed to write routines that clear the screen and that swap us into and out of the 256×192 monochrome graphics mode. We’ll just copy those wholesale as PMODE, TMODE, and PCLS.

The screen memory itself is linearly laid out; each row of pixels is 32 bytes long, representing 256 pixels, and the full screen display is 192 rows, adding up to exactly 6KB. These facts imply several others, and those have implications for our design:

  • Both the X and Y coordinates fit in a byte. That’s often not true, even in this era; rows on the Apple II’s high-res display were 280 pixels across, and the PC, Atari 800, and C64 all boasted 320.
  • The X coordinate fits exactly in a byte. If we use an 8-bit register to hold this, it can’t be out of range. We may need to be more careful about edge detection as a result.
  • The distance between the location of a pixel and the location of the pixel above or below it is a power of two. Not only does this mean we don’t need to rely on the 6809’s multiplier; combined with the fact that the X coordinate fits in a byte, it means we can imagine doing operations on coordinates as if they were a single, fused 16-bit value.
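
Here is the arithmetic those facts buy us, sketched in Python. SCREEN_BASE is a placeholder, since where the 6KB buffer actually lives is a detail of the program, and I’m assuming the conventional mapping where the leftmost pixel of a byte is its high bit.

SCREEN_BASE = 0x0600  # placeholder address for the 6KB frame buffer

def pixel_location(x, y):
    """Byte address and bit mask for pixel (x, y) on the 256x192 screen."""
    # y*32 is just a shift, and since the low five bits of (y << 5) are
    # zero, ORing in x >> 3 fuses both coordinates into one 16-bit value.
    offset = (y << 5) | (x >> 3)
    mask = 0x80 >> (x & 7)  # leftmost pixel in the byte is the high bit
    return SCREEN_BASE + offset, mask
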
Continue reading

A Closer Look at Bresenham’s Algorithm

Last time, we got some high-resolution graphics out of the Dragon. However, it was essentially only doing “blitting” operations—we were copying out neatly byte-aligned rectangles of data, as if they were sprites or background tiles. We don’t yet have a technique for properly drawing diagrams with points and lines and such. I have written libraries to do this in the past: my old C64 bitmap library included line-drawing, and my DOS edition of the Smoking Clover effect implemented a specialized line-drawing routine as well. Both of these are implementations of an old line-drawing algorithm called Bresenham’s Algorithm, and I really ought to have a nice implementation of it for the 6809 as well.

This routine is old and well-studied enough that there are many known ways to improve its speed or otherwise optimize what it does. I don’t care to delve too deeply into that—at the end of the day, I have to place all my pixels individually in the worst case—but I would like to look a bit more closely into how, exactly, the algorithm works. It is, ultimately, a fairly simple technique, but, interestingly, it turns out to have a surprising amount in common with some of the other, non-graphics systems we’ve looked at recently. In particular, in revisiting this old code, I found myself reminded of the Ensoniq sound chip on the Apple IIgs and the way it played its sound clips.
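
As a preview of that common ground, here is the accumulator at the heart of the algorithm, sketched in Python for the single case where the line runs rightward and shallowly downward; the full treatment is below the fold.

def bresenham_octant(x0, y0, x1, y1, plot):
    # One octant only: assumes dx >= dy >= 0. We step x on every pixel,
    # and an error accumulator decides when y must also step, much the
    # way a phase accumulator decides when a waveform must advance.
    dx, dy = x1 - x0, y1 - y0
    err = dx // 2
    y = y0
    for x in range(x0, x1 + 1):
        plot(x, y)
        err -= dy
        if err < 0:
            err += dx
            y += 1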

Continue reading

Text I/O and Graphics Modes on the Dragon/CoCo

So far I’ve been rather haphazard in exploring the Dragon and the Color Computer it’s based on—once I got a working toolchain, I started putting the CPU through its paces with a decently complex decompression algorithm and carefully-timed audio output. Despite doing a fair amount of direct writing to screen memory along the way, though, we never actually bothered handling the basics of text input and output—with what we’ve done so far, we can’t really even write Hello World!

We’ll use the console control routines in the firmware to produce a simple text display, and then hold it until the user presses a key.

I must admit, however, that the font provided by these systems does not spark joy. There isn’t really a way to define a custom character set like there is on the C64 or Atari 800, either. However, the hardware does offer a monochrome 256×192 bitmap mode very like the Atari’s, and there’s nothing stopping you from writing to that screen as if it were a 32×24 text display.
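
Treating the bitmap as a 32×24 character grid just means copying an 8×8 glyph into the frame buffer one byte per scanline, as in this Python sketch; the buffer and the glyph here are hypothetical stand-ins.

WIDTH_BYTES = 32                    # 256 pixels / 8 pixels per byte
fb = bytearray(WIDTH_BYTES * 192)   # the 6KB hi-res screen

def put_char(col, row, glyph):
    """Draw an 8x8 glyph at character cell (col, row): col 0..31, row 0..23."""
    base = row * 8 * WIDTH_BYTES + col   # 8 scanlines per character row
    for line in range(8):
        fb[base + line * WIDTH_BYTES] = glyph[line]

put_char(0, 0, [0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF])  # a hollow box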

Below the fold we’ll cover basic character operations, outline the graphics modes available, and put the monochrome hi-res mode to good use. We won’t actually do any traditional bitmap graphics beyond the text, though; that’s for next week.

Before we get into the details, though: this is our final post covering the basics of the Color Computer and Dragon systems, and I’ve now collected an overall summary for getting off the ground as the first of my new System Guides. If you’re just joining us and want to follow along with projects of your own, this will get you emulating, building, and running as quickly as is reasonable.

Continue reading