A Complex SVN-to-Git Migration, Part 3: Forging a History

This is the final article in this three-part series on a complex Subversion-to-Git migration of VICE, a software project so old that it predates version control and has already gone through two VCSes. Last time we’d interrogated our partially-converted repository, and broken down each branch and tag into a series of source trees and the commits that they came from. Our tasks today are fivefold:

  • Take the disjoint commits that are the pre-1.22.9 releases and arrange them into a new branch named legacy_releases.
  • Take the sequences of source trees that we deduced for all the non-release branches and turn them into proper branches in their own right. Attach each branch to trunk at the commit where it was copied out. Unlike the legacy release branch, these branches need to retain the authorship, timestamps, and commit comments from the Subversion originals.
  • As part of the previous step, any branches that were copied from a subdirectory of trunk/ instead of trunk/ itself need to have parent directories fabricated so that they may slot more neatly into the overall structure, work properly with git diff and related commands, and so on.
  • Find the places where we need manual overrides to carry out the previous steps properly, and override as needed. We found one of these last time (release 2.4.2 was miscopied); we expect to find more along the way.
  • Clean up the remnants of the conversion; remove any extra material created by the Subversion import and replace branches with tags where appropriate.

The primary theme of this week’s work is editing and fabricating history. We will be creating commits and even new source trees out of source material that the Git repository already knows about but doesn’t think is organized in this way, and we’ll be creating it alongside records that say that all this work was done in the distant past. Nothing about what we are doing today is normal, nor is it part of any sane workflow—but when you’re doing something unusual, sanity is a hindrance.

The Git Book has a page, the Git Internals tutorial, that discusses creating Git objects like blobs, trees, and commits out of whole cloth, and while I found it very useful, it was not the whole story for what I needed to do here: the tutorial creates its material from nothing, while we are obliged to coexist with and re-use material that we already have. Usually this makes our lives easier; sometimes it makes them harder. We’ll see both as we go along.
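
To make “fabricating history” concrete in mechanical terms: Git’s plumbing will happily wrap a tree object the repository already has in a brand-new commit that claims whatever authorship and dates we care to assert. Here is a minimal sketch of that operation in Python; the helper names are mine, and this illustrates the plumbing involved rather than the actual migration script.

import os
import subprocess

def git(*args, extra_env=None):
    """Run a git command and return its stdout, stripped."""
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run(("git",) + args, check=True,
                            capture_output=True, text=True, env=env)
    return result.stdout.strip()

def fabricate_commit(tree_sha, parent_sha, name, email, date, message):
    """Wrap an existing tree object in a new commit, backdated to date."""
    # git commit-tree builds a commit out of a tree the repository already
    # knows about; the *_DATE variables are how we claim the work happened
    # in the distant past.
    meta = {
        "GIT_AUTHOR_NAME": name, "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name, "GIT_COMMITTER_EMAIL": email,
        "GIT_AUTHOR_DATE": date, "GIT_COMMITTER_DATE": date,
    }
    args = ["commit-tree", "-m", message]
    if parent_sha:
        args += ["-p", parent_sha]
    args.append(tree_sha)
    return git(*args, extra_env=meta)

A branch is then nothing more than a ref pointing at the newest of these commits, which git update-ref (or git branch) can create.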

Let’s begin our descent.

Continue reading

A Complex SVN-to-Git Migration, Part 2: Interrogating Git Internals

This is part two of a three-part series where I dig into Subversion administration and Git internals as part of a complex Subversion-to-Git migration. Last time, we handled locally mirroring a Subversion repository, and then using git-svn to convert that repository into a Git equivalent. We concluded by observing that the repository we were converting was beyond git-svn’s ability to automatically convert, and acknowledged that we ended with the repository in a bad state. This time, we will identify the exact things that went wrong and collect all the information that we need to fix it. This will require us to get fairly deep into Git’s internals, so we’ll start with a look at Git itself and the computer science that informs its design.

Git’s High Concept

Git is so amazingly simple to use that APress, a single publisher, needs to have three different books on how to use it. It’s so simple that Atlassian and GitHub both felt a need to write their own online tutorials to try to clarify the main Git tutorial on the actual Git website. It’s so transparent that developers routinely tell me that the easiest way to learn Git is to start with its file formats and work up to the commands. And yet, when someone dares to say that Git is harder than other SCMs, they inevitably get yelled at, in what I can only assume is a combination of Stockholm syndrome and groupthink run amok by overdosing on five-hour energy buckets.

Benjamin Pollack, Unorthodocs

Git as an application is a collaborative version control system: it allows multiple users to create and edit a collection of files organized in directories, and it tracks and credits the history of each file. But that’s what it does. What it is, as a program, is a content-addressable file storage system. The short description of systems like this sounds like nonsense: it’s a file system, but instead of using a filename to access the contents of a file, you use the contents of the file itself. Why even bother accessing it at that point? For that matter, how did you get the contents to begin with? The second question boils down to “somebody else gave it to us”, and the answer to the first can vary. In the simplest case the system represents a collection of things, and can tell us whether our data is in that collection. In the more common case, the data we have is associated with some other piece of data and that’s the data we want. In this case, the data we have is a sort of filename, but that filename is based on the content of the data itself somehow.
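
Here is a toy sketch of that idea in Python, assuming nothing about Git’s actual on-disk formats: the “filename” of every stored object is simply a hash of its contents.

import hashlib

class ContentStore:
    """A toy content-addressable store: objects are named by their contents."""
    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # the content names itself
        self.objects[key] = data
        return key  # hand this key to whoever will want the data later

    def get(self, key: str) -> bytes:
        return self.objects[key]

store = ContentStore()
key = store.put(b"hello, world\n")
assert store.get(key) == b"hello, world\n"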

If you poke at Git even casually, you’ll find yourself regularly confronted by incomprehensible 40-digit hex numbers that, if they’re ever named anything consistent at all, are called “hashes”. As a software artifact, Git assumes that any user that’s getting deep enough to have to care about hashes has already received the necessary training to know what they’re dealing with and why. Given the quote I opened this section with, you may suspect that I don’t think that this assumption is warranted.
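
Those 40-digit numbers are SHA-1 hashes, and for a file’s contents (a “blob”) the hash is computed over the raw bytes plus a short header. The few lines below reproduce what git hash-object prints for the same input; the example string is mine.

import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes a blob as SHA-1 over "blob <length>\0" plus the raw bytes.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a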

So we’ll start by solving a completely unrelated toy problem, and in solving it build up the technological infrastructure that underlies everything Git does to manage its content.

Continue reading

A Complex SVN-to-Git Migration, Part 1: Cooperating With Subversion

VICE is the premier emulator for the Commodore 8-bit machines, and it has been for some time. The project is old enough to predate the modern Internet, and it’s still under active development. Late last year, I helped out with a project to see what it would take to migrate VICE and its history to Git.

Doing this properly is a taller order than it first appears. While many automated tools exist to bridge the gap between Subversion (VICE’s most recent solution) and the popular version control systems that precede and follow it, these tools do not always match up neatly in their conventions, and chaining them can lead to disaster. That could have happened to The Ur-Quan Masters project—which, while younger than VICE, did still need to migrate from CVS to SVN to Git over the years—but development was sporadic enough that we could meaningfully “freeze” history at a certain point, migrate only the linear “trunk” history, and manually annotate the rest.

VICE is not so fortunate. VICE’s development history has branched widely, includes dozens if not hundreds of changes that are part of the repository but not part of the main development lines, and over the decades has accumulated over a thousand release tags. We want to convert all of this, and we want to get it into a form where Git’s tooling can offer convenient and accurate access to all of it.

The git svn command, formally part of Git but often shipped in a separate package, provides a good chunk of what we want here. However, its ideas of how Subversion repositories lay out branches and tags do not precisely match what VICE did all across its history, and as a result the converted branches do not work quite as we’d wish. This project, therefore, ultimately has three parts:

  • Convert the Subversion repository into a roughly-correct Git one, which we may analyze and edit into its final form later.
  • Extract the correct histories of the main repository and its branches from the roughly-correct version. Identify which branches were actually merged back into the main line, and where.
  • For branches whose histories do not map to anything in the main trunk—and thus were either never merged or predate SVN development entirely—fabricate a proper history for them so that they may exist as ordinary Git branches.
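
To give a flavor of the first step, here is roughly what the conversion invocation looks like when driven from Python. The URL, layout paths, and authors file below are placeholders rather than VICE’s actual values; a repository with a non-standard layout means spelling out where trunk, branches, and tags live instead of trusting git svn’s --stdlayout.

import subprocess

subprocess.run([
    "git", "svn", "clone",
    "--trunk=trunk", "--branches=branches", "--tags=tags",
    "--authors-file=authors.txt",   # maps svn usernames onto Git identities
    "file:///path/to/local-mirror", "vice-git",
], check=True)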

This was a big project, and I’ll be breaking it up into three posts. This introductory post will talk about how to efficiently mirror and convert Subversion and Git repositories. Part 2 will dig into Git repository interrogation as I collect the information necessary to dig the VICE repository out of its decades of tech debt, and in Part 3 we will practice the subtle art of fabricating history.

Continue reading

The Many Faces of Retrocomputing

Ten years ago this week, I made my first post to the Bumbershoot Software blog. It started out as a place to publish the various notes I’d collected over the years as a hobbyist, and eventually it turned into a kind of travelogue of my hobby work. This isn’t really the place for a retrospective on the blog itself—I basically did that last Christmas—but one thing that I’ve been thinking about since my PICO-8 article is the way my approach to “retrocomputing” is really only one of a variety of approaches people take to the topic. I’m doing all this for fun, after all, but it’s fun aimed at something serious—I’m interested in seeing what the past can teach the present, and vice versa. Older techniques from an era of slower and more constrained machines should still have lessons in the modern world when we wish to make best use of power or memory resources, and our fifty-odd years of progress in computing as an engineering discipline ought to pay dividends in the older domains, too. Our endless grousing as a profession to the contrary, we have learned some things over the decades.

Continue reading

Noodling With Nonograms

After a fun couple of months exploring the capabilities of the Color Computer, I’m going to turn to something else—I’m going to use a programmable computer to actually investigate a question, instead of having said computer be the subject of the investigation. In particular, I will be poking around at some properties of nonogram puzzles.

These puzzles have a variety of names, with “Picross” being, I think, the most famous. Regardless of the name, the rules are very consistent:

  • The puzzle is presented as an empty grid, and the solution forms an image by filling in cells as if they were pixels.
  • Each row and column is annotated with a list giving the number and lengths of the runs of filled cells in that row or column. By convention, an entirely empty row or column has no clue attached instead of being clued as “0”.

These clues are more constraining than they might first appear. Because runs must be contiguous, on a board of width 9, the clue “8” only admits two possibilities: either every cell but the first is filled, or every cell but the last is. Similarly, runs may not be reordered, so the clue “3 4” must mean one of these three possibilities:

|XXX.XXXX.|
|XXX..XXXX|
|.XXX.XXXX|
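
To make that counting concrete, here is a short recursive sketch in Python that enumerates every legal placement of an ordered list of runs in a row; it reproduces both examples above. The function name and representation are mine, purely for illustration.

def placements(runs, width):
    """Every legal layout of the given runs, in order, in a row this wide."""
    if not runs:
        return ["." * width]
    first, rest = runs[0], runs[1:]
    # Runs after the first each need at least one separating gap.
    tail_min = sum(rest) + len(rest)
    results = []
    for start in range(width - tail_min - first + 1):
        block = "." * start + "X" * first
        gap = 1 if rest else 0
        for tail in placements(rest, width - len(block) - gap):
            results.append(block + "." * gap + tail)
    return results

print(placements([3, 4], 9))    # the three layouts diagrammed above
print(len(placements([8], 9)))  # 2, as claimed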

I’m not interested here in writing a solver. This has already been done, and I’m more interested in the overall shape of the puzzle as a whole.

Ambiguities and Contradictions

It’s very easy to produce impossible puzzles. Here’s an impossible 3×3 puzzle:

       1
  +-+-+-+
  | | | |
  +-+-+-+
3 | | | |
  +-+-+-+
  | | | |
  +-+-+-+

That’s pretty trivial, of course; even on casual inspection we can see that the row clues and the column clues specify different numbers of cells. Ensuring equality isn’t enough, though:

       2
  +-+-+-+
1 | | | |
  +-+-+-+
  | | | |
  +-+-+-+
1 | | | |
  +-+-+-+

Here, there’s no way to place a vertical run of 2 such that the correct rows show a horizontal run of 1. Contradictions aren’t our only problem, either. Here’s a puzzle that’s not contradictory at all, but it can be solved two different ways:

    1   1         1   1
   +-+-+-+       +-+-+-+
 1 |X| | |     1 | | |X|
   +-+-+-+       +-+-+-+
   | | | |       | | | |
   +-+-+-+       +-+-+-+
 1 | | |X|     1 |X| | |
   +-+-+-+       +-+-+-+

This isn’t really suitable for nonogram puzzles, where part of the reward for solving the puzzle is that a work of pixel art is revealed—we want our solution to be unique.

The Questions

I’ll be attacking the following questions in this article:

  1. How many possible puzzles—solvable or not—can be posed?
  2. How many survive the initial filter of “row clues and column clues must agree on the total number of cells”?
  3. How many puzzles have a unique solution?

All of these questions depend crucially on the dimensions of the grid, of course. I will be starting at 5×5, the smallest puzzle size that is interesting enough to pose, and will scale up along the way.

I’m not terribly interested in producing pristine, optimal implementations here; the goal is not to get the answer in the fanciest or most efficient way, but simply to get answers at all. Optimization is for when I actually need it in order to proceed.
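
As a flavor of what “just get answers” can look like, here is a brute-force sketch in Python: enumerate every 5×5 grid, bucket the grids by the clue set they produce, and read the counts off. This is my own illustration of the questions rather than necessarily the article’s approach, and at 2^25 grids it is slow but tractable.

from collections import Counter
from itertools import product

def clue(cells):
    """Run lengths of filled cells; an empty line yields the empty clue ()."""
    runs, run = [], 0
    for c in cells:
        if c:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return tuple(runs)

n = 5
solutions = Counter()
for bits in product((0, 1), repeat=n * n):
    rows = [bits[i * n:(i + 1) * n] for i in range(n)]
    cols = zip(*rows)
    puzzle = (tuple(clue(r) for r in rows), tuple(clue(c) for c in cols))
    solutions[puzzle] += 1

print("clue sets with at least one solution:", len(solutions))
print("clue sets with a unique solution:",
      sum(1 for v in solutions.values() if v == 1))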

Continue reading

Softsoniq: Cycle-counting Out Polyphonic Music

Last time, I managed the graphical work required to draw the image in the Portrait of Liberty program I’ve used a few times as a system demonstration. We’ll add the musical component today.

This is a tall order. Our previous incarnations relied on the tone generators in their hardware to provide 3-voice harmony. The CoCo and the Dragon, in comparison, have no tone generators at all and only a digital audio output port. If we want multipart harmony, we will need to write our own tone generators and sound synthesizers in software. Happily, our adventures last year offer us a very promising inspiration for a design: the Ensoniq 5503 used by the Apple IIgs. If we take its operation as inspiration, stripping it down and simplifying a bit, we get a plausible first draft of a design:

  • Each voice has three 16-bit values associated with it: a pattern, an index, and a frequency.
  • Each pattern is a pointer to a 32-byte wavetable specification with 8-bit signed samples.
  • For each sample to be output, add the frequency of each voice to its index, then use the top five bits of each voice’s index value to index into the pattern for the waveform’s current value.
  • Sum these values together, normalize them to the 6-bit unsigned PCM output the CoCo’s chip demands, and actually output it.
  • Alter the frequencies as needed over time to produce an actual song instead of just tones.
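
Here is that loop as a minimal Python model, purely to pin down the arithmetic; the real version is carefully cycle-counted 6809 assembly, and the wavetable and constants below are made up for illustration.

def mix(voices, n_samples):
    # Each voice: 16-bit "index" and "freq" values, plus a "pattern" of
    # 32 signed 8-bit samples.
    out = []
    for _ in range(n_samples):
        total = 0
        for v in voices:
            v["index"] = (v["index"] + v["freq"]) & 0xFFFF  # phase accumulator
            total += v["pattern"][v["index"] >> 11]         # top 5 bits pick a sample
        # Scale and bias the signed mix into the 6-bit unsigned range
        # (0..63) that the output port wants.
        out.append(((total // len(voices)) + 128) >> 2)
    return out

square = [127] * 16 + [-128] * 16                  # a crude square-wave table
voice = {"index": 0, "freq": 0x0400, "pattern": square}
samples = mix([voice], 64)                         # one full 64-sample cycle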

This looks pretty plausible, and it also has the very nice property that it doesn’t really care what the output rate is; the faster we make the loop, the more fine-grained our frequency codes can be, and because our codes are actually frequencies and not wavelengths like we had to count out on the Apple II, we won’t get major gaps in the frequencies we can produce until we start hitting far more distant extremes. Our only real limit will be the Nyquist-Shannon sampling theorem; we won’t be able to produce a frequency higher than half our final output rate, since even that limiting case degenerates into a square wave that toggles on every output sample.

Before we start diving directly into code and timing, though, there are two bits of unfinished business to take care of in terms of the surrounding environment.

Continue reading

Flood Fills on the CoCo and the Dragon

Last time, we designed and implemented most of a line-drawing bitmap library. The overall goal was to match the system I’d used to port John Jainschigg’s Portrait of Liberty program into a more sophisticated Commodore 64 (and, later, Atari ST) incarnation. This week, we’ll continue in that direction by implementing a different flood fill algorithm than the ones used in the previous implementations, and then putting together a version of the display suitable for the Color Computer.

Choosing an Algorithm

There are quite a lot of algorithms out there for carrying out flood-fill operations. For my Commodore 64 port I used the “lozenge fill” approach:

from collections import deque

def lozenge_fill(seed, is_set, set_pixel, neighbors):
    # A runnable sketch of the approach; is_set, set_pixel, and neighbors
    # stand in for the application's pixel primitives.
    worklist = deque([seed])              # put the initial seed in the worklist
    while worklist:
        x, y = worklist.popleft()         # consume the front entry
        if not is_set(x, y):
            set_pixel(x, y)
            for n in neighbors(x, y):     # adjacent pixels: up, down, left, right
                if not is_set(*n):
                    worklist.append(n)    # onto the back of the worklist

This got the job done, but it also required me to set aside a fairly large chunk of memory for use as the worklist, and it meant that my flood-fill implementation wound up being tied pretty closely to the rest of the application so that memory could be usefully shared. Now that we’ve got a CPU that can handle local variables, we can write recursive functions without trouble. Recursive versions of the lozenge fill, however, are unpleasantly expensive, and with only 16KB or so of usable memory in our program space, there’s a real risk of stack overflow. If recursion is available, the “span fill” approach works much better:

  1. If the point (x, y) is filled, return to the caller.
  2. Draw as far left and right as you can without colliding with a pixel of your color or going beyond the screen.
  3. Mark the leftmost and rightmost points as x1 and x2.
  4. Loop from x1 to x2, recursively flood-filling the pixels above and below each point (a sketch in code follows).
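
In Python, a sketch of that recursion looks like the following; is_set and set_pixel are hypothetical stand-ins for the bitmap library’s primitives.

def span_fill(x, y, is_set, set_pixel, width, height):
    if is_set(x, y):                                 # step 1
        return
    x1 = x
    while x1 > 0 and not is_set(x1 - 1, y):          # march left
        x1 -= 1
    x2 = x
    while x2 < width - 1 and not is_set(x2 + 1, y):  # march right
        x2 += 1
    for px in range(x1, x2 + 1):                     # steps 2 and 3: fill the span
        set_pixel(px, y)
    for px in range(x1, x2 + 1):                     # step 4: recurse above and below
        if y > 0:
            span_fill(px, y - 1, is_set, set_pixel, width, height)
        if y < height - 1:
            span_fill(px, y + 1, is_set, set_pixel, width, height)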

This can still blow out your stack—if you were filling in a narrow, snaking figure with lots of vertical runs, you’d end up recursing very deeply—but it will serve us here and it’s generic enough that I can put it in the bitmap library with a straight face.

Continue reading

Building a Bitmap Graphics Library for the Dragon

We’ve worked through the basics of changing the graphics mode and writing bitmap data, and we’ve taken a closer look at Bresenham’s Line Algorithm and simplified the implementation a bit from our previous attempts. Today we’ll extend that with some routines that will let us draw points and lines. Beyond that, we’ll look again at flood-fill operations, because the architecture of the 6809 will let us do a simpler implementation, one that demands less of the surrounding program and its memory layout than my old implementation for the Commodore 64 did.

We’ll be walking through quite a bit of code here, but we’ll get some nice pictures at the end.

General API Design

As part of our exploration of graphics modes in the first place, we needed to write routines that clear the screen and that swap us into and out of the 256×192 monochrome graphics mode. We’ll just copy those wholesale as PMODE, TMODE, and PCLS.

The screen memory itself is linearly laid out; each row of pixels is 32 bytes long, representing 256 pixels, and the full screen display is 192 rows, adding up to exactly 6KB. These facts imply several others, and those have implications for our design:

  • Both the X and Y coordinates fit in a byte. That’s often not true, even in this era; rows on the Apple II’s high-res display were 280 pixels across, and the PC, Atari 800, and C64 all boasted 320.
  • The X coordinate fits exactly in a byte. If we use an 8-bit register to hold this, it can’t be out of range. We may need to be more careful about edge detection as a result.
  • The distance between the location of a pixel and the location of the pixel above or below it is a power of two. Not only does this mean we don’t need to rely on the 6809’s multiplier; combined with the fact that the X coordinate fits in a byte, it means we can imagine doing operations on coordinates as if they were a single, fused 16-bit value.
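
Here is the arithmetic those facts buy us, sketched in Python. SCREEN_BASE is a placeholder, since where the 6KB buffer actually lives is a detail of the program, and I’m assuming the conventional mapping where the leftmost pixel of a byte is its high bit.

SCREEN_BASE = 0x0600  # placeholder address for the 6KB frame buffer

def pixel_location(x, y):
    """Byte address and bit mask for pixel (x, y) on the 256x192 screen."""
    # y*32 is just a shift, and since the low five bits of (y << 5) are
    # zero, ORing in x >> 3 fuses both coordinates into one 16-bit value.
    offset = (y << 5) | (x >> 3)
    mask = 0x80 >> (x & 7)  # leftmost pixel in the byte is the high bit
    return SCREEN_BASE + offset, mask
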
Continue reading

A Closer Look at Bresenham’s Algorithm

Last time, we got some high-resolution graphics out of the Dragon. However, it was essentially only doing “blitting” operations—we were copying out neatly byte-aligned rectangles of data, as if they were sprites or background tiles. We don’t yet have a technique for properly drawing diagrams with points and lines and such. I have written libraries to do this in the past: my old C64 bitmap library included line-drawing, and my DOS edition of the Smoking Clover effect implemented a specialized line-drawing routine as well. Both of these are implementations of an old line-drawing algorithm called Bresenham’s Algorithm, and I really ought to have a nice implementation of it for the 6809 as well.

This routine is old and well-studied enough that there are many known ways to improve its speed or otherwise optimize what it does. I don’t care to delve too deeply into that—at the end of the day, I have to place all my pixels individually in the worst case—but I would like to look a bit more closely into how, exactly, the algorithm works. It is, ultimately, a fairly simple technique, but, interestingly, it turns out to have a surprising amount in common with some of the other, non-graphics systems we’ve looked at recently. In particular, in revisiting this old code, I found myself reminded of the Ensoniq sound chip on the Apple IIgs and the way it played its sound clips.
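
As a preview of that common ground, here is the accumulator at the heart of the algorithm, sketched in Python for the single case where the line runs rightward and shallowly downward; the full treatment is below the fold.

def bresenham_octant(x0, y0, x1, y1, plot):
    # One octant only: assumes dx >= dy >= 0. We step x on every pixel,
    # and an error accumulator decides when y must also step, much the
    # way a phase accumulator decides when a waveform must advance.
    dx, dy = x1 - x0, y1 - y0
    err = dx // 2
    y = y0
    for x in range(x0, x1 + 1):
        plot(x, y)
        err -= dy
        if err < 0:
            err += dx
            y += 1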

Continue reading

Text I/O and Graphics Modes on the Dragon/CoCo

So far I’ve been rather haphazard in exploring the Dragon and the Color Computer it’s based on—once I got a working toolchain, I started putting the CPU through its paces with a decently complex decompression algorithm and carefully-timed audio output. Despite doing a fair amount of direct writing to screen memory along the way, though, we never actually bothered handling the basics of text input and output—with what we’ve done so far, we can’t really even write Hello World!

We’ll use the console control routines in the firmware to produce a simple text display, and then hold it until the user presses a key.

I must admit, however, that the font provided by these systems does not spark joy. There isn’t really a way to define a custom character set like there is on the C64 or Atari 800, either. However, the hardware does offer a monochrome 256×192 bitmap mode very like the Atari’s, and there’s nothing stopping you from writing to that screen as if it were a 32×24 text display.
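
Treating the bitmap as a 32×24 character grid just means copying an 8×8 glyph into the frame buffer one byte per scanline, as in this Python sketch; the buffer and the glyph here are hypothetical stand-ins.

WIDTH_BYTES = 32                    # 256 pixels / 8 pixels per byte
fb = bytearray(WIDTH_BYTES * 192)   # the 6KB hi-res screen

def put_char(col, row, glyph):
    """Draw an 8x8 glyph at character cell (col, row): col 0..31, row 0..23."""
    base = row * 8 * WIDTH_BYTES + col   # 8 scanlines per character row
    for line in range(8):
        fb[base + line * WIDTH_BYTES] = glyph[line]

put_char(0, 0, [0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF])  # a hollow box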

Below the fold we’ll cover basic character operations, outline the graphics modes available, and put the monochrome hi-res mode to good use. We won’t actually do any traditional bitmap graphics beyond the text, though; that’s for next week.

Before we get into the details, though: this is our final post covering the basics of the Color Computer and Dragon systems, and I’ve now collected an overall summary for getting off the ground as the first of my new System Guides. If you’re just joining us and want to follow along with projects of your own, this will get you emulating, building, and running as quickly as is reasonable.

Continue reading