

NEC Multispeed Disk Upgrade

Parallel Port SD Card Interface

The parallel port was the obvious solution for interfacing external storage to the computer. However, the hardware interface is only one part of the story. It would be highly desirable to interface the new hardware to the operating system, so that files could be read and written using standard system calls. Otherwise, I would be limited to accessing files indirectly, using a specially written program to "swap" them onto the floppy drives. As I had never done any MS-DOS driver development before, this could be quite a challenge.

Some searching around resulted in a possible solution to this in the form of SDPP10 by Dan Marks. This is an MS-DOS device driver that would interface directly with a (micro)SD card in SPI mode via the parallel port (download). The only hardware needed is a level shifter between the 5V of the parallel port, and the 3.3V of the SD card. I therefore decided to try this approach.

To connect up the card, I ordered an SD card breakout board from eBay. Despite being advertised as having level shifting, no such components were evident when the board arrived. A second, different board got lost in the post, but eventually, after ordering directly from Sparkfun in the US, I had a suitable board with a micro SD card socket and a level shifter. This was wired up on top of a D25 backshell as per the instructions that came with SDPP10.

[Parallel port SD card adapter]

I followed the recommendations for partition size on the SD card, keeping the size below 32MB. Even so, I was still unsure whether the version of MS-DOS on the laptop would recognise a disk bigger than a floppy, as the machine had never been designed for this, and I understood that the version of MS-DOS was at least partly customised to the machine. The only way of finding out was to test it and see if it worked.

Booting up with the driver enabled in CONFIG.SYS, I was presented with an error message saying that the card could not be detected. This appeared to be a low-level issue, rather than the filesystem not being accepted by MS-DOS, so I proceeded to try to debug the problem.

It wasn't really feasible to try to step through the device driver as it attempted to detect the card - at this point, my debugging options were limited to MS-DOS DEBUG, and I couldn't easily see a way of attaching it to the device driver at boot time. The SD.SYS driver comes with full source code, and I did try to recompile this on another machine, with the hope of being able to insert some debug logging. However, it seems that the build system was incompatible with both the version of Turbo C that I used, and also a copy of the Microsoft C compiler that I had access to.

Without any other real options, I connected up an oscilloscope to the serial clock and data lines, and had a look at the card detect sequence as the machine booted up. This showed that MISO was stuck low, and so I added a 10k pullup. Data communication was then observed to be happening in both directions; however, the card was still not detected successfully.

Given that I had access to the driver source code, it was possible to trace through its execution by following along with the data stream, in conjunction with the flowchart at https://elm-chan.org/docs/mmc/mmc_e.html . This let me deduce that the code was getting as far as send_cmd(CMD17, sector) at sdmm.c:662, but that the sector address was wrong, and the card was returning an error.

I wasn't quite sure why the driver was sending an incorrect sector address. I was rather suspicious of the DWORDLSHIFT() macro providing the wrong result, though the implementation seemed simple enough, and without a working build environment, there wasn't much I could do even if it was wrong.

After some head scratching, I decided to give the driver a go with a different SD card - I had been using an 8GB card up until now. I switched to a 2GB card - again, formatted with a 32MB partition - and to my surprise, the laptop booted up and the driver loaded successfully, with the card showing up as drive D:. The upgrade was a success!

[MS-DOS prompt showing drive D]

I don't know what the actual problem was - given that I had a card that was working, fixing it was much less of a priority. There are a variety of different SD card types (as can be seen from the flowchart), and possibly the type that I was using originally had not been tested with the driver.
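One documented difference between card types that would produce a wrong-looking sector address is the addressing unit of the read command. Standard-capacity cards take a byte address in CMD17, while high-capacity cards take a block number. This is not confirmed as the cause here, and the sketch below is not the driver's code. It is a small Python model of the 6-byte SPI command frame, with a hypothetical function name and flag, showing how the same sector yields two different frames.

```python
def cmd17_frame(sector, block_addressed):
    """Build the 6-byte SPI frame for CMD17 (READ_SINGLE_BLOCK).

    Hypothetical helper for illustration only; not taken from SD.SYS.
    """
    # Byte-addressed cards expect sector * 512; block-addressed cards
    # expect the sector number itself.
    arg = sector if block_addressed else sector << 9
    arg &= 0xFFFFFFFF  # the argument field is 32 bits wide
    return bytes([
        0x40 | 17,           # start bit 0, transmission bit 1, command index 17
        (arg >> 24) & 0xFF,  # argument, most significant byte first
        (arg >> 16) & 0xFF,
        (arg >> 8) & 0xFF,
        arg & 0xFF,
        0x01,                # dummy CRC field with the stop bit set
    ])


if __name__ == "__main__":
    print(cmd17_frame(1, block_addressed=False).hex())
    print(cmd17_frame(1, block_addressed=True).hex())
```

Comparing the four argument bytes seen on the oscilloscope against both forms would be one way to check whether the card and the driver agree on the addressing unit.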

I copied various applications onto the SD card, including Borland Turbo C, Assembler & Debugger, MS Quick Basic, MS Visual Basic for DOS, and Lotus 123, and was able to get most of these running on the Multispeed. However, this showed that the read transfer speed from the SD card was pretty slow. Measurements showed it to be around 27kB/s - not really any faster than the floppy drives (though seek performance was obviously better), which was a bit of a disappointment.

I therefore decided to see if I could optimise the transfer speed at all. It was first necessary to see what the driver was actually doing in its inner read loop. Back on my Linux machine, the command 'objdump -Mintel,i8086 -b binary -D SD.SYS -m i8086 > sd.asm' gave me a disassembly of the driver. While this did not contain any symbol information, it was easy enough to locate the inner read loop by searching for the 'in' instructions while referring to the C source.

The inner loop looked reasonably tight, so I wasn't sure why the overall data transfer was so slow. (The Multispeed is clocked at 9MHz versus the 4.77MHz of the original IBM PC, so I thought even a bit-banged implementation would have reasonable performance.) However, my unfamiliarity with the CPU and 8086 assembly tripped me up here - it seems that an 'out' instruction takes 12 cycles, and an indirect load is around 13 - quite an unpleasant surprise when you are used to programming something like the AVR, where everything is 1 or 2 cycles at most. (Note that the inner loop is unrolled. See instruction set and timing.)

For each bit, the inner loop does an indirect load for the parallel port data register address, two immediate loads and two outputs to pulse the clock line, an indirect load for the parallel port status register address, an input for the actual data bit, a bit test, conditional jump, and increment, and a shift. This adds up to quite a bit, so I had a look to see how it could be optimised.

Replacing the indirect loads with immediates would mean that the port address could not be configured at runtime (without self-modifying code), and would be no faster anyway. But could the addresses be saved in separate registers, and loaded only once per byte, instead of for every bit? Unfortunately, the 8086 architecture requires the address for in/out to always be in the DX register. But swapping the registers with XCHG is still faster than reloading the register from memory, so I did that.

Also, testing the input bit, and conditionally incrementing the result, can be replaced by simply masking the bit, and ORing it into the result, saving an average of 11 cycles per bit (more for a 0 data bit, less for a 1). Of course, this means that the data bit is shifted into the middle of the output byte, rather than at the end (due to the data input being on bit 4), and so it must be rotated 4 places at the end of the byte, but this only costs 8 cycles, so there is still a net gain.

Old inner loop:

mov dx, WORD PTR ds:0x3e
mov al, 0x1
out dx, al
mov al, 0x3
out dx, al
shl cl, 1
mov dx, WORD PTR ds:0x40
in al, dx
test al, 0x10
je +2
inc cl

New inner loop:

xchg dx, bx
mov al, 0x01
out dx, al
mov al, 0x03
out dx, al
rol cl, 1
xchg dx, bx
in al, dx
and al, 0x10
or cl, al
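To check that the two listings above really assemble the same byte, the bit handling can be modelled in a few lines of Python. This is only a sketch of the register arithmetic, not of the port I/O. The function names are hypothetical, and it assumes that CL is zero at the start of each byte and that the card sends the most significant bit first.

```python
DATA_IN_MASK = 0x10  # the card's data line arrives on bit 4 of the status port


def rol8(value, count):
    """Rotate an 8-bit value left by count places."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def read_byte_old(status_reads):
    """Original loop: shift, test bit 4, conditionally increment."""
    cl = 0
    for al in status_reads:       # one status-port read per bit
        cl = (cl << 1) & 0xFF     # shl cl, 1
        if al & DATA_IN_MASK:     # test al, 0x10 / je +2
            cl = (cl + 1) & 0xFF  # inc cl
    return cl


def read_byte_new(status_reads):
    """Patched loop: rotate, mask bit 4, OR it in, then fix up by 4 places."""
    cl = 0                        # assumption: CL is cleared before each byte
    for al in status_reads:
        cl = rol8(cl, 1)          # rol cl, 1
        cl |= al & DATA_IN_MASK   # and al, 0x10 / or cl, al
    return rol8(cl, 4)            # final 4-place rotate after the eighth bit


def status_stream(byte, noise=0x00):
    """Eight simulated status reads for one byte, MSB first (test helper)."""
    return [
        (noise & ~DATA_IN_MASK & 0xFF) | (DATA_IN_MASK if byte & (0x80 >> i) else 0)
        for i in range(8)
    ]


if __name__ == "__main__":
    for value in range(256):
        reads = status_stream(value, noise=0xEF)
        assert read_byte_old(reads) == read_byte_new(reads) == value
    print("old and new loops agree for all 256 byte values")
```

One thing the model makes visible: because the new loop rotates rather than shifts, any stale bits left in CL would be ORed into the result, so the register has to be cleared before each byte. The old loop shifts them out on its own.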

Given that I didn't have a toolchain set up for building the driver, it was necessary to make these modifications by patching the binary file. This can be done using DEBUG. While this program is nominally designed for interactive use, its terse syntax and lack of any "undo" facility mean it is much easier to prepare a list of commands in a text file, and pipe them into the command.
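As a rough illustration of what such a command file might look like, the Python sketch below generates one from a list of patches. The function name, the file offset and the choice of bytes are all hypothetical and are not the real patch for SD.SYS. It relies on DEBUG loading a non-.EXE file at offset 0100h, so each file offset is shifted by 0x100, and it uses only the E (enter), W (write) and Q (quit) commands.

```python
def debug_patch_script(patches, load_offset=0x100):
    """Return DEBUG commands that apply byte patches to a loaded file.

    patches is a list of (file_offset, data) pairs. The values used in the
    example below are made up and are not real offsets within SD.SYS.
    """
    lines = []
    for file_offset, data in patches:
        addr = file_offset + load_offset           # address as seen inside DEBUG
        hex_bytes = " ".join(f"{b:02X}" for b in data)
        lines.append(f"E {addr:04X} {hex_bytes}")  # E: enter bytes at address
    lines.append("W")                              # W: write the image back to the file
    lines.append("Q")                              # Q: quit
    return "\r\n".join(lines) + "\r\n"             # DOS line endings


if __name__ == "__main__":
    # Hypothetical patch: 87 D3 is one encoding of an XCHG between DX and BX.
    print(debug_patch_script([(0x0234, bytes([0x87, 0xD3]))]), end="")
```

Assuming the output is saved as PATCH.TXT, it could then be fed to DEBUG on the DOS machine with something like `DEBUG SD.SYS < PATCH.TXT`. Working on a copy of the driver is advisable, since W overwrites the file in place.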

With this change in place, the average disk read speed increased to around 35kB/s, a small but worthwhile improvement.
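The measured speeds can be turned into an approximate cycle budget per bit, which gives a feel for how much time is spent overall. The short Python calculation below uses only the figures quoted in this article (a 9MHz clock, 27kB/s before and 35kB/s after) and assumes 1kB means 1000 bytes. The results are whole-system averages, so they include driver and DOS overhead as well as the inner loop.

```python
def cycles_per_bit(clock_hz, bytes_per_second):
    """Average CPU cycles available per transferred bit."""
    return clock_hz / (bytes_per_second * 8)


CLOCK_HZ = 9_000_000  # "9MHz" as quoted in the text

before = cycles_per_bit(CLOCK_HZ, 27_000)  # original driver
after = cycles_per_bit(CLOCK_HZ, 35_000)   # patched driver

if __name__ == "__main__":
    print(f"before: {before:.1f} cycles/bit")
    print(f"after:  {after:.1f} cycles/bit")
    print(f"saved:  {before - after:.1f} cycles/bit")
```

On these assumptions the transfer goes from roughly 42 to roughly 32 cycles per bit on average.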

While the original goal of upgrading the disk of the Multispeed had been achieved, there were still a number of disadvantages to the chosen solution. The speed was still considerably slower than could be achieved using a conventional hard disk interface. Also, I did experience intermittent read/write errors and filesystem corruption with the SD card, which was clearly not desirable. Furthermore, the disk was often not detected when the machine was first powered on, and a reboot was required to find it. Finally, although the parallel port adapter was a reasonably neat solution, it was inconvenient that it required an external power supply. I therefore decided to investigate the use of alternative I/O interfaces.

On to Part 2 - Reverse Engineering.

Up to Introduction.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

loopgain.net