NEC Multispeed Disk Upgrade

Adapter Card Development

Conceptually, this would be fairly simple. Use the EA circuit to provide an 8-bit output port, which would be connected to a parallel in, serial out shift register. This would be connected via a level shifter to the SD card's MOSI pin. The MISO pin would go through a second level shifter into another shift register, the parallel output of which would connect to an 8-bit input port provided by the 8255. A serial clock would be generated, which would be gated and fed to the clock inputs of the shift registers, and the SD card's clock pin. Additional timing logic would load the shift registers at the appropriate points in the cycle.

Further examination of this idea however showed that it was not ideal, as the data would effectively be "double-buffered" through both the 8255 and shift registers, resulting in additional latency, and requiring extra control logic. It was realised that a shift register with tristateable parallel outputs could be interfaced directly to the ISA bus with much the same control and timing circuitry, thereby making the 8255 redundant.

The first part of the circuitry to be designed was the address decoder. In the EA circuit, the A0 and A1 lines are decoded by the 8255, while the remainder of the address lines, and the AEN signal, are decoded by a pair of 74LS138 3 to 8 line decoders. While it would have been possible to implement both the 'read byte' and 'write byte' functions at a single address, I initially decided it would be simpler to use separate addresses, so the address decoder would need to provide multiple outputs. Fortunately, there were a couple of unused enable inputs on one of the '138s in the original design, which could be used to provide the two extra inputs required for the low order address lines. A bit of rearrangement of the address lines provided a block of 8 decoded consecutive address locations, that could be switched to one of 8 different base addresses starting from 0x300, still with only 2x 74HC138s.
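The decoding described above can be sketched as a small software model. This is only an illustration: the function and enum names are hypothetical, the base address is passed in as a parameter rather than assuming any particular spacing between the eight selectable bases, and the register assignments are taken from the port numbers (0x300, 0x301, 0x302) used in the driver code further down.

```c
#include <stdint.h>

/* Register offsets within the 8-address block. The names are hypothetical;
   the offsets follow the 0x300/0x301/0x302 usage in the driver code. */
enum sd_reg {
    SD_REG_DATA = 0,  /* write: send a byte; read: previous byte + start next transfer */
    SD_REG_PEEK = 1,  /* read: previous byte, no new transfer */
    SD_REG_CS   = 2   /* write: D0 is latched by the card select flip-flop */
};

/* Return the decoded offset (0..7) if 'addr' falls inside the 8-byte block
   starting at 'base' (assumed to be a multiple of 8), or -1 if the card is
   not selected. 'aen' is nonzero during DMA cycles, when the address on the
   bus must be ignored. */
int sd_decode(uint16_t addr, uint16_t base, int aen)
{
    if (aen)
        return -1;                        /* DMA cycle: not an I/O access */
    if ((addr & (uint16_t)~7u) != base)   /* upper address bits must match */
        return -1;
    return addr & 7;                      /* low three bits pick the register */
}
```

With a base of 0x300, `sd_decode(0x302, 0x300, 0)` selects the card select register, while any address outside 0x300-0x307, or any access with AEN asserted, is rejected.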

The next question to address was that of the clock for the shift registers and SD card. While it is not absolutely essential that this is synchronised with the ISA bus clock, there is the potential for glitches when loading the shift registers if it is not. Furthermore, using the bus clock saves having to provide a separate oscillator.

It is obviously desirable for the serial clock to be reasonably fast, in order to maximise the data transfer rate. Ideally, the hardware would operate at the maximum rate at which data could be read by the CPU using 'INP' instructions. Although the 8086 'INP' instruction is documented as taking 12 cycles, testing revealed that the Multispeed could execute this instruction in only 5 cycles, which is probably due to the increased performance of the V30 processor.

I therefore decided to run the serial clock at 2x the ISA bus clock, or 9.54MHz. While in theory this would give 10 cycles per instruction - enough to clock 8 bits - in practice some of these will be consumed while the /IOR line is low, and so the timing will still be a bit marginal for executing 'back-to-back' INP instructions. However, since the INP instruction only targets the A register, it would not be possible to fully unroll an INP loop without interleaving instructions to store the data elsewhere. Therefore, the 9.54MHz bit clock should not be a performance bottleneck.
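The arithmetic behind this budget can be written out as a few helper functions. This is a rough sketch only: the function names are hypothetical, the 4.77MHz bus clock is inferred from the 9.54MHz figure above, and the instruction timing is assumed to be counted in bus clock periods.

```c
/* Bus clock in Hz, inferred from the article's figure of 9.54MHz for
   twice the bus clock (an assumption, not a measured value). */
#define BUS_CLOCK_HZ 4770000.0

/* The frequency doubler produces two serial clocks per bus clock. */
double serial_clock_hz(double bus_clock_hz)
{
    return 2.0 * bus_clock_hz;
}

/* Serial clock periods that elapse during one input instruction,
   given its length in bus clock cycles. */
int serial_clocks_per_instruction(int bus_cycles_per_instruction)
{
    return 2 * bus_cycles_per_instruction;
}

/* Time needed to shift one 8-bit byte at the given serial clock, in ns. */
double byte_shift_time_ns(double sck_hz)
{
    return 8.0 * 1e9 / sck_hz;
}
```

For a 5-cycle instruction this gives 10 serial clocks, of which 8 are needed for the data bits, leaving only 2 spare for the portion of the cycle in which /IOR is low. That is why back-to-back reads are marginal.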

I added a simple frequency doubler using an R-C differentiator and an XOR gate. While this is a slightly dubious method, it is considerably simpler than the alternatives such as a PLL. If it proved problematic, it could always be disabled by simply shorting out the capacitor.

The SPI bus is configured for CPOL=0 and CPHA=0, so the data should be sampled on the rising edge of the clock signal, and set up on the falling edge. The frequency doubled clock (CLK2) was fed through an inverter to provide a second phase (/CLK2) to use for clocking the shift registers.

It was then necessary to select shift registers for the design. There are various parts in the 74 series logic family with various features, however one of the most flexible of these is the 74HC299. This provides both serial-to-parallel and parallel-to-serial functionality, along with tristateable parallel outputs. I therefore decided to use this chip for the read function. I had intended to use it for the write function as well, with the potential for both functions to eventually be combined in a single chip. However, the synchronous parallel load of the '299 proved to be too problematic, as it would have required clock pulses gated to the SPI write cycle, and also an extra pulse to perform the parallel load. I therefore switched to a 74HC165 for the write function.

The lowest of the (active low) decoded address lines was ORed with the /WR signal from the bus to provide a negative-going parallel load signal for the write/MOSI shift register. The inhibit input of the shift register is driven from /CYCLERUN, to prevent the register from clocking unless an SPI read/write cycle is in progress. The '165 is clocked from the /CLK2 alternate phase clock, so that it shifts 180 degrees away from the point at which the SD card will sample MOSI. The serial input is tied high to allow dummy 0xFF bytes to be transmitted during a read command (with SPI, it is not possible to receive data without transmitting at the same time).

The gated clock is provided by a 74HC163 4-bit synchronous binary counter. This is also clocked from /CLK2, so that its output is steady during the rising edge of CLK2, while the SD card is sampling the data on MOSI. Its QD output is inverted and fed to its enable inputs, so that it will count up to 8, then stop.
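The behaviour of the write shift register and the gating counter over one SPI cycle can be modelled in software. This is a simplified sketch with hypothetical names, not a gate-level description: it assumes the byte is sent most significant bit first (as SD cards expect in SPI mode) and that the counter is returned to zero when a new cycle starts, which the text above does not spell out.

```c
#include <stdint.h>

/* Simplified model of the write path during one SPI cycle. */
typedef struct {
    uint8_t shift;  /* contents of the write shift register */
    uint8_t count;  /* 4-bit counter value, 0..8 */
} spi_tx_model;

/* Parallel load: latch the byte and (assumption) restart the counter. */
void spi_tx_load(spi_tx_model *m, uint8_t byte)
{
    m->shift = byte;
    m->count = 0;
}

/* The counter is enabled only while QD (bit 3) is low, so it stops at 8. */
int spi_tx_running(const spi_tx_model *m)
{
    return (m->count & 0x08) == 0;
}

/* One serial clock period. Returns the MOSI level the card samples on the
   rising edge, or -1 once the cycle has finished and SCK is gated off. */
int spi_tx_clock(spi_tx_model *m)
{
    int mosi;

    if (!spi_tx_running(m))
        return -1;
    mosi = (m->shift >> 7) & 1;                 /* MSB is presented first */
    m->shift = (uint8_t)((m->shift << 1) | 1);  /* serial input tied high */
    m->count++;
    return mosi;
}

/* Run a complete cycle and return the byte as the card would see it. */
uint8_t spi_tx_byte(spi_tx_model *m, uint8_t byte)
{
    uint8_t seen = 0;
    int bit;

    spi_tx_load(m, byte);
    while ((bit = spi_tx_clock(m)) >= 0)
        seen = (uint8_t)((seen << 1) | bit);
    return seen;
}
```

After eight clocks the model's shift register holds 0xFF, which illustrates why a cycle started without a fresh parallel load sends the dummy 0xFF byte needed for reads.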

I found that the timings of the /IOW and /IOR pulses were a bit variable with respect to the bus clock, with /IOR in particular randomly jumping between the rising and falling edges of the bus clock. While the frequency doubler will in principle take care of this, I decided to add a flip-flop (IC10B) to resynchronise these edges to CLK2. This signal is combined with the QD output of the counter to give a "shift register cycle run" signal (CYCLERUN).

The read functionality is provided by a second shift register, with its serial input connected to MISO. This is simply clocked directly from the SCK line to the SD card. Its parallel outputs are asynchronously coupled to the CPU data bus by means of the decoded /IOR signal.

Because of the time taken to clock in the data from the SD card, and the fact that wait states are not allowed for I/O operations, it is not possible to return data in the same bus cycle as a read request. Instead, reading operates in a pipelined mode, with a read request returning the data read during the previous read (or write) command, then triggering a read of the next byte from the SD card, which is then buffered for the next request.

I decided to also implement a "peek" read function, to allow the previously read data byte to be retrieved, without triggering the read of another byte. This is done using a second output from the address decoder (base + 1), gated with /IOR. This simply enables the parallel outputs of the read shift register, without triggering another read cycle on the SPI bus.
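The pipelined read and the "peek" read can be illustrated with a small simulation. This is a sketch under stated assumptions: the structure and function names are hypothetical, the card is reduced to an array of bytes, and a whole 8-clock transfer is treated as a single step.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated read path: a fake card plus the read shift register. */
typedef struct {
    const uint8_t *card_data;  /* bytes the simulated card will send */
    size_t card_len;
    size_t card_pos;           /* index of the next byte the card sends */
    uint8_t rx_shift;          /* contents of the read shift register */
} sd_rx_model;

/* One complete SPI cycle: the next card byte lands in the shift register.
   Once the data runs out, MISO idles high, giving 0xFF. */
static void sd_rx_transfer(sd_rx_model *m)
{
    m->rx_shift = (m->card_pos < m->card_len) ? m->card_data[m->card_pos++]
                                              : 0xFF;
}

/* Read from base + 0: return the buffered byte, then fetch the next one. */
uint8_t sd_rx_read(sd_rx_model *m)
{
    uint8_t previous = m->rx_shift;
    sd_rx_transfer(m);
    return previous;
}

/* Read from base + 1: return the buffered byte without a new transfer. */
uint8_t sd_rx_peek(const sd_rx_model *m)
{
    return m->rx_shift;
}

/* Read 'count' bytes without clocking an unwanted extra byte from the card. */
void sd_rx_block(sd_rx_model *m, uint8_t *buf, size_t count)
{
    if (count == 0)
        return;
    (void)sd_rx_read(m);          /* dummy read primes the pipeline */
    while (--count)
        *buf++ = sd_rx_read(m);   /* each read also starts the next transfer */
    *buf = sd_rx_peek(m);         /* last byte: no further transfer */
}
```

Reading three bytes this way advances the simulated card by exactly three bytes, whereas using the normal read for the final byte would consume a fourth.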

Timing diagram.

To allow the driver to control the SD "card select" line, a D flip-flop (IC10A) is clocked from the third decoded address line (/ADDR2), and samples the D0 data bit. Its Q-bar output drives /CS, so that this line defaults to high at reset.

Level shifting for the output (5V -> 3.3V) lines was provided by simple resistive dividers, with speedup capacitors. For the reverse direction, however, some gain is needed. Both the popular common-gate bidirectional level shifter circuit and a conventional common source inverting circuit proved to have insufficient bandwidth, and for a while I thought I might have to revert to a single-source, SMT IC for this task. However, some experimentation yielded a design for a common-source inverter with an active load that gave rise and fall times of the order of 20ns, so this was adopted.

An activity LED was provided using a pulse stretcher driven from the /ADDR0_IORW signal. Another LED was provided to monitor the status of /CS. Finally, a linear regulator was provided for the 3.3V supply to the card.

[SSD card schematic]
The finished schematic. Note that this incorporates a couple of mods described in the text below.

Having finalised the schematic, it was then possible to lay out a PC board. I anticipated that it was fairly likely that some modifications to the finished board would be necessary, given that the circuit was completely untested. While most of the boards I have designed recently have been predominantly in SMT, I have always found this to be quite inconvenient where modifications are concerned, due to the mechanical fragility of the components. I therefore decided to use through-hole components where possible. This was also more in keeping with the construction of the Multispeed, which was predominantly (though not exclusively) through-hole.

The first step was to measure the expansion port slot in the computer, and the position of the header connector, to ensure that the board would fit mechanically, and that the connector would end up in the right position. Although there were a number of possible variables with the connector, fortunately it appeared that the slot was designed so that the expansion card would use a standard 40-pin right-angle header.

I also realised that a substantial amount of debugging work was likely to be necessary, and that it would be desirable to be able to access the card in-circuit for measurements. I considered designing a separate extender card, but I was too cheap to pay the additional tooling cost for this (the board was already over the 100mm maximum size of the cheap rate at Seeed). Instead, I provided a second header socket footprint on the SSD board, wired in parallel with the original one. A second copy of the board could be populated with just the header sockets, and wired to a fully populated board using an IDE ribbon cable. As I would have to order a quantity of at least 5 PCBs, but only needed one finished board, this would allow reuse of a board that would otherwise be wasted.

To aid in debugging, I also made sure to identify as many of the critical nets on the silkscreen as possible.

[PCB design]

The boards arrived from Seeed without incident, and I then proceeded to check the mechanical fit of the board in the slot, initially with the connector just tacked in with one pin at each end. Although I had made the card a fairly loose fit in the slot, this proved to be something of a liability, as it allowed the card to twist in the slot, and made it hard to line up the connector (although the mechanical design of the slot housing was also partly responsible for this). I ground some chamfers on the leading edge of the board with a disc sander (something I should really have thought of when I designed it). I also found that the header socket in the computer was fitted at a slight angle, so I adjusted the angle of the plug on the expansion board slightly to compensate. Having done this, it was then possible, albeit slightly fiddly, to insert the card into the slot.

I then proceeded to build the card up. I elected to do this in stages, first fitting just the header socket, then the passives, and finally progressively more of the semiconductors. After each stage, I checked for shorts on the power rail, and then inserted the card into the computer, to check that it would still boot.

[Assembled PCB]

With the card fully populated, the first thing to check was the CLK2 frequency doubled clock, as this could be done without any software. The clock signal appeared as expected, albeit with a bit of jitter. The duty cycle was fairly close to 50%, indicating that my initial guess of 1k and 150p for the differentiating network was reasonable.

I then wrote a simple program that would alternately write 0xFF and 0x00 to address 0x302, which should have caused the /CS LED to flash. However, this did not happen. Fearing that I had made a mistake in the address decoding logic, I started investigating. However, after extensive probing with the scope, I came to the conclusion that I had merely inserted the LED backwards!

However, although the LED was now lighting, it was doing so rather erratically. Even allowing for ISRs and DRAM refresh, this was worse than I expected, and slowing the program down with SLEEP statements (I was doing the initial testing using QBASIC, as this is somewhat more convenient than DEBUG) resulted in the LED not lighting at all. Examination of the signals feeding into IC10A showed that the rising edge of the clock pulse was considerably in front of that of the data.

I then realised that I had made a mistake - the flip-flop should be clocked on the rising edge of /IOW, and I had inserted an inverter (IC11C, not shown on the schematic) in this signal, due to the negative logic, which was unnecessary. Fortunately, it is a lot easier to remove a redundant inverter than to add one that isn't there. After installing the first of what I hoped wouldn't be too many mod wires, the LED worked reliably.

It was then possible to move on to examining the SPI waveforms. Starting with SCK and MOSI, these appeared to mostly conform to the timing as I had designed it, with the exception of a small glitch on SCK after the eighth bit was clocked. Though only about 10ns wide, this would be enough to cause problems, so I set out to eliminate this before attempting communication with the SD card.

The pulse was traced to the SCK enable circuitry around IC10B. The same rising edge of CLK2 both clocks IC10B, and is combined with the output of IC10B to form the SCK signal. Therefore, an edge that causes IC10B's Q output to go low will propagate to the SCK output until IC10B has had time to transition.

Ideally, the clock signal should not be propagated through chains of logic like this. Instead, the clock would be directly connected to the device in question (here, the SD card), and the gating signal would be set up during the previous clock period, and applied to an "enable" input. However, SD cards lack a suitable enable input - the chip select pin cannot be used, as it plays another role in the communication protocol, and cannot be deasserted after every byte.

Therefore, I decided to simply delay the CLK2 signal feeding into IC8B. This can be done quite simply by inserting a resistor inline with the signal. This forms an R-C time constant with the input capacitance of the gate. The value is of course a compromise between delaying the signal enough to remove the glitch, excessive skew of the SCK output, and maintaining enough signal level for reliable operation of the gate in the face of attenuation by the low-pass filtering action. In situations like this, if the value is found by trial and error, it is always advisable to "bracket" the value, to give allowance for variations in temperature, supply voltage, and component tolerances (if the circuit will be replicated). I found that the circuit would work with values from at least 2.2k to 10k, and so fitted a 3.3k resistor.
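The order of magnitude of the delay can be estimated with the usual first-order R-C step response. This is a back-of-envelope sketch: the function name is hypothetical, the gate is assumed to switch at half the supply voltage, and the capacitance passed in is a guess, since the real figure includes stray capacitance that was not measured.

```c
/* Approximate time, in nanoseconds, for the output of an R-C low-pass to
   reach half of the supply after a step at its input: t = R * C * ln(2).
   Assumes an ideal step and a switching threshold at 50% of the supply. */
double rc_delay_ns(double r_ohms, double c_farads)
{
    const double ln2 = 0.6931471805599453;

    return r_ohms * c_farads * ln2 * 1e9;
}
```

With an assumed total of 5pF at the gate input, `rc_delay_ns(3300.0, 5e-12)` comes out at a little over 11ns, which is of the same order as the 10ns glitch. The 2.2k and 10k bracket values give roughly 8ns and 35ns under the same assumption.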

After doing this, the SCK waveform looked fairly good. There was some overshoot and ringing due to the fairly simplistic level shifting that I had provided, but this was without an SD card fitted. It would always be possible to fine-tune the speedup capacitors later on.

Next, I went on to test the read side of the SPI interface. There is not much extra in the way of waveforms to examine here, so I simply jumpered MISO to MOSI, and modified my program to read a value from the "peek" address after writing to the output address (which would also cause it to clock the written data back into the input shift register). This showed that the same data was being read back as was written, and reading back from the original address gave the expected value of 0xFF.

Having completed a basic verification of the hardware, it was now time to attempt communication with an SD card. I first modified my QBASIC program to perform the initial step of the initialisation sequence, which was successful. My idea was that, if I could fully initialise the card from QBASIC, I would know that the hardware was good, and could then move on to the task of porting the existing driver. However, reading through the card specification, I could see that it would be a considerable amount of work to implement from scratch.

I then thought it might be possible to translate the initialisation sequence from the existing C source into BASIC. I started working through this, porting the first few functions in the call sequence, but this did not operate correctly, and I could see that debugging it would be a major undertaking.

I thought some more about the original C source. Even if I couldn't compile it into the correct MS-DOS driver format, I could at least turn it into a standalone executable that would be able to initialise the card. While C development on the laptop was still not really a practical option at this stage, I could compile it on another system running Dosbox and Turbo C.

I therefore revisited my attempts to get the code to compile. Currently, running the Turbo C MAKE.EXE resulted in a number of errors. After getting the code under source control, I proceeded to work through these.

The makefile referenced BCC - presumably a later version of the compiler from Borland. I changed this to TCC, and also found that I had to change to use slashes instead of dashes for the option switches for TLINK, but other than this, the options appeared to line up.

The next error related to a "Rotate count out of range" in an assembly file, for the line 'SHR AX,4'. In the absence of in-depth knowledge of x86 assembly, I could only think of changing this to 'SHR AX,1' repeated four times, which appeared to work.
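For background, the original 8086 only accepts a shift count of 1 or a count held in the CL register. Immediate counts greater than 1 arrived with the 80186, so an assembler targeting the plain 8086 rejects 'SHR AX,4'. The usual alternatives are the repeated single-bit shifts used here, or loading CL with 4 and shifting by CL (which clobbers CL). The equivalence of the repeated form can be checked with a trivial sketch in C, where the function name is hypothetical:

```c
#include <stdint.h>

/* Mirrors four consecutive 'SHR AX,1' instructions on a 16-bit value. */
uint16_t shr4_single_steps(uint16_t ax)
{
    ax >>= 1;
    ax >>= 1;
    ax >>= 1;
    ax >>= 1;
    return ax;
}
```

For any 16-bit input this returns the same value as a single logical shift right by four places.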

There were then some errors to fix in the C files, relating to inline assembly syntax, duplicate typedefs, some non-standard syntax in a switch statement, and function names for the inp/out instructions. With this done, the code compiled to the SD.SYS target, yielding a binary about 35 bytes bigger than the one supplied in the original archive. To my considerable surprise, when I transferred this file onto the Multispeed, it was fully functional with the original parallel port hardware! It looked like bringing up the new hardware was going to be easier than I thought...

I then set about converting the code to work with the new interface card. This was fairly straightforward, as the software SPI implementation could be replaced by simple in/out instructions, operating on data a byte at a time. The only real difference was that, in the original implementation, all of the output lines were updated simultaneously, while I had to add separate statements to deal with the /CS line in the new code.

One other complication was that the input command did not return the byte that had just been read from the SD card, instead returning the previous byte, while simultaneously reading another byte and placing it in a buffer. This meant that, when reading a block of data, a dummy read had to be performed first, to load the initial byte into the buffer, and that the last read in the block had to be performed as a special case, reading from the 'peek' address, so as not to read an unwanted extra byte from the SD card. This does entail some extra complexity, but as the read code is encapsulated in a function, it only has to be implemented once.

Having done this, I recompiled the driver, and loaded it onto the Multispeed. However, it did not function as it did with the parallel port adapter, instead crashing on startup. So, I had some more debugging to do.

This time, at least, I had access to a working toolchain, so I was able to recompile the driver with added debugging statements. This showed that, in the disk_initialise() function, the initial CMD0 (software reset) and CMD8 (check voltage range) commands were running successfully, but that the ACMD41 (initialise card) command was getting stuck. Probing with an oscilloscope showed that the card never returned a valid response, with its MISO pin remaining high.

There were a few things that I could think of that might cause the initialisation to fail. Bad power was one possibility. I had also read that the card should be clocked at between 100 and 400kHz during the initialisation phase - something that I had overlooked while designing the hardware, as I was using the 9.5MHz clock at all stages of the initialisation - although this was rendered less likely to be the problem by the fact that the card responded to other commands. Finally, I was concerned that the "speedup" capacitors in my level shifting circuits might be causing excessive spikes on the data lines - I had initially fitted 100pF caps here, which were causing a considerable amount of overshoot.

I first decided to reduce the capacitors down to 20pF. This gave an acceptable waveform on the oscilloscope, though it was still hard to know what the waveform would be like once the probe was removed, as the input capacitance of the probe would form a divider with the speedup cap. I then tried reloading the driver, and to my surprise, the disk was detected successfully!

First successful test of the card.

However, this was not the end of the story. I found that the interface would work with the extender card, on the end of a ribbon cable, but not when inserted directly into the slot. Again, this could have been due to noisy power supply rails, or due to radiated interference from the motherboard below, which might have required a shield to have been fitted. However, on a hunch, I tried removing the speedup capacitors altogether, and found that the card would then function when inserted directly into the slot.

Card installed in slot.

I then set about measuring the performance of the drive. Read performance initially came out at 52.7KiB/s, already a considerable improvement on the parallel port adapter hardware, but still falling short of what I believed the hardware to be capable of. I therefore set about trying to optimise the code.

The basic "read block" function was as follows:

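void rcvr_mmc (BYTE DOSFAR *buff, UINT count)
	{
	/* Write D0 = 1 to the card select register (base + 2) */
	outportb(0x302, 1);

	/* Dummy read: discard the stale byte and start clocking in the first real one */
	inportb(0x300);

	/* Each read returns the buffered byte and triggers the next transfer */
	while (--count)
		{
		*buff++ = inportb(0x300);
		}

	/* The last byte comes from the 'peek' address, so no extra byte is clocked out of the card */
	*buff = inportb(0x301);
	}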

This was compiled into the following:

     e1d:	55                   	push   bp
     e1e:	8b ec                	mov    bp,sp
     e20:	b0 01                	mov    al,0x1
     e22:	ba 02 03             	mov    dx,0x302
     e25:	ee                   	out    dx,al
     e26:	ba 00 03             	mov    dx,0x300
     e29:	ec                   	in     al,dx
     e2a:	eb 0d                	jmp    0xe39
     e2c:	ba 00 03             	mov    dx,0x300
     e2f:	ec                   	in     al,dx
     e30:	c4 5e 04             	les    bx,DWORD PTR [bp+0x4]
     e33:	26 88 07             	mov    BYTE PTR es:[bx],al
     e36:	ff 46 04             	inc    WORD PTR [bp+0x4]
     e39:	ff 4e 08             	dec    WORD PTR [bp+0x8]
     e3c:	8b 46 08             	mov    ax,WORD PTR [bp+0x8]
     e3f:	0b c0                	or     ax,ax
     e41:	75 e9                	jne    0xe2c
     e43:	ba 01 03             	mov    dx,0x301
     e46:	ec                   	in     al,dx
     e47:	c4 5e 04             	les    bx,DWORD PTR [bp+0x4]
     e4a:	26 88 07             	mov    BYTE PTR es:[bx],al
     e4d:	5d                   	pop    bp
     e4e:	c3                   	ret   

It can be seen that the inner loop first loads the port address, then reads the data byte from the SD card. Next, ES:BX is loaded with the destination data pointer (from [BP+4]), and an indirect store of the data byte from AL to this address is performed. Then, the data pointer is incremented, and the loop counter (count, at [BP+8]) is decremented via an indirect access, loaded, tested, and a conditional jump is made.

As a first step at optimisation, I converted the C code into the equivalent inline assembly, to allow it to be modified more easily. It took a bit of fiddling around to work out the inline assembly syntax for Turbo C but, aided by a PDF copy of the manual, I managed to figure it out. Then, the easiest thing to do to reduce the size of the inner loop was moving loading of the port address into DX outside the loop, since this only needs to happen once. (After each modification, I recompiled, tested that the code still worked on the target machine, and checked its speed. This entailed much swapping of floppy disks, but guarded against the introduction of errors - an important point, given that I was fairly unfamiliar with x86 assembly language.)

The next optimisation was to change the loop counter from a stack variable to a register - in this case, CX. Being unfamiliar with the calling convention, I added a save/restore for this register, but as these instructions are outside the loop body, they do not influence performance significantly. This change brought the read speed up to 74KiB/s - already a worthwhile improvement.

I then looked to see if a similar optimisation would be possible with the data pointer. This is complicated by the fact that this is a "far" pointer, composed of both a segment and offset. I was initially unsure what would happen if the offset wrapped around within the loop, and thought it might be necessary to somehow adjust the segment before entering the loop, so that the buffer was contained entirely within the one segment. However, further examination of the code generated by the compiler showed that only the offset portion of the pointer was being incremented, so I concluded that it was safe to assume that the offset would not wrap around. I therefore changed the code to directly increment the BX register, and moved the 'les' instruction out of the loop body.

On testing this modification, it failed to work, and I immediately assumed that I had missed some subtle aspect of the segmented addressing system. But the cause of the fault was actually much simpler - I had forgotten to update the code that performed the final read at the end of the function. This was still using the copy of the pointer on the stack, which was no longer being incremented. After fixing this, the code functioned correctly, and the read speed was up to 116KiB/s.

I suspected that the overhead of the operating system and C stdio library was now a significant factor (I had obtained inconsistent results performing speed tests with "copy file.dat NUL:", and so had written a simple C program to read data in from a file, then throw it away). Sure enough, examining the SCK line with an oscilloscope showed a raw cycle time for reading a byte of around 5us, corresponding to a raw read speed of about 200KiB/s. I converted my test program to use the open/read functions instead of the higher level fopen/fread, but this produced only a marginal improvement in speed. However, increasing the read buffer from 1KiB to 16KiB increased the reported read speed to 160KiB/s.
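A read benchmark of this kind can be sketched in portable C. This is not the actual test program: the function names are hypothetical, it uses stdio rather than the lower level open/read calls for the sake of portability, and the timing itself is left to the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read 'fp' to end of file in chunks of 'buf_size' bytes, discarding the
   data. Returns the total number of bytes read, or -1 if the buffer size
   is zero or the buffer could not be allocated. */
long read_and_discard(FILE *fp, size_t buf_size)
{
    long total = 0;
    size_t got;
    char *buf;

    if (buf_size == 0)
        return -1;
    buf = malloc(buf_size);
    if (buf == NULL)
        return -1;
    while ((got = fread(buf, 1, buf_size, fp)) > 0)
        total += (long)got;
    free(buf);
    return total;
}

/* Convert a byte count and an elapsed time in seconds to KiB/s. */
double kib_per_second(long bytes, double seconds)
{
    return (double)bytes / 1024.0 / seconds;
}
```

The same conversion shows where the raw figure comes from: one byte every 5us is `kib_per_second(1, 5e-6)`, or a little over 195KiB/s. Calling `read_and_discard` with buffer sizes of 1024 and 16384 would correspond to the two cases compared above.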

I then went back to the driver code, to look for further opportunities for optimisation. One possibility would be to combine the data pointer and loop counter variables. However, I eventually decided that this was infeasible. It would be necessary to adjust the segment and offset values so that the loop terminated just as the offset wrapped around. (Comparing the pointer directly to a length value would necessitate a separate comparison instruction, rather than just a test for zero, and so would negate any savings). However, to make this adjustment, the data buffer would need to be aligned to a multiple of 16 bytes, which would not be easy to guarantee.

I then looked at the possibility of unrolling the loop. The jump instruction is relatively expensive, at 16 cycles, so it would be well worthwhile to split this over multiple bytes. Reading through the instruction set, I also noticed that the "store string" instruction was only 11 cycles, versus 16 for a mov + inc. Furthermore, the "store string word" instruction was only 15 cycles. If the stores to memory were interleaved 2:1, this plus a register-register move and an exchange (necessary to preserve the correct byte order) would come in at only 21 cycles, less than two "store string byte" instructions. With this optimisation applied, and the loop unrolled 8:1, read performance as reported by my C program was up to 253.8KiB/s.
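The 2:1 interleaving can be modelled in C to show why the exchange is needed to keep the bytes in order. This is a sketch only: the function names are hypothetical, the port is replaced by a callback, and the instruction sequence in the comments is one plausible arrangement rather than a listing of the actual driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an 'in al,dx' from the data port. */
typedef uint8_t (*port_read_fn)(void *ctx);

/* Emulates a "store string word": the low byte of AX goes to the lower
   address, the high byte to the next one, and the pointer advances by 2. */
static uint8_t *store_word(uint8_t *dst, uint16_t ax)
{
    dst[0] = (uint8_t)(ax & 0xFF);
    dst[1] = (uint8_t)(ax >> 8);
    return dst + 2;
}

/* Read 'count' bytes (any odd trailing byte is ignored) as pairs, storing
   each pair with a single word store. */
void read_pairs(uint8_t *dst, size_t count, port_read_fn in, void *ctx)
{
    while (count >= 2) {
        uint8_t first = in(ctx);    /* in al,dx  then  mov ah,al */
        uint8_t second = in(ctx);   /* in al,dx: AL = second, AH = first */
        /* xchg al,ah: the first byte must end up in the low half of AX */
        uint16_t ax = (uint16_t)(first | ((uint16_t)second << 8));

        dst = store_word(dst, ax);  /* stosw */
        count -= 2;
    }
}

/* Example port: returns an incrementing sequence from a caller-owned byte. */
uint8_t counting_port(void *ctx)
{
    uint8_t *next = ctx;

    return (*next)++;
}
```

Using the cycle counts quoted above, the word store plus the move and exchange costs 15 + 2 + 4 = 21 cycles per pair, against 2 x 11 = 22 for two byte stores. Without the exchange, each pair of bytes would land in memory swapped.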

Up until now, I had been restricting myself to the basic 8086 instruction set. However, in the instruction set reference that I was using, I had noticed the 'ins'/'outs' instructions. These instructions, flagged as 80186+ only, will read/write from an I/O port, store/load to memory, and increment the memory pointer all in one operation. I saw some reference to the V30 chip supporting 80186 instructions, so I thought it would be worth giving these a go.

It was necessary to change the compiler flags in order to get these instructions to be emitted. However, after having done so, they were found to work, and read performance was up to 305KiB/s. I had to revert back to byte-wide access, since the card's I/O port is only 8 bits wide. But despite this, there was a considerable improvement. I also tried using the 'rep' prefix, which would have allowed the inner loop to be collapsed into a single instruction, however I couldn't get this to work. Possibly, the time between subsequent accesses to the port had become too small for the hardware to handle (I had found that this was the case with back-to-back 'in' instructions). However, given that the loop was already unrolled, the gains from using 'rep' would probably not be as high as otherwise expected.

While further optimisation would probably be possible, I decided that I was starting to approach the point of diminishing returns. I therefore turned my attention to the write function.

While read performance is an obvious primary target for optimisation, since reading will generally occur more frequently than writing, write performance is still an important criterion. The easy gains that I had found with the read function made me think that similar gains could probably be had while writing. The logic would also be somewhat simpler, due to the lack of the need for pipelining.

Firstly, it was necessary to determine the existing write performance of the driver. Accordingly, I modified my test program to give a write test mode, which gave a figure of 47KiB/s. Applying the same sequence of optimisations to the write loop brought the throughput up to 222KiB/s.

Of course, when optimising the read loop, the consequences of a logic error are not particularly severe - the code simply won't work. An error in the write code, however, could be much more serious. At one point, I was accidentally writing data from the wrong memory segment, which ended up completely destroying the filesystem on the SD card. (This probably would have been a good application for unit testing of the write function if I had been bothered to set it up, though this would probably have to have been done in some sort of emulator.)

Fortunately, nothing of value was lost, and my experience so far has been that, other than this error, the SD card interface has been quite reliable in terms of data integrity.

Card in operation.

With this done, I considered the project successfully completed. It has certainly increased the versatility of the old machine, and now I am on the lookout for interesting vintage software that I can run on it. With all the development tools that I have installed so far, perhaps I will have a go at a serious programming project using the machine itself.

As always, I do have a few spare PCBs, and you are welcome to one if you'd like to try building one of these cards, though I realise that this project has a rather limited appeal, as there can't be very many of these machines around anymore at all. Of course, the basic circuit would be suitable for fitting an SD card interface to almost any vintage IBM-compatible PC, whether to replace a failed mechanical HDD, or for data exchange with a newer machine. I don't know how much there is out there in terms of ISA-compatible SD card interfaces, but if there is sufficient interest, I might look at laying out a version of the PCB that would fit in an ISA slot, and releasing the software to suit.

And finally, a few more resources on the NEC Multispeed: After going to all of the effort of reverse engineering the interfaces, I did actually find a copy of the service manual! This does contain a few useful nuggets of information, including details on the E0 port for bank switching the ROMs, pinouts of the programmable logic arrays, and some timing diagrams. However, it is notably silent on the pinout of the modem card connector, despite having pinouts for all of the other connectors. So the reverse engineering effort was not totally wasted, though I suppose I could have identified the pins via continuity measurements to the identified signals on the connector to the disk controller board, had I known.

The NEC Multispeed Information Vault (appears to require a google account, unfortunately)

Advertising brochure

TV ad (seek to 24:31)

1987 TV program about laptop computers, featuring the Multispeed. A few interesting facts here, apparently battery life was the original reasoning behind omission of a hard drive (although I suspect cost would also have been a significant factor). I bet the NEC engineers would have been pretty keen on flash memory if they could have got it (at an affordable price)! Also, it seems the ROM sockets were intended for use by third party vendors rather than the end user, which partly explains the lack of technical documentation.

Back to Part 2 - Reverse Engineering.

Up to Introduction.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

loopgain.net