

Construction and Testing

Assembling the PCB

Because of the complexity of the design, I elected to construct and test the board in stages. After installing the power supply regulators, I fitted the ATMega microcontroller, and confirmed that I was able to program it. I was not sure if my ancient programming adaptor would work with a 3.3V supply, but it appeared to function without problems.

The next stage was to get the FPGA up and running. I desoldered the chip from the development board using a hot air tool, and fitted it to my own PCB. I also swapped over the serial EEPROM used for the FPGA configuration, and fitted the necessary connectors to allow programming using the dev board. On applying power, the CDONE LED came on, indicating that the chip was functioning correctly, and I was able to program it using the iCEcube software. (I discovered later that the software reports that the programming process succeeded, regardless of whether it actually worked or not! Fortunately, there was nothing wrong with my setup, other than an unplugged USB cable).

The second large high-density chip, the ADC, was next to be populated. With this chip installed, it was necessary to check that it was working correctly. Although I could not check its video data and sync outputs without the DAC installed, I could at least check the I2C communication with the AVR. Unfortunately, I had left the RS-232 driver chip off my parts order, and until it arrived the only output from the system was a couple of LEDs. I was, however, able to read out some values from the chip, check that they matched the power-on defaults from the datasheet, and turn on an LED.
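For reference, a register readback of this kind amounts to an I2C write of the register index, followed by a repeated start and a one-byte read. Below is a minimal polled sketch for the AVR's TWI peripheral; the 7-bit device address is an assumption (it depends on the AD9983's address strap), the bit-rate setup is assumed to have been done elsewhere, and all error checking is omitted.

    #include <avr/io.h>
    #include <stdint.h>

    #define ADC_ADDR 0x4C   /* hypothetical 7-bit I2C address */

    static void twi_wait(void)
    {
        while (!(TWCR & (1 << TWINT)))
            ;                           /* wait for the TWI unit to finish */
    }

    uint8_t adc_read_reg(uint8_t reg)
    {
        uint8_t val;
        TWCR = (1 << TWINT) | (1 << TWSTA) | (1 << TWEN);  /* START */
        twi_wait();
        TWDR = ADC_ADDR << 1;                              /* SLA+W */
        TWCR = (1 << TWINT) | (1 << TWEN);
        twi_wait();
        TWDR = reg;                                        /* register index */
        TWCR = (1 << TWINT) | (1 << TWEN);
        twi_wait();
        TWCR = (1 << TWINT) | (1 << TWSTA) | (1 << TWEN);  /* repeated START */
        twi_wait();
        TWDR = (ADC_ADDR << 1) | 1;                        /* SLA+R */
        TWCR = (1 << TWINT) | (1 << TWEN);
        twi_wait();
        TWCR = (1 << TWINT) | (1 << TWEN);                 /* clock in one byte, NACK */
        twi_wait();
        val = TWDR;
        TWCR = (1 << TWINT) | (1 << TWSTO) | (1 << TWEN);  /* STOP */
        return val;
    }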

After passing these tests, I went ahead and installed the rest of the components onto the PCB. Now that the DAC chip was fitted, I could proceed with further testing. To start with, I wanted to get the system operating in 'pass-through' mode, without any signal processing. This would check that the ADC, FPGA, and DAC were all fully operational.

[Assembled PCB - top side]
Assembled PCB - top side.

[Assembled PCB - bottom side]
Assembled PCB - bottom side.

ADC Testing

Before I could try this, it was necessary to configure the ADC. This part has about 64 8-bit registers to control its operation. I went through the datasheet and chose the most appropriate values for the chosen video mode. (Initially, the video signal was derived from an old laptop, running at 640x480. This had a horizontal scan frequency of 31.5kHz, a vertical scan frequency of 60Hz, an HTotal of 800, and a VTotal of 525, for a dot clock of around 25MHz.) With the chip configured via the I2C port, the regenerated dot clock could be observed at the test point. The FPGA was then configured to transfer the video data directly from the inputs to the outputs, without modifying it. At this point, an image was visible on the monitor connected to the output; however, it was very dark and distorted. Some improvement was realised by connecting the shell of the input connector to ground (this connection had been omitted in the original PCB layout), but the image was still very dark.
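(These timing figures are mutually consistent: 525 lines per frame at 60 frames per second gives 525 x 60 = 31,500 lines per second, or 31.5kHz, and 800 pixel periods per line at 31.5kHz gives 800 x 31,500 = 25.2MHz for the dot clock.)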

A greyscale gradient test pattern was created on the computer, and the input and output video waveforms were observed. The input waveform was a clean sawtooth as expected, but the output showed the bottom half of the brightness range being clipped to black. This suggested a problem with the DC restoration circuitry in the AD9983.

The chip can automatically derive a clamp waveform from the supplied sync signals, adjusting the DC level of the video signals during the back porch, when the voltage for full black is being transmitted. There is also an automatic offset adjustment for the three ADCs. Adjusting the timing and configuration parameters for the clamp circuitry did not make much difference. (Fortunately, the parts for the RS-232 interface had arrived by now, so it was possible to adjust the parameters in real time.) I therefore decided to try supplying an external clamp signal.

The only available signal source was the horizontal sync pulses. For a composite video signal, these pulses would be unsuitable, as the sync level and black level differ. However, for an RGB signal with separate sync, I thought it might be possible to perform the clamping during the sync pulse, instead of during the back porch. This would save having to find a way to delay the sync pulse before feeding it to the clamp input.

Therefore, I connected the clamp input directly to the horizontal sync signal, and reprogrammed the chip to use the external clamp input. The results were immediately apparent, as the image came up to full brightness.

One problem with the image remained - there appeared to be a sideways displacement of every second line, causing a double image. Probing the regenerated horizontal sync pulses showed some peculiar behaviour - the sync pulses were appearing in pairs, with a long delay between them for the odd lines, and a shorter delay between the even lines. Examination of the dot clock also showed that the frequency was not constant, suggesting that the PLL was jumping between two different frequencies. The PLL parameters were checked, and it was found that the bias current setting was incorrect for this video mode. Correcting the current gave stable sync pulses, and a good quality image.

It was thought that the PLL instability might have been affecting the clamp circuitry, so the internal clamp was tried again. However, this still produced dark pictures, and the external clamp signal was retained.

At this point, the quality of the image was judged to be sufficient, and work could commence on the signal processing. Some tests were then conducted using the FPGA. A ramp signal was generated using the dot clock and an 8-bit counter, locked to the horizontal sync. The filtering algorithm was also tested on one of the channels.

Framebuffer and Sync Testing

Before attempting to process the image, it was necessary to check that the reference logo image in RAM could be locked to the incoming video signal. As a first step, the timing signals to the address counters were inspected. However, no signal was evident here. Further investigation showed that the analogue multiplexer, used for routing the timing signals, had been miswired, with the outputs and control inputs interchanged. (The Eagle symbol for this part is rather misleading, with the control inputs opposite the signal inputs, and the signal outputs hidden at the bottom).

Unfortunately some of the connections were hidden under the multiplexer chip. Therefore, the chip was removed from the board, the appropriate tracks were cut, and the correct connections were made using fine enamelled copper wire. After reinstalling the part, the clock signals were found to pass through to the address counters, and the outputs of the counters were switching as expected with decreasing frequency, all locked to the sync signals.

It was then possible to attempt to display the contents of the RAM on the display. The FPGA configuration was modified to allow this, with one byte of the RAM data being fed to one of the video channels of the DAC. However, it soon became apparent that the RAM was filled with 0xFF at powerup. This constant value was removed by the DC restoration circuitry in the display monitor, and no output resulted. To remedy this, the signal was gated in the FPGA, producing a coloured bar on the screen.

To fully check the picture synchronisation, it was necessary to provide a more complicated test image than this, and a method of getting this test image into the RAM was also needed. It was decided to attempt to capture the incoming video signal and store it.

To do this, several events must be made to occur. The /OE and /WE lines of the RAM, which are controlled by the microcontroller, must be toggled. Also, the data bus lines from the FPGA must be changed from inputs to outputs, and the data from the ADC duplicated on these pins. The whole process would be initiated by the microcontroller, in response to a signal from one of the pushbuttons, and a means of commanding the FPGA to switch modes concurrently with the change in the RAM control lines was needed.

Although an SPI-style interface was envisaged for eventually controlling the FPGA, it was decided to initially use the SCK pin as a simple binary mode control input. Reconfiguring the direction of the FPGA data pins turned out to be quite easy - all that was required was to change the 'input' keyword to 'inout' in the Verilog code, and assign a 'high impedance' value to the outputs when they were not in use.

There were a number of uncertainties surrounding the timing of the various signals controlling the RAM chip, as I had not performed a full timing analysis prior to designing the circuit. I was therefore quite surprised when, on pressing the 'store' button, a flawless, perfectly locked copy of the input signal (in green!) was displayed!

I then realised it would be quite easy to implement a full colour frame store. Although the 16-bit RAM could not directly accommodate the 24-bit video signal, it would be quite easy to rearrange the data into a 16-bit 5:6:5 R:G:B format. After making the necessary changes to the FPGA configuration, the frame store now functioned in full colour. The use of programmable logic makes this type of change much easier to implement, as a discrete circuit would have required much rewiring of data buses to make the same change.

To further check the synchronisation of the incoming and stored images, it was thought desirable to combine them both into a composite image. While I could have just taken one channel from the stored image, and two from the input, I decided to take an average of the two sources for each channel, thereby producing a crossfade between the input and stored image.

[Crossfade between live and stored image]
Crossfade between live and stored image.

Having shown that the video framebuffer was functioning correctly, it was then necessary to find a means of getting data in and out of the framebuffer under control of the microcontroller. While designing the circuit, I had intended that all communication between the FPGA and micro would be via an SPI bus, due to a shortage of available pins on the FPGA. Having the SD card connected to the same bus would hopefully allow data to be quickly transferred between the card and the FPGA.

I decided to use a register file design, similar to that of the ADC. Transmissions would commence when the 'chip select' pin (pin 49) was pulled low, and would simply consist of an address byte, followed by a data byte. The registers I decided on were as follows:

0x00: Mode
This would control the direction of data transfer between the RAM and the FPGA; whether the address counter control signals were sourced synchronously from the incoming video signal, or from the microcontroller via the SPI interface; and would determine the overall operation of the chip.
0x01: RAM write register
Data written to this register would be presented to the RAM on the data bus, alternating between the low and high byte. Every second byte written would pulse the horizontal address counter clock.
0x02: RAM read register
Data read from this register would be sourced from the data bus. Again, the high and low bytes would alternate, and every second read would pulse the clock.
0x03: H Gate Start MSB
0x04: H Gate Start LSB
0x05: H Gate End MSB
0x06: H Gate End LSB
0x07: V Gate Start MSB
0x08: V Gate Start LSB
0x09: V Gate End MSB
0x0A: V Gate End LSB
Outside of the active picture area, the video levels need to be at the black level, to ensure the DC restoration circuit in the monitor functions correctly. Rather than relying on the input and mask signals to produce this, a gating window over the picture is defined, outside of which the data values can be set to zero. This function could also be used for clipping widescreen transmissions, as there are sometimes annoying artefacts transmitted at the boundary of the picture and the top and bottom black bars. These values could have been defined in units of four pixels, to fit them into single bytes; however, it was decided to split them across two bytes each for maximum flexibility.
0x0B: X Logo Window Start MSB
0x0C: X Logo Window Start LSB
0x0D: X Logo Window End MSB
0x0E: X Logo Window End LSB
0x0F: Y Logo Window Start MSB
0x10: Y Logo Window Start LSB
0x11: Y Logo Window End MSB
0x12: Y Logo Window End LSB
As described in the software implementation, although the logo cancellation is self-masking, the filtering needs to extend past the boundary of the logo. This window defines the region where the filter will be applied.

The following values were defined for the mode register:

0: Bypass
The incoming video signal is passed through unchanged.
1: Process
The incoming video signal is combined with the data in RAM according to the logo cancellation algorithm.
2: Capture
The incoming video frame is written to the RAM.
4: Read capture
The data in the RAM is read out over the SPI interface under control of the microcontroller.
6: Store template
Data input via the SPI interface is stored in the RAM.
8: Display framebuffer
The contents of the RAM are displayed on the video output, for testing.

Bit 1 of the mode byte controls the I/O direction of the data bus, and bit 2 selects the source of the horizontal address counter clock: either the dot clock of the incoming video signal, or an internally generated clock linked to the data shifted in/out via the SPI interface.
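From the microcontroller's side, a register write is then just a matter of pulling the chip select low and clocking out two bytes. A sketch follows, assuming the AVR's hardware SPI is already initialised as master; the port pin used for the FPGA's chip select is a placeholder.

    #include <avr/io.h>
    #include <stdint.h>

    #define FPGA_CS PB1                 /* hypothetical chip select pin */

    static uint8_t spi_xfer(uint8_t b)
    {
        SPDR = b;
        while (!(SPSR & (1 << SPIF)))
            ;                           /* wait for transfer complete */
        return SPDR;
    }

    void fpga_write_reg(uint8_t addr, uint8_t data)
    {
        PORTB &= ~(1 << FPGA_CS);       /* start transaction */
        spi_xfer(addr);                 /* address byte */
        spi_xfer(data);                 /* data byte */
        PORTB |= (1 << FPGA_CS);        /* end transaction */
    }

For example, setting the horizontal gate to start at pixel 160 would be fpga_write_reg(0x03, 160 >> 8) followed by fpga_write_reg(0x04, 160 & 0xFF).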

Implementing the SPI interface in the Verilog code was actually considerably more difficult than the logo cancelling function, due to its significant use of sequential logic. I eventually abandoned the separate read and write registers for the RAM data, and simply treated any bytes transmitted after the initial address and data bytes as destined for the RAM.

As the control lines for the RAM and address counters are shared between the FPGA and microcontroller, storing data is a joint effort between the two chips, and each must play their part at exactly the right times for the operation to succeed. With the SPI code for the FPGA written, I then worked on the overall control code for the AVR.

Setting the chip to shift in a test pattern appeared to work, but on selecting mode 8, the displayed image appeared to be corrupted. I changed the FPGA code to generate a test ramp pattern, clocked by the SPI interface, but the corruption persisted. Examination of the control signals using oscilloscopes (in the absence of a logic analyser, or even a four channel scope, I had to resort to two oscilloscopes with their trigger inputs tied together!) showed that a bit was being dropped out of each word. This was traced to an error in the Verilog code, which was running the data presentation and clocking on a cycle of 17 bits, instead of 16.

After rectifying this, the test ramp could be seen on the monitor, but not at the correct brightness, as it did not have a blanked section for clamping. Zeroing out the first section of the line was attempted, but the problem persisted. Examination of the video waveform showed that the blanking was present, but not at the correct location. It appeared that the stored data was not correctly synchronised, even though the individual lines were synchronised with one another.

After much probing of the signals, and reading datasheets, it was realised that the address counters were of the synchronous reset type, and required a rising edge on the clock input while the reset input was low, in order to reset them. This could have been fixed by swapping the counter chips for the asynchronous reset '161 types, but as this would have meant several days' delay while the new parts came in, the code was rewritten to work around this characteristic. It was necessary to write some dummy data bytes to clock the horizontal counters at the appropriate time, but this approach was successful overall. It was found that the blanking was still not correct during the horizontal sync pulse as the address counters were reset at this time, so until the gating window could be implemented, the sync pulse was used as a blanking enable signal.

Now it appeared that the data transfer to the RAM was finally functioning correctly. A framebuffer containing test patterns was implemented in the microcontroller, and mirrored to the video RAM. (Due to the limited amount of RAM on the ATMega32, the resolution of the framebuffer was limited to 32x32 pixel blocks; however, this was sufficient to test the data transfer.)

[Framebuffer test]
Framebuffer test, using 32x32 pixel blocks.

This appeared to work without problems. The next step was to perform a test at full resolution, which would require a method of transferring a large amount of data from an external source into the microcontroller. The easiest way to do this would probably have been to use the serial port, perhaps with a protocol such as XModem, or an encoding such as Base64. However, even at 115200bps, it would have been quite slow. I therefore decided to jump straight to using the SD card, as I would need to get this working eventually anyway.
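(A full 1024x1024 16-bit frame is 2MiB. At 115200bps, with roughly ten bit times per byte on the wire, that works out to around 2 x 1024 x 1024 x 10 / 115200, or about 180 seconds per image, before any protocol overhead.)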

SD Card Programming

I managed to find an SD card library for the AVR, and I patched this into my code. However, I was unable to get the card to initialise. At first, I thought this might have been due to a conflict with the FPGA chip on the SPI bus, however I eventually discovered that I had miswired the micro SD card socket (The micro cards have a different pinout to regular SD cards). I rectified this problem, cutting the traces and making the connections using very fine wire. I then retried the program, but still could not get the initialisation to work.

On examining the data transfer with a DSO, and tracing through the code, it could be seen that bidirectional communications had been established, but the card was not providing the anticipated response at one point in the initialisation sequence. Further research showed that 'high capacity' (SDHC) cards, such as the 8GB one I was using, require a different initialisation sequence, and are therefore incompatible with older host code. This incompatibility sits below the level of the filesystem, in addition to the fact that smaller cards will typically use FAT16 instead of FAT32 (FAIL!).

Rather than try to modify the existing library, I set out to find another one that was compatible with SDHC cards, and that would also support FAT32. The next library I tried appeared to work in simple tests, but did not support writing to the card. Furthermore, examining the code showed it to be of rather indifferent quality. At this point, I almost settled on writing my own, but then I found Roland Riegel's library. It supports all types of cards, reading and writing, FAT16 and FAT32, and partitioned or 'superfloppy' format. The only disadvantage is the code size - once I had linked it into my existing code, I had to do a fair bit of pruning to get everything to fit into the 32KiB of code space on the ATMega32.

Using this code, I was able to read and write a test file without any problems. I then modified the RAM writing code to work with 512-byte blocks, to match the block size used by the filesystem driver, and attempted to load an image into RAM. This did not work, however: the load process crashed, always at the same point, about halfway through the 2 MiB file.

This was a significant problem, as debugging the filesystem code would not be easy, especially on the AVR platform using only the simulator. However, the problem appeared to be in the FAT code, rather than the SD card I/O routines, so it was decided to try debugging it on a PC.

An image of the card with the problem file on it was made, and the code was copied into a new directory. All the AVR-specific parts were then removed, and the SD card block I/O routines were stubbed out with file I/O calls. Running this code did not result in a crash, but the data became corrupted at the same point in the file.

It was then necessary to research the details of the FAT32 filesystem. Fortunately, it is not as complicated as I first thought: essentially, there is a table of 32-bit cluster numbers, forming a linked list of the clusters used by each file. The code was following this linked list, and always getting lost at the same point. Further research showed that, although the filesystem is named 'FAT32', only 28 of the cluster number bits are significant. (I had heard this factoid many years ago, and it was now of immediate practical significance!) Examination of the disk image with a hex editor (note: KHexedit tries to load the entire file into memory, so I had to slice off the first few megabytes to examine) showed that the failing cluster entry had the two MSBs set.

According to the specification, these bits should be ignored; however, this was not happening, and the code was jumping off into the unknown. It was a simple matter to mask off these bits, and after doing this, the load function worked perfectly. (The code change is simply adding the line 'cluster_num &= 0x0FFFFFFF;' at line 453 of 'fat.c'; I was unable to contact Roland regarding incorporating this change.) Pushing this change through to the AVR implementation allowed images to be successfully loaded from the card into the framebuffer.
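In context, the fix sits in the cluster-chain walk, which conceptually looks something like the sketch below. (This illustrates the technique only - it is not Roland's actual code, and read_fat_entry() is a stand-in for the routine that fetches one 32-bit FAT entry from the card.)

    #include <stdint.h>

    #define FAT32_CLUSTER_MASK 0x0FFFFFFFUL   /* only 28 bits are significant */
    #define FAT32_EOC_MIN      0x0FFFFFF8UL   /* end-of-chain markers */

    extern uint32_t read_fat_entry(uint32_t cluster_num);

    uint32_t next_cluster(uint32_t cluster_num)
    {
        uint32_t next = read_fat_entry(cluster_num);
        next &= FAT32_CLUSTER_MASK;   /* ignore the reserved upper four bits */
        if (next >= FAT32_EOC_MIN)
            return 0;                 /* end of the file's cluster chain */
        return next;
    }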

At this point, it was necessary to decide on a suitable file format for storing the images. While they could be stored directly as they appeared in the framebuffer, it was considered desirable to have them in a format that could be read directly by an image editing program, rather than needing a separate converter. To allow direct memory transfers into the framebuffer, the format would need to have a few specific characteristics. The data would have to be presented as 16 bits per pixel, in two channels, and the header, if any, would need to be a multiple of 512 bytes. Also, an image size of 1024x1024 pixels must be supported.

A quick look at some of the common image file formats failed to find one that matched these criteria. However, it was found that the GIMP image editor supports 'raw mode files' with the parameters being set during load/save by the user. Using an 8-bit greyscale format with an alpha channel allowed the native framebuffer format to be read and written directly. Unfortunately, decomposing such an image into Y and alpha channels is a little convoluted. Some testing was done using this format, however it eventually proved too inconvenient, and a converter was written to translate between the raw files and the PPM image format. The m and c channels from the raw image are mapped to the red and green channels of the PPM file, with the blue channel being ignored.
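The raw-to-PPM direction of the converter is essentially a per-pixel repacking. The sketch below illustrates the idea; the ordering of the m byte before the c byte within each 16-bit pixel, and the file names, are assumptions.

    #include <stdio.h>

    #define W 1024
    #define H 1024

    int main(void)
    {
        FILE *in  = fopen("logo.raw", "rb");
        FILE *out = fopen("logo.ppm", "wb");
        if (!in || !out)
            return 1;

        fprintf(out, "P6\n%d %d\n255\n", W, H);   /* binary PPM header */
        for (long i = 0; i < (long)W * H; i++) {
            int m = fgetc(in);                    /* 'grey' channel */
            int c = fgetc(in);                    /* 'alpha' channel */
            if (m == EOF || c == EOF)
                return 1;
            fputc(m, out);                        /* red   <- m */
            fputc(c, out);                        /* green <- c */
            fputc(0, out);                        /* blue unused */
        }
        fclose(in);
        fclose(out);
        return 0;
    }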

A number of annoying bugs were introduced when writing the translators, but these were eventually rectified. The converters were tested by converting from PPM to raw and back again. Provided that the blue channel of the original image was zero, this resulted in an identical file. (Unfortunately the GIMP writes a comment in the header of PPM files, so the converter was made to add an identical comment to facilitate comparison of the files using 'diff'.)

With the converters proven, it was then possible to test the file I/O on the logo canceller board properly. It must be possible to store an image into the RAM, and then read it out again without corrupting it in any way. Initial attempts to do this resulted in corrupted images, with an offset of one bit being introduced at some point in the process. The FPGA code was adjusted to fix this, resulting in correct colours, but the saved image was still displaced by two pixels relative to the original.

It was not apparent what was causing this, so the RAM reading code was modified to discard the first four bytes, and read an extra four at the end of the line. The address counters will wrap around, so a complete line will still be read out. This then gave an image identical to the original. The test image did have black borders, which would not reveal problems at the edges; however, any errors in these regions should not affect the displayed image.

Live Testing

The next stage of the development was to test the module using a live television image. The RGB output of a digital TV set top box was connected, and the displayed image observed. The picture was visible on the monitor, however there were many light and dark bands across the image, along with fluctuations in overall brightness, suggesting a problem with the DC restoration circuitry. It was possible that this was due to the video timings being different from those of the laptop computer.

Investigation with an oscilloscope showed that the sync polarity was inverted compared to the computer. This should not have made any difference, as the chip should auto-detect the polarity; however, to confirm this, a 74HC14 was temporarily wired in to invert the polarity. This did not cause any significant change in the image. A small spike was evident on the video waveform, and various pulse shaping networks were tried on the clamp signal to move the clamp period away from this spike, but a good image could still not be obtained. The display was now similar whether the internal or external clamp signal was used, and it was decided that the earlier fix had just masked the real problem.

It appeared that the DC offset in the ADC was incorrect. The AD9983 can clamp either to 0V or to mid-scale. However, the zero code output corresponded to an input level of around 400mV. This could be brought down somewhat with the offset control registers, but not all the way down to ground level. Examination of the signal from the laptop in the working configuration showed that the black level was being clamped to around 400mV, though I could not establish why this was occurring. It would make sense for a composite video signal if the sync pulse level were clamped to ground, though the AD9983 documentation specifically states that clamping should occur during the back porch, not in the sync pulse.

It was clear that I would have to work around this offset, as I could not remove it using any of the internal registers. Adding an external clamp circuit with a 400mV reference would have been one possibility, though a rather messy one. Another option would be to use the mid-span clamp, by setting register 0x18 to 0x0E. Doing this brightened up the picture, however all of the highlights were then clipped to white. After some experimentation with the offset registers (0x0B, 0x0D, and 0x0F), it was found that setting these all to 0x7F gave a picture with full dynamic range.

It is not clear why this configuration worked when all others had failed. The mid-span clamping is designed to be used with colour-difference signals, which have both positive and negative components. As a bonus, the external clamp input could be dispensed with. After some adjustment of the clamp start and duration registers, values were found which would work with both the computer and TV input signals. Some clipping of a ramp waveform was still evident on the red and green channels, but disabling the automatic offset adjustment fixed this.

With a good quality picture being passed through the system, it was then possible to look at performing the actual logo cancellation. Firstly, an image was captured from a live video feed, and downloaded to the SD card, showing that the video capture system was working. After a slight modification of the code to add serial numbers to the saved image file name, it was possible to capture some logo samples against different backgrounds, starting with the ABC1 graphic. As before, finding an image with a white background proved to be quite difficult, and I had to settle for an image of a white car.

The 'logosolve' program was modified to produce the m and c variables as outputs, instead of the logo image and alpha channel. The sample logos were then run through this program, after removing unnecessary scene detail using an image editor. However, on loading the template into the module, and enabling the cancellation function, severe picture distortion was evident, with the picture being very dark, and having incorrect colours.

It was therefore necessary to devise a 'passthrough' template image that would pass the original video through unmodified while in cancellation mode. A solid colour image was created, with the 'm' (red) channel set to the value determined by the solve program (136), and the 'c' (green) channel set to zero. Loading this image produced a mostly normal picture, but with an interesting psychedelic effect on the highlights. It was thought that this was due to the m value being greater than that corresponding to a gain of one (128). This was increasing the contrast, and as each colour channel reached its maximum in a white region, it would wrap around to zero, leaving a saturated complementary colour behind. Although this gave an interesting video effect in its own right, this was not really what I was looking for.
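A conceptual model of the per-channel arithmetic, as I understand it from the behaviour described (a sketch, not the actual Verilog), makes the wraparound easy to see:

    #include <stdint.h>

    /* out = in * m / 128 + c, all modulo 256. m = 128 gives a gain of
       exactly one, and c behaves as a signed offset in wraparound
       arithmetic, so c = 255 acts as -1. */
    static uint8_t cancel_pixel(uint8_t in, uint8_t m, uint8_t c)
    {
        uint16_t scaled = ((uint16_t)in * m) >> 7;   /* gain = m / 128 */
        return (uint8_t)(scaled + c);                /* wraps modulo 256 */
    }

With m = 136, a white input of 255 scales to 255 x 136 / 128 = 270, which wraps around to 14 - hence the saturated highlights.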

[Colour distortion resulting from arithmetic overflow in cancellation algorithm]
Colour distortion resulting from arithmetic overflow in cancellation algorithm

It was necessary to determine why the logosolve program was not giving an m value of 128 in the regions that should not be processed. After investigation, this was found to be attributable to the white and black levels in the sample images. Although the first sample images taken on the computer did not use the full dynamic range, it appears that this had been corrected by the time the solving algorithm was developed, and the algorithm therefore did not take this variation into account. However, the files captured by the hardware module did not have black and white backgrounds corresponding to levels of 0 and 255. Rather, the black level was around 10, and the white level was about 246.

Therefore it was necessary to modify the solver to allow the black and white levels to be adjusted. Of course, it is not simply a matter of normalising the individual sample images, as this would distort the brightness of the logo. The correct transformation is to adjust the brightness and contrast of the images, such that the measured black and white levels are mapped to 0 and 255 respectively.
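The correction amounts to a linear remapping of each sample image before solving. A sketch of the mapping, on the PC side in the solver (the rounding treatment is my own choice):

    #include <stdint.h>

    /* Map the measured black and white levels (around 10 and 246 in the
       captured files) to 0 and 255 respectively. */
    static uint8_t normalise(uint8_t in, uint8_t black, uint8_t white)
    {
        int v = ((in - black) * 255 + (white - black) / 2) / (white - black);
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        return (uint8_t)v;
    }

So normalise(10, 10, 246) gives 0, and normalise(246, 10, 246) gives 255.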

With this modification made, and the logo image reloaded, the displayed image still had visible distortion in the form of horizontal lines and an overall colour cast. This suggested that the video signal in the blanking interval was being transformed to some non-zero value, disturbing the DC restoration circuitry in the monitor. Though the solver was giving the correct m-value to pass through the signal unchanged, the c-value was 255 instead of 0. This should have made very little difference, given the previously described black level, and the fact that the c-value should have been summed modulo 256. However, changing the solver code to offset the c-value by one restored the correct picture signal. The cancellation performance of the system could now be evaluated.

A reasonable degree of cancellation of the ABC1 logo was evident; however, a large amount of colour noise remained in the regions where the logo had been. This appeared to be a repeat of the overflow problem described above. It was also thought that the poor quality of the sample logos was contributing. The black and white background logos were each derived from a single frame, so no averaging could be performed to provide noise reduction. The white background sample was particularly dubious, as it was doubtful whether the background was at full intensity across the entire logo. Finally, the issues described above with the c-value could point to a problem with the FPGA implementation of the algorithm, although the dynamic range and overflow characteristics were thought to be exactly as modelled in software.

For these reasons, it was decided to concentrate on providing better quality reference images to test the module with. A different choice of television channel could also be advantageous: a logo composed only of white pixels would only need to be captured against a black background, while the white background version could easily be synthesised.

The sample logo was changed to that for SBS 2, captured against a black background; however, the colour distortions remained. At this point, one difference between the software and hardware implementations was discovered. The software operated in the YUV colour space, processing only the luminance channel, whereas the hardware module operated on each of the three colour channels individually. While in theory these should give the same result, any overflow errors in an RGB system would result in severe colour distortion.

[Cancelled logo with colour fringing distortion]
Cancelled logo with colour fringing distortion.

It was therefore decided to modify the FPGA code to transform the colour to the YUV space, process the luminance channel, and then convert back to RGB. While any overflow errors present would still have to be addressed, performing the processing in this manner would eliminate the colour distortion, allowing easier evaluation of the system and hopefully aiding the search for errors. Although the colour space transforms would consume some of the resources of the FPGA, this would be compensated for by only having to process and filter one channel of data rather than three.

Integer matrix equations for the transforms were taken from the YUV article in Wikipedia. I did have some concerns with these, including whether they were truly inverses of one another, and whether I should use the coefficients for gamma corrected or for uncorrected video, but decided to try them as a starting point.

Although these equations suited the use of integer arithmetic, they did use some negative values, and required limiting at high and low thresholds, rather than a simple wraparound. I was unsure of the best way of handling this in the Verilog code, as it could depend on the representation used for negative quantities - a constraint not normally an issue with computer code. Therefore, I simply biased all the calculations into the positive domain by adding a constant, which was then removed after the limiting process.
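For reference, the integer approximations in the Wikipedia article were along the following lines (quoted from memory, so treat the exact coefficients as indicative). Note the negative intermediate terms and the clamping on the return path - the two things the bias trick had to deal with in the Verilog:

    #include <stdint.h>

    static uint8_t clamp(int v)
    {
        return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }

    /* note: the right shifts of negative intermediates assume an
       arithmetic shift - itself a representation detail */
    static void rgb_to_yuv(uint8_t r, uint8_t g, uint8_t b,
                           uint8_t *y, uint8_t *u, uint8_t *v)
    {
        *y = (uint8_t)((( 66 * r + 129 * g +  25 * b + 128) >> 8) + 16);
        *u = (uint8_t)(((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128);
        *v = (uint8_t)(((112 * r -  94 * g -  18 * b + 128) >> 8) + 128);
    }

    static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                           uint8_t *r, uint8_t *g, uint8_t *b)
    {
        int c = y - 16, d = u - 128, e = v - 128;
        *r = clamp((298 * c           + 409 * e + 128) >> 8);
        *g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
        *b = clamp((298 * c + 516 * d           + 128) >> 8);
    }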

After some time spent chasing overflow issues, I was able to produce a good quality colour picture, showing that the transforms were working correctly. It was of course necessary to add delay lines for the chrominance data to match the delay in the filter, otherwise the whole thing ended up looking like a 3rd generation VHS tape copy!

With this done, the cancellation function was working reasonably well, however there were still some random black and white pixels present at the edges of the logo. At first I thought this could be due to jitter of the sampling clock, with the logo mask not quite lining up with the transmitted image, but on switching to a solid colour test pattern from the laptop, these pixels were still present. I also noticed that, even with a static image, these pixels were flashing on and off at random. Furthermore, touching the PCB near the m- and c-bus inputs to the FPGA changed the appearance of the pixels, suggesting a possible timing or signal integrity issue.

At first, it looked like finding this problem could be very difficult, especially given the limited test equipment I had available. However, careful inspection and analysis narrowed it down somewhat. The fault was producing both black and white pixels, but the c-constant is only capable of darkening the image, not lightening it, so the c-bus was eliminated as a suspect. I then probed the signals on the m-bus. These appeared to be correct, subject to the limitations of the measurement setup.

To narrow the problem down further, it was necessary to find exactly which data values were causing the problem. As the edges of the logo were most affected by the problem, a new logo mask was created with solid bars, the intensities of which corresponded to the values from the edge of the logo. The intensity values were identified with on-screen text. This was used in conjunction with a greyscale intensity gradient pattern from the laptop for the input image. Unfortunately, the input image intensities could not be precisely known, due to the conversion from digital to analogue and back, but at least an approximate value could be found. The FPGA configuration was also set to use the Y value directly for the RGB outputs, and only display a black and white image.

[Intensity test pattern]
Intensity test pattern.

It was seen that the corruption was only occurring at the start and end of the logo bars, in conjunction with certain intensities of the input signal. (This was demonstrated by dragging the window containing the test pattern back and forth on the screen.) Furthermore, in some situations, incorrect colours were being introduced into the image. This was quite unexpected, as all three colour outputs should have been identical.

Investigating the colour bus outputs with an oscilloscope was somewhat inconclusive, but appeared to show that the three buses were conveying identical data. However, shorting the MSBs of each colour together resulted in a black and white image. This suggested that the data was being clocked into the DAC at the wrong time, while the signals were still transitioning. This was a possibility, as I had not performed a full timing analysis of the system. I decided to try shifting the phase of the clock, to see what effect it had on the image.

I decided a shift of about 90 degrees would be necessary to have a significant effect, which corresponds to about 10ns at the 27MHz dot clock rate. I cut the track carrying the clock to the DAC, and inserted a few metres of wire to act as a delay line. This actually fixed the corruption completely, giving a good clear image. However, when the FPGA was reconfigured to restore the colour signal, the problem returned, and no amount of adjusting the delay line length would fix it. It was clear that the timing problem was quite sensitive to the internal arrangement of the FPGA.

I wondered if the large amount of combinational logic in the adder and multiplier circuits could be skewing the timing of the signal. My understanding was that this effect should be negligible, as the FPGA chip was capable of operating at far greater speeds than I was using it at, however it still seemed to be a possibility, as the corruption appeared to be related to carries occurring in the internal circuitry. I therefore added data registers immediately before the RGB outputs, in order to retime the signals.

This produced some improvement, although not a complete solution. After some further experimentation however, I found that adding registers at the R,G,B,M and C inputs solved the problem completely. Although these inputs all change synchronously with the dot clock, it appears that they were still not synchronised sufficiently.

Unfortunately, adding the registers consumed more of the FPGA's resources, and when I re-enabled the filter, the design became too big to fit in the chip. (I had given the filter an enable input, and setting this to 0 caused much of the filter circuitry to be optimised away, giving a false sense of the resource usage). After some careful adjustment of the Verilog code, including reverting back to the original RGB-based transform, I got the design to fit.

At this point, the basic logo cancellation functionality was fully operational, and the system was tested against a variety of video content.

[Comparison of processed and unprocessed video signals]
Comparison of processed and unprocessed video signals

[Another sample comparison]
Another sample comparison

The subjective impression of the system was that it was quite effective at reducing the visual distraction attributable to the station logos. Close up examination of the processed image showed that only a faint residue of the logo remained. At a typical viewing distance, this was not noticeable at all, and I was quite pleased with the level of performance that had been achieved. I then went on to implement a few tweaks and improvements.

The original screen capture setup produced greyscale images, composed of a 1:1:1 average of the three channels. This is of course not the correct weighting, and could have caused problems with coloured images. However, the FPGA did not have the resources available to perform the correct weighted average. I therefore modified the code to produce a 16-bit colour image, in an RGB-565 format. The correct average could then be applied by the microcontroller or an external computer.
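The packing itself is cheap for the hardware, while the weighted average can be done at leisure afterwards. A sketch of the two operations, as seen from the PC side (which colour occupies which bit field is an assumption):

    #include <stdint.h>

    /* 24-bit RGB -> 16-bit RGB-565 (assumed layout: RRRRRGGGGGGBBBBB) */
    static uint16_t pack565(uint8_t r, uint8_t g, uint8_t b)
    {
        return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }

    /* The correct luma weighting, applied to the unpacked values */
    static uint8_t luma(uint8_t r, uint8_t g, uint8_t b)
    {
        return (uint8_t)((299UL * r + 587UL * g + 114UL * b) / 1000UL);
    }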

At this point it was clear that it would not be possible to implement the masking functions originally envisaged, due to lack of resources. A fixed blanking interval was defined in the FPGA for the horizontal direction, while the vertical direction was left unprocessed. Rather than use a mask window for enabling and disabling the filter, I simply enabled it whenever the m-value was greater than 128, which corresponded to the image region containing the logo.

It would have been advantageous to be able to control the intensity of the logo cancellation for optimum nulling, however this would have required another addition and multiplication stage. I decided instead to investigate applying the intensity correction before loading the logo to the RAM. This would still permit adjustment, however the update time would be limited by the speed at which the logo could be loaded from the SD card.

I also did some preliminary work on speeding up the process of loading reference logos from the SD card. This initially took around 25 seconds, which was unacceptable for a multi-channel environment. Loading the logos also corrupted the displayed image, so it would be desirable to hide it in the vertical retrace interval.

As a first step to increase the speed, the microcontroller crystal frequency was increased from 10 to 12MHz, the maximum permissible for a 3.3V supply. This provided a small, but worthwhile improvement. One side effect of this was that the baud rate for the RS232 interface had to be reduced to 57600bps to achieve error-free operation, however this was not a significant problem.

Next, the SPI data lines were examined with an oscilloscope. This showed that there was a delay of approximately 2 microseconds between each byte, both when reading in from the SD card, and when writing back out to the FPGA. This was attributable to the process used for exchanging data. After transmitting a byte, the code busy-waited on the transfer-complete flag; then the next byte was fetched from the buffer and transmitted. The time taken to test the flag and perform an indirect load accounted for the delay.

The code was modified to perform a fixed number of NOPs between each byte, and was then tuned such that the time taken to execute the NOPS, fetch the next byte, and jump back to the start of the loop was equal to the time taken to transmit the data. It was not quite possible to get the data to stream continuously - a delay of one bit time between each byte was necessary, apparently a limitation of the AVR SPI implementation - but this still cut the overall load time in half.
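The before and after transmit loops look something like this sketch (the cycle count is illustrative rather than the tuned value; __builtin_avr_delay_cycles() is the avr-gcc intrinsic for a fixed-cycle delay):

    #include <avr/io.h>
    #include <stdint.h>

    /* Original: busy-wait on the SPIF flag after every byte */
    void spi_send_polled(const uint8_t *buf, uint16_t len)
    {
        while (len--) {
            SPDR = *buf++;
            while (!(SPSR & (1 << SPIF)))
                ;                       /* flag test plus refetch costs time */
        }
    }

    /* Tuned: a fixed delay sized so that the delay, the fetch of the
       next byte, and the loop branch just cover one byte time */
    void spi_send_tuned(const uint8_t *buf, uint16_t len)
    {
        while (len--) {
            SPDR = *buf++;
            __builtin_avr_delay_cycles(14);   /* illustrative count only */
        }
    }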

The possibility of performing a direct memory transfer from the SD card to the FPGA was then investigated. The hardware had been designed with this possibility in mind. The process would be as follows: the FPGA would be prepared to receive data, then its chip select would be deasserted. Then, reading of the SD card would commence over the same SPI bus. At the moment the card started returning valid data, the FPGA's chip select would be reasserted, and it would read the data directly as it was returned from the card. There would then be no need to read the data into the AVR and then write it out again - the AVR would read purely for the purpose of clocking the card, and the actual data bytes returned could be discarded by the processor. (This would however remove the possibility of making contrast adjustments as the data passed through the micro.)
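In outline, the sequence might look like this (only a plan at this stage; all of the helper routines and the chip select pin named here are hypothetical):

    #include <avr/io.h>
    #include <stdint.h>

    #define FPGA_CS PB1                     /* assumed chip select pin */
    #define MODE_STORE_TEMPLATE 6

    /* hypothetical helpers assumed to exist elsewhere in the firmware */
    extern void fpga_set_mode(uint8_t mode);
    extern void sd_send_read_command(uint32_t block);
    extern void sd_wait_data_token(void);
    extern uint8_t spi_xfer(uint8_t b);

    void block_to_fpga(uint32_t block)
    {
        fpga_set_mode(MODE_STORE_TEMPLATE); /* FPGA primed, then deselected */
        sd_send_read_command(block);
        sd_wait_data_token();               /* card is about to return data */
        PORTB &= ~(1 << FPGA_CS);           /* let the FPGA listen in */
        for (uint16_t i = 0; i < 512; i++)
            (void)spi_xfer(0xFF);           /* clock the card; AVR discards */
        PORTB |= (1 << FPGA_CS);
    }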

On to Conclusion.

Back to Part 2 - Hardware Design.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
