I decided to get back to work and fix my sprite clipping for my object rendering. It turned out to be a simple fix: the clipping was only messing up when a sprite crossed a bank boundary.

Once I got this done I went looking for more levels with lots of objects so I could test out the level rendering and get a good idea of overall performance. When I was doing levels, I used to love adding in lots of water for decoration – usually right across the level, so I went looking for one of them.
However, it looked like I wasn’t converting the levels properly, as there was no water to be seen – or at least very little. I tried several of my old levels, but they were all the same, loads of water removed. Had I missed something?

After much investigation, it turns out that Windows Lemmings doesn’t have all the water that the Amiga one did! What the hell!?!? I quizzed Russell Kay (who wrote Windows Lemmings), and he told me they’d removed a lot of the decorative items for performance reasons. Damn…

This was a mixed blessing as sure it wouldn’t look 100% like the Amiga one, but at the same time, it meant I’d be able to keep performance up quite a bit. Oh well…. it wasn’t like I could do anything about it.

Speaking of performance…. I’d been using the new ZXNext instructions a lot in my rendering code, so I suddenly started to wonder how I’d fare if I used only the original Z80 instruction set. I was in for a shock, that’s for sure, as the extra code required would push rendering times up massively.

You can see from the image above the huge speed boost the new instructions – in particular LDIX – give 256 colour (Layer 2) rendering code. The image on the left uses a very standard rendering loop: load a value from (HL) into A, test to see if it’s 0, branch if it is, otherwise store it in (DE), then INC HL and INC DE. LDIX does this pretty much in one instruction, but has the added advantage that you can compare against any value in A, not just 0.
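To put some rough numbers on that, here’s a small Python model of the two inner loops. It isn’t the actual renderer, just a sketch: the function names are made up, the T-state counts are the standard Z80 timings plus the 16Ts listed for LDIX in the opcode table below, and the loop counter overhead is left out of both versions.

```python
# Simplified model of a transparent sprite copy; names are illustrative only.

def classic_copy(src, dst, dst_off):
    """LD A,(HL) / AND A / JR Z,skip / LD (DE),A / INC HL / INC DE per pixel."""
    tstates = 0
    for i, pixel in enumerate(src):
        tstates += 7 + 4                  # LD A,(HL) ; AND A
        if pixel == 0:
            tstates += 12                 # JR Z taken, pixel skipped
        else:
            tstates += 7 + 7              # JR Z not taken ; LD (DE),A
            dst[dst_off + i] = pixel
        tstates += 6 + 6                  # INC HL ; INC DE
    return tstates


def ldix_copy(src, dst, dst_off, transparent):
    """LDIX per pixel: copy unless the byte equals A, always advance HL and DE."""
    tstates = 0
    for i, pixel in enumerate(src):
        if pixel != transparent:
            dst[dst_off + i] = pixel
        tstates += 16                     # LDIX, from the opcode table
    return tstates


sprite_row = [0, 5, 5, 0]
print(classic_copy(sprite_row, [9] * 6, 1), ldix_copy(sprite_row, [9] * 6, 1, 0))
```

For this four pixel row the model gives 144 T-states for the classic loop and 64 for LDIX, with identical output in the destination buffer.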

There are several new instructions aimed at giving game devs more tools to speed up their code, and some of them are real beauties.

Final new Z80 opcodes on the NEXT (V1.10.06 core)
======================================================================================
   swapnib           ED 23           8Ts   A bits 7-4 swap with A bits 3-0
   mul               ED 30           8Ts   multiply D*E = DE (no flags set)
   add  hl,a         ED 31           8Ts   Add A to HL (no flags set)
   add  de,a         ED 32           8Ts   Add A to DE (no flags set)
   add  bc,a         ED 33           8Ts   Add A to BC (no flags set)
   add  hl,$0000     ED 34 LO HI     16Ts  Add $0000 to HL (no flags set)
   add  de,$0000     ED 35 LO HI     16Ts  Add $0000 to DE (no flags set)
   add  bc,$0000     ED 36 LO HI     16Ts  Add $0000 to BC (no flags set)
   outinb            ED 90           16Ts  out (c),(hl), hl++
   ldix              ED A4           16Ts  As LDI,  but if byte==A does not copy
   ldirx             ED B4           21Ts  As LDIR, but if byte==A does not copy
   lddx              ED AC           16Ts  As LDD,  but if byte==A does not copy, and DE is incremented
   lddrx             ED BC           21Ts  As LDDR,  but if byte==A does not copy
   ldpirx            ED B7           16Ts  (de) = ( (hl&$fff8)+(E&7) ) when != A
   ldirscale         ED B6           21Ts  As LDIRX,  if(hl)!=A then (de)=(hl); HL_E'+=BC'; DE+=DE'; dec BC; Loop.
   mirror a          ED 24           8Ts   mirror the bits in A     
   mirror de         ED 26           8Ts   mirror the bits in DE     
   push $0000        ED 8A LO HI     19Ts  push 16bit immediate value
   nextreg reg,val   ED 91 reg,val   16Ts  Set a NEXT register (like doing out($243b),reg then out($253b),val )
   nextreg reg,a     ED 92 reg       12Ts  Set a NEXT register using A (like doing out($243b),reg then out($253b),A )
   pixeldn           ED 93           8Ts   Move down a line on the ULA screen
   pixelad           ED 94           8Ts   using D,E (as Y,X) calculate the ULA screen address and store in HL
   setae             ED 95           8Ts   Using the lower 3 bits of E (X coordinate), set the correct bit value in A
   test $00          ED 27           11Ts  And A with $XX and set all flags. A is not affected.

New instructions like MUL, MIRROR, PIXELAD and PIXELDN are ones lots of game devs would have killed for back in the day. With the spectrum screen being so tricky, the new instructions like pixelad and pixeldn are a godsend for developers, taking away one of the major pains and slow downs they had in rendering.
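For anyone who hasn’t fought with the Spectrum screen layout, this Python sketch models the sums that PIXELAD, PIXELDN and SETAE save you from doing by hand. It assumes the standard ULA screen at $4000 and in-range coordinates (Y 0-191, X 0-255); the function names simply mirror the opcodes, and it makes no claim about what the hardware does off screen.

```python
def pixelad(y, x):
    """ULA screen address of pixel (x, y), as PIXELAD computes from D=Y, E=X."""
    return (0x4000
            | ((y & 0xC0) << 5)    # which third of the screen
            | ((y & 0x07) << 8)    # pixel row inside the character cell
            | ((y & 0x38) << 2)    # character row inside the third
            | (x >> 3))            # byte column


def setae(x):
    """Bit mask for pixel x within its byte; the leftmost pixel is bit 7."""
    return 0x80 >> (x & 7)


def pixeldn(addr):
    """Move a screen address down one pixel line, the classic long-hand way."""
    high, low = addr >> 8, addr & 0xFF
    high += 1
    if (high & 7) == 0:            # stepped out of the character cell
        low += 0x20                # move to the next character row
        if low > 0xFF:
            low &= 0xFF            # carried into the next third, keep high
        else:
            high -= 8              # still in the same third
    return (high << 8) | low


print(hex(pixelad(1, 0)), hex(pixelad(8, 0)), hex(pixeldn(pixelad(7, 0))))
```

Line 1 lives at $4100 and line 8 at $4020, which is exactly the sort of thing nobody wants to work out per pixel in a tight loop.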

So after getting a warm fuzzy feeling at my rendering speed, I decided to try and get the SID chip working. This was before we lost it obviously. I decided to use the reSID library and loaded the DLL on startup. But I just could not get it working….

This is an image of a single channel playing a pulse wave – so it should be a simple square layout, but as you can see, the waves are not only very thin, but have odd little bumps on the top, and that odd block missing. I fought with this for a while, quickly getting nowhere, so eventually gave up and decided to stick with my own SID code from my C64 emulator. It’s not great, but does sound okay, and does work – which is always a plus.

All this was working towards a new major CSpect release, to try and get as close to the actual machine as I could. This would also include the new 3xAY chip, and DMA.

DMA (Direct Memory Access controller) was something I was really wanting, as it would speed up my Lemmings rendering code hugely. When I copy the screen each game cycle, it can take 2-3 frames just for that copy as it needs to copy 38K each game tick, which for a spectrum, is a hell of a lot. DMA runs at the same speed as the CPU clock, and at 4T-States per byte copied, is a massive boost in performance. But first, I needed to get it into CSpect, and that meant understanding how it worked – beyond what most coders would care about.
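As a back-of-envelope check on those figures, here’s the arithmetic in Python. Several things here are assumptions rather than facts from the game code: “38K” is taken as 38 x 1024 bytes, the CPU copy is modelled as a plain LDIR at 21 T-states per byte, the machine is running at 14MHz, and a frame is taken as the 48K Spectrum’s 69,888 T-states scaled up by four.

```python
BYTES = 38 * 1024                    # "38K" taken as 38 KiB (assumption)
FRAME_TSTATES_3_5MHZ = 69_888        # 48K Spectrum frame length (assumption)
FRAME_TSTATES_14MHZ = FRAME_TSTATES_3_5MHZ * 4

LDIR_TSTATES_PER_BYTE = 21           # standard Z80 LDIR cost per byte
DMA_TSTATES_PER_BYTE = 4             # figure quoted in the text

ldir_frames = BYTES * LDIR_TSTATES_PER_BYTE / FRAME_TSTATES_14MHZ
dma_frames = BYTES * DMA_TSTATES_PER_BYTE / FRAME_TSTATES_14MHZ

print(f"LDIR: {ldir_frames:.2f} frames, DMA: {dma_frames:.2f} frames")
```

Under those assumptions the LDIR copy comes out at just under three frames, which lines up with the 2-3 frames mentioned above, while the DMA copy fits in a little over half a frame.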

I spent a while hunting for more info on the DMA chip, and finally found the datasheet for it, which you can find on an earlier blog post ( DMA Datasheet ). It’s a little confusing, but with the help of Victor I stumbled through creating the state machine inside CSpect. The DMA is basically a set of registers that you set by doing a stack of OUTs, with the first byte of the instruction telling the DMA controller what registers follow. Once I had this in place, Victor gave my little DMA sample code a once over, testing it on the real hardware, and I was then able to also get something running locally.
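The “first byte says what follows” idea can be sketched in a few lines of Python. This only decodes the WR0 base byte, using the bit layout described in the Zilog Z80 DMA datasheet; it is not CSpect’s code, and decode_wr0 is a made-up name for illustration.

```python
def decode_wr0(byte):
    """Decode a Z80 DMA WR0 base byte (sketch based on the datasheet layout)."""
    mode_bits = byte & 0x03
    if byte & 0x80 or mode_bits == 0:
        raise ValueError("not a WR0 base byte")

    # Bits 3..6 flag which parameter bytes follow, in this order.
    follows = []
    if byte & 0x08:
        follows.append("port A address low")
    if byte & 0x10:
        follows.append("port A address high")
    if byte & 0x20:
        follows.append("block length low")
    if byte & 0x40:
        follows.append("block length high")

    return {
        "mode": {1: "transfer", 2: "search", 3: "search/transfer"}[mode_bits],
        "direction": "A->B" if byte & 0x04 else "B->A",
        "follows": follows,
    }


print(decode_wr0(0x7D))
```

A base byte of $7D decodes as a transfer from A to B with all four parameter bytes following, so a state machine built this way knows to expect two 16 bit values (address, then length) before the next register group starts.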

DMA has a few modes: it can increment, decrement or not move the source or destination address, and each end can be RAM or a PORT. So I started off by trying to DMA a stack of data to the border to see what happens…
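Those modes live in the per-port setup bytes (WR1 for port A, WR2 for port B). Here’s a hedged Python sketch of how they decode, again going by the Zilog datasheet bit layout rather than CSpect’s source; decode_port_byte is a hypothetical helper, and $68 is just an example value for a fixed I/O destination, not a byte from the border test.

```python
def decode_port_byte(byte):
    """Decode a Z80 DMA WR1/WR2 base byte (sketch based on the datasheet layout)."""
    if byte & 0x80:
        raise ValueError("not a WR1/WR2 base byte")
    low_bits = byte & 0x07
    if low_bits == 0x04:
        port = "A"                       # WR1
    elif low_bits == 0x00:
        port = "B"                       # WR2
    else:
        raise ValueError("not a WR1/WR2 base byte")

    step_bits = (byte >> 4) & 0x03       # 00 = decrement, 01 = increment, 1x = fixed
    step = {0: "decrement", 1: "increment"}.get(step_bits, "fixed")

    return {
        "port": port,
        "target": "I/O port" if byte & 0x08 else "memory",
        "address": step,
        "timing_byte_follows": bool(byte & 0x40),
    }


print(decode_port_byte(0x54))   # port A, memory, incrementing
print(decode_port_byte(0x68))   # port B, I/O port, fixed address
```

A memory source that increments feeding a fixed I/O port is the shape you’d want for streaming a block of bytes at something like the border.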

After a bit of fiddling around, I finally got the DMA working. I had to rearrange my CSpect processing loop as the DMA locks out the CPU, but I still needed the screen to render each scanline based on the number of T-States the DMA was taking from the machine overall. It’s certainly not perfect, but it doesn’t have to be. CSpect is all about making it easy to code for the Next, not about making it pixel perfect.
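The shape of that kind of loop can be sketched in Python. This is purely illustrative and not how CSpect is actually written: the Machine class and its fields are invented, a scanline is assumed to be 224 T-states (48K timing at 3.5MHz), each DMA byte is charged 4 T-states, and the CPU is stubbed out as a 4 T-state NOP.

```python
TSTATES_PER_SCANLINE = 224     # assumption: 48K timing at 3.5MHz
DMA_TSTATES_PER_BYTE = 4       # figure quoted earlier in the post


class Machine:
    """Toy emulator core: whoever owns the bus, the video clock still advances."""

    def __init__(self):
        self.line_tstates = 0      # T-states used so far in the current scanline
        self.scanline = 0
        self.rendered = []         # scanlines drawn so far
        self.dma_bytes_left = 0

    def start_dma(self, length):
        self.dma_bytes_left = length

    def advance(self, tstates):
        """Account for elapsed T-states and render any completed scanlines."""
        self.line_tstates += tstates
        while self.line_tstates >= TSTATES_PER_SCANLINE:
            self.line_tstates -= TSTATES_PER_SCANLINE
            self.rendered.append(self.scanline)
            self.scanline += 1

    def step(self):
        if self.dma_bytes_left:
            self.dma_bytes_left -= 1            # DMA holds the bus, CPU is locked out
            self.advance(DMA_TSTATES_PER_BYTE)
        else:
            self.advance(4)                     # stand-in for one CPU instruction (NOP)


m = Machine()
m.start_dma(6912)
while m.dma_bytes_left:
    m.step()
print(m.scanline)
```

The point of the design is that advance() is the only place time moves, so the screen keeps drawing at the right rate whether the T-states came from the CPU or from the DMA.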

Next I wanted to do a memory to memory copy, so I grabbed a screen shot and DMA’d it up and got the image below – this was at 28MHz…

It’s a shame we’ve lost the 28MHz, as it’s ballistically quick. Here you can see I can copy a normal spectrum screen in about 16 scan lines. This is probably without the old memory contention in there, but no matter what, it’s still incredibly quick. That’s not to say 14MHz isn’t quick as well mind, and the speed up it gives me for my Lemmings screen copy code is well worth the effort. Here’s the little DMA program that copies the screen above (which is included in the CSpect archive)…

DMA     db $C3          ;R6-RESET DMA
        db $C7          ;R6-RESET PORT A Timing
        db $CB          ;R6-SET PORT B Timing same as PORT A

        db $7D          ;R0-Transfer mode, A -> B
        dw ScreenDump   ;R0-Port A, Start address    (source address)
        dw 6912         ;R0-Block length     (length in bytes)

        db $54          ;R1-Port A address incrementing, variable timing
        db $02          ;R1-Cycle length port A

        db $50          ;R2-Port B address incrementing, variable timing
        db $02          ;R2-Cycle length port B

        db $C0          ;R3-DMA Enabled, Interrupt disabled

        db $AD          ;R4-Continuous mode  (use this for block transfer)
        dw $4000        ;R4-Dest address     (destination address)

        db $82          ;R5-Restart on end of block, RDY active LOW

        db $CF          ;R6-Load
        db $B3          ;R6-Force Ready
        db $87          ;R6-Enable DMA
With the DMA now running in CSpect, I thought I’d give some of the old DMA demos a go, see how compatible I am.


It was pretty cool seeing these demos “just work”, and showed my DMA code was working well.

I was about to take a break as I headed out to Orlando with the family, but that didn’t stop me having a little fun on the plane as we headed out…

It would be a few months before I picked up any of this again as work got busy, and deadlines loomed…