Wednesday, 6 February 2013

Paying gig

I mentioned that I haven't had any time to work on ArdweeNET lately because someone decided I was worth paying to do some design work...go figure.

Anyway, here's a pic of the board I designed for them.

I did almost all of the schematic design as well as all of the PCB layout. It has all sorts of goodies like Ethernet, GSM, microSD, Arduino Mega compatible etc etc. There are two boards that connect with a ribbon cable and are housed in a DIN rail enclosure. It's to be used for the specialised industrial control the company does.

So I haven't been idle.

There are now more boards to design and of course get working, so who knows when I'll get back to ArdweeNET full time, but I'll hopefully be able to do some work on getting the ArdweeNODE boards up and running.

Monday, 21 January 2013

It's quiet around here

Not much going on at present. Apart from all the Christmas faff I've got a paying gig to design some hardware and that takes precedence.

We do have all the components and PCBs to make a few prototypes so that should start happening before too long.

Meanwhile I just noticed that the ArdweeNET site is broken, the menu isn't working (it's not even appearing). I'll get onto that one day. (Fixed thanks to Stripey Lynx, AKA Paul)

Friday, 14 December 2012

Nasty nasty ATOMIC

Ooo what an insidious bug, let me set the scene.

I have a function that wants to transmit data on a serial line; it's actually the start of a simple packetising feature I'm writing. So far there are two functions, used as follows.

packetLoad(p, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
packetSend(p);


The names are fairly self-explanatory I think except that packetSend() wraps the string in "<>" characters, so the end result is this

"<ABCDEFGHIJKLMNOPQRSTUVWXYZ>"

being transmitted.
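
For anyone following along, here's roughly the shape of those two calls. This is a minimal sketch and not the LARD code: the `packet` struct, `PACKET_MAX` and `serialPutByte()` are all made up for illustration, and `serialPutByte()` just appends to a capture buffer so the thing runs on a PC.

```c
#include <stddef.h>
#include <string.h>

#define PACKET_MAX 64            /* hypothetical payload limit */

typedef struct {
    char   data[PACKET_MAX];
    size_t len;
} packet;

/* Stand-in for the real serial driver: capture the bytes so we can look at them. */
static char   wire[PACKET_MAX + 3];   /* payload + '<' + '>' + terminator */
static size_t wireLen = 0;

static void serialPutByte(char c) {
    if (wireLen < sizeof wire - 1) {
        wire[wireLen++] = c;
        wire[wireLen] = '\0';
    }
}

/* Copy the payload into the packet, truncating if it is too long. */
void packetLoad(packet *p, const char *s) {
    size_t n = strlen(s);
    if (n > PACKET_MAX) n = PACKET_MAX;
    memcpy(p->data, s, n);
    p->len = n;
}

/* Send the payload wrapped in '<' and '>'. */
void packetSend(const packet *p) {
    serialPutByte('<');
    for (size_t i = 0; i < p->len; i++) serialPutByte(p->data[i]);
    serialPutByte('>');
}
```

Loading the 26-letter alphabet and sending it puts 28 bytes on the wire, which is the number that matters further down.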

It worked just fine straight out of the box at lower bit rates, but then I upped the bit rate to 500kbps and it failed miserably, just transmitting "<A" or "<AB" or maybe "<ABC" depending on where I placed debugging code.

The LARD serial code works like most others I guess, some code puts bytes into a buffer and some other code takes them out and puts them into the UART with the "taking out" part being interrupt driven. Both these processes involve updating a counter to indicate the number of bytes in the buffer, very important information that tells both sides how many bytes there are and where they should go.

For this reason any accesses to the buffer should be "atomic", meaning that when process A is accessing the buffer process B (or anybody else for that matter) must not also access the buffer. An atomic operation is indivisible, there must be no interrupts during the course of that operation.

In an interrupt-driven system this can be done by using semaphores, mutexes etc but in a simple system like this the easiest way is to just disable the interrupts when putting bytes into the buffer so the taking out code (which being interrupt-driven can occur at any time, even half-way through updating a counter) cannot try to take a byte out while the putting in code is putting one in. Got it?

So to facilitate this I have two nice macros called ATOMIC_START and ATOMIC_END, and to allow them to be nested they maintain their own counter and ATOMIC_END will only re-enable interrupts if that counter is 0.
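
The idea behind a nestable pair like that can be sketched as follows. This is a guess at the general shape, not my actual macros: `irqEnabled`, `irqOff()`, `irqOn()` and `atomicDepth` are invented names, and the interrupt enable is simulated with a flag so it runs on a PC (on a Cortex-M the two helpers would be the CMSIS `__disable_irq()` and `__enable_irq()` intrinsics).

```c
#include <stdbool.h>

/* Simulated global interrupt enable (assumption: stands in for the CPU's). */
static volatile bool irqEnabled = true;
static void irqOff(void) { irqEnabled = false; }
static void irqOn(void)  { irqEnabled = true;  }

static volatile unsigned atomicDepth = 0;   /* nesting counter */

/* Disable first, then count, so the counter itself is only ever
   touched with interrupts off. */
#define ATOMIC_START  do { irqOff(); atomicDepth++; } while (0)

/* Only the outermost ATOMIC_END turns interrupts back on. */
#define ATOMIC_END    do { if (atomicDepth > 0 && --atomicDepth == 0) irqOn(); } while (0)
```

One limitation of this sketch: if interrupts were already off before the outermost ATOMIC_START, the matching ATOMIC_END turns them on anyway. Saving and restoring the previous state is the more robust variant.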

So far so good, but the symptoms I mentioned above really smacked of a race condition on the buffer's nItems counter, the transmit ISR was reading 0 items when in fact there was 1. It would therefore take no further action and even though the buffer continued to fill no more bytes would be transmitted.

Time for the logic analyser and some deep thought. It's very hard to fix a problem if you can't see it and that's where a logic analyser comes into its own because this sort of problem cannot be seen with a debugger or "printf" debugging. I find the best way is to use 2-3 spare IO pins and toggle them at critical parts of the code, this has almost no effect on the real-time nature of the program and with the pulses properly placed they can tell you a lot.
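
The pin toggling is nothing fancy. Something along these lines would do, where every name (`DBG_PORT`, `DBG_PIN_PUT`, `DBG_PIN_ISR`, `bufferPut()`) is hypothetical and the "port" is just a variable so the sketch builds on a PC:

```c
#include <stdint.h>

/* Stand-in for a GPIO output register. On real hardware DBG_PORT
   would point at the port's output register instead. */
static volatile uint32_t fakePort = 0;
#define DBG_PORT      (&fakePort)

#define DBG_PIN_PUT   (1u << 0)   /* high while putting a byte in the buffer */
#define DBG_PIN_ISR   (1u << 1)   /* high while inside the ISR */

#define DBG_HIGH(pin) (*DBG_PORT |=  (pin))
#define DBG_LOW(pin)  (*DBG_PORT &= ~(pin))

static void bufferPut(void) {
    DBG_HIGH(DBG_PIN_PUT);
    /* ... the code being timed goes here ... */
    DBG_LOW(DBG_PIN_PUT);
}
```

Be aware that `|=` on a port is itself a read-modify-write, so if an ISR toggles another pin on the same port you are reintroducing exactly the kind of race you're hunting. Parts with separate set and clear registers avoid that.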

So here's the obligatory logic analyser trace pic.



The packetSend() function has 28 bytes to send so it starts the ball rolling by writing the first of them directly into the UART ('A'), after that it writes the bytes into the buffer ('B').

The first write then causes a byte to be transmitted and when that's complete an interrupt ('C') is triggered. This checks to see if there are any bytes in the buffer and if so reads one and writes it to the UART ('D'). This process continues until there are no bytes left in the buffer and as you can see the As, Bs, Cs and Ds are nicely interspersed and everything works well for 4 bytes.

Now look at what happens at around the 200uS mark: the B that has been taking about 12uS blows out to nearly 38uS and smack in the middle of it we see an ISR call (with a negative pulse I used to see which path the code took).

This is the crux of the bug. If ATOMIC_START worked properly it should not be possible to service an interrupt in the middle of packetSend(). This means that potentially both functions are trying to access the buffer's byte counter at the same time and the results are indeterminate.

In this case the ISR obviously reads the counter just before it was incremented from 0 to 1, it therefore got a value of 0 and that folks was the end of any transmission, despite the fact that packetSend() continued to write bytes into the buffer.
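
To make the failure concrete, here's a toy model of it that runs on a PC. It is only a sketch under assumptions: the buffer layout, `txIsr()`, `deliverIrq()`, `bufPut()` and `raceDemo()` are invented for illustration, and the interrupt is forced to arrive at the worst possible moment rather than at random.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 32

static char   buf[BUF_SIZE];
static size_t head, tail, nItems;   /* nItems is shared with the ISR */
static size_t sent;                 /* bytes the ISR handed to the "UART" */
static bool   irqEnabled;           /* simulated global interrupt enable */
static bool   irqPending;           /* simulated TX-complete interrupt */

/* TX-complete ISR: send the next byte, or give up if the buffer looks empty. */
static void txIsr(void) {
    if (nItems == 0) return;        /* nothing to do, so the chain stops here */
    (void)buf[tail];                /* real code would write this to the UART */
    tail = (tail + 1) % BUF_SIZE;
    nItems--;
    sent++;
}

/* Simulated hardware: run the ISR if one is pending and interrupts are on. */
static void deliverIrq(void) {
    if (irqPending && irqEnabled) {
        irqPending = false;
        txIsr();
    }
}

/* Put one byte in the buffer. The interrupt arrives after the byte is
   stored but before nItems is updated. */
static void bufPut(char c, bool guarded) {
    if (guarded) irqEnabled = false;        /* ATOMIC_START */
    buf[head] = c;
    head = (head + 1) % BUF_SIZE;
    irqPending = true;
    deliverIrq();                           /* gets in here only if unguarded */
    nItems++;
    if (guarded) {                          /* ATOMIC_END */
        irqEnabled = true;
        deliverIrq();                       /* the held-off interrupt runs now */
    }
}

/* Returns how many bytes were transmitted after queuing one byte. */
size_t raceDemo(bool guarded) {
    head = tail = nItems = sent = 0;
    irqEnabled = true;
    irqPending = false;
    bufPut('B', guarded);
    return sent;
}
```

Unguarded, the ISR sees a count of 0, does nothing, and the byte sits in the buffer forever. Guarded, the interrupt is held off until the count is correct and the byte goes out.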

I replaced ATOMIC_START with the standard __disable_irq() and the whole shebang burst into life with all 28 bytes being transmitted correctly.

So, another bug squashed, tomorrow I'll be having a long hard look at ATOMIC_START but for now it's 3AM so I'm off to bed.



PCBs arrive

The ArdweeNODE PCBs arrived in the US today, from what I can see in the photos they look pretty good.

From here my US mate will ship half of them and the parts to build 2 to me. Then we'll both start loading components and debugging.

I actually hate this part but it has to be done.

I just hope there aren't any major stuff ups. There's bound to be a track or two wrong or we may decide on a change, but a bad error can be a show stopper.

Wednesday, 12 December 2012

Beware cut & paste

I've just spent nearly half a day tracking down a hard fault error on my board, and all because of cut and paste and bad programming practice.

I have four interrupt handlers, one for each of the timers on the LPC1227. These ISRs have to gain access to appropriate structures in memory that are dynamically allocated and so can't be hard coded. To deal with this I have a static array of pointers to the timer structures that is filled in when the user calls the hwTimerCreate() function.

In my LARD framework the timers are known as timers 0-3 and enumerated as such.

typedef enum {
    HWTIMER_0,
    HWTIMER_1,
    HWTIMER_2,
    HWTIMER_3
} hwTimerTypes;


And there's an array of pointers to timer structures, one for each hardware timer.

hwTimer * hwTimers[N_HWTIMERS] = {0};

Now an ISR knows of course what hardware timer it was invoked by, but it needs to find the software structure that holds other information, such as a pointer to a user-supplied function to call. So it indexes into the hwTimers array with hwTimers[i], where i is the timer's logical number.
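
Stripped right down, the create-then-look-up arrangement looks something like this. It's a sketch under assumptions: the fields of `hwTimer`, the two-argument signature of `hwTimerCreate()`, the helper `hwTimerDispatch()` and the example callback `onTick()` are all invented here, and the real LARD versions certainly carry more.

```c
#include <stddef.h>
#include <stdlib.h>

#define N_HWTIMERS 4

typedef enum {
    HWTIMER_0,
    HWTIMER_1,
    HWTIMER_2,
    HWTIMER_3
} hwTimerTypes;

typedef struct {
    hwTimerTypes id;
    void (*callback)(void);       /* user-supplied function to call */
} hwTimer;

hwTimer * hwTimers[N_HWTIMERS] = {0};

/* Allocate a timer structure and record it in the lookup table. */
hwTimer * hwTimerCreate(hwTimerTypes id, void (*callback)(void)) {
    if ((unsigned)id >= N_HWTIMERS) return NULL;
    hwTimer * t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->id = id;
    t->callback = callback;
    hwTimers[id] = t;
    return t;
}

/* What an ISR does: find its structure by logical number and call the user.
   The NULL checks guard against a slot that was never filled in. */
static void hwTimerDispatch(hwTimerTypes id) {
    hwTimer * t = hwTimers[id];
    if (t != NULL && t->callback != NULL) t->callback();
}

/* Example user callback. */
static int ticks = 0;
static void onTick(void) { ticks++; }
```

After `hwTimerCreate(HWTIMER_2, onTick)`, dispatching HWTIMER_2 calls `onTick()`, while dispatching a timer that was never created quietly does nothing.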

The old ISRs looked like this (much code removed for clarity)

void TIMER16_0_IRQHandler(void) {     // 16-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_0];
}

void TIMER16_1_IRQHandler(void) {     // 16-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_1];
}

void TIMER32_1_IRQHandler (void) {    // 32-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_3];
}

void TIMER32_0_IRQHandler(void) {     // 32-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_2];
}


Note that the two 16-bit timers are first and the 32-bitters follow, and that we have the order 16_0, 16_1, 32_1, 32_0 and index into the array of pointers using HWTIMER_0, 1, 3, 2 in that order. A little odd but it works and I never rearranged the code to be more logical. Note also that the comments are wrong and designed to confuse any future programmer.

Apart from the comments so far so good, but I changed the code in each ISR, and to save retyping I got one working then cut the text and pasted it into the body of the other three, which meant they all used HWTIMER_0 as their index and // 16-bit Timer0 as the comment. So I then went down the page changing 0, 0, 0, 0, to the logical order of 0, 1, 2, 3 for the HWTIMER_x index and fixed the comments.

void TIMER16_0_IRQHandler(void) {      // 16-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_0];
}

void TIMER16_1_IRQHandler(void) {      // 16-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_1];
}

void TIMER32_1_IRQHandler (void) {     // 32-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_2];
}

void TIMER32_0_IRQHandler(void) {      // 32-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_3];
}


Then I guess I had dinner, watched some TV, whatever and got back to my programming to find that the two 16-bit timers work just fine but the 32-bit timers cause a hard fault.

Yikes.

Hours later, after looking at the index values and the comments 1000 times and telling myself that they are in the logical order, I finally look at the function names.


TIMER32_1_IRQHandler and TIMER32_0_IRQHandler are swapped. They were before as well, but in that case so were the HWTIMER_x indexes, so although it wasn't best practice because they were out of order, it did work. This time I've been nice and logical in editing the indexes and comments to be sequential and forgotten that the functions are not sequential.

A quick swap of TIMER32_1_IRQHandler and TIMER32_0_IRQHandler and all things work.

So the moral of the story: be very careful when duplicating code with cut & paste, and organize like functions that differ only in a number in a logical, numerical order.
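
One way to make this sort of mistake harder is to put the handler name and its index on the same line and share the body. This is a sketch of that idea, not what LARD does: the `HWTIMER_ISR` macro, `hwTimerService()` and `lastServiced` are hypothetical, and the service function is a stub.

```c
typedef enum {
    HWTIMER_0,
    HWTIMER_1,
    HWTIMER_2,
    HWTIMER_3
} hwTimerTypes;

/* Stub for the shared per-timer work; it just records who called. */
static int lastServiced = -1;
static void hwTimerService(hwTimerTypes id) { lastServiced = (int)id; }

/* One line per handler: vector name and logical index sit side by side,
   so there is no body to cut, paste and forget to edit. */
#define HWTIMER_ISR(handler, id)  void handler(void) { hwTimerService(id); }

HWTIMER_ISR(TIMER16_0_IRQHandler, HWTIMER_0)
HWTIMER_ISR(TIMER16_1_IRQHandler, HWTIMER_1)
HWTIMER_ISR(TIMER32_0_IRQHandler, HWTIMER_2)
HWTIMER_ISR(TIMER32_1_IRQHandler, HWTIMER_3)
```

With the four lines in numerical order a swapped pair stands out at a glance, which is exactly what the hand-written version hid from me.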

Now I've forgotten what I was working on...that's right, I was trying to generate a 100uS break condition on the serial line.

Tuesday, 11 December 2012

NXP WTF?

Well in another WTF moment I've been tackling the problem of detecting when a UART has transmitted ALL of the bytes you sent.

Trivial right? Well not as trivial as you may think.

Why do you care? Well maybe you have to turn around an RS-485 transceiver and you do that after the last byte, not the second-last. Or maybe you are sending a command to another processor and timing the response, it's a bit unfair to start timing before the last byte has gone.

The LPC UART (or at least the one on the 1227) has no explicit flag to read to tell you that the last byte has left the TSR (Transmit Shift Register). Actually that's not true, there is the TEMT flag.
Transmitter Empty. TEMT is set when both THR and TSR are empty;
Yep, that's clear enough. But there are two issues here, one is that you don't get an interrupt so you have to poll the TEMT flag. Usable but not good. The second issue is worse though, IT DOESN'T WORK.

You can poll the TEMT bit until the cows come home but it gets set when the FIFO is empty, not when the TSR is. (YMMV but that's what I'm seeing)

So just use a timer. Well that was the non-answer provided by an NXP support person on the forum. Use an entire hardware timer for this simple function? I think not, heck you only have 4 and he wants me to tie up two of them to fix their crap design.

Back to square one. So what do you get?

You get a THRE flag and interrupt, but this only tells you that the FIFO is empty. At this point however there is still a single byte in the TSR and that may not be gone for quite some time, as is shown in this trace.



Here we see two bytes being sent from the UART, 'A' and 'B'. The small pulse is the time at which the THRE interrupt is fired. Note that at this point 'B' has still not been transmitted.

Fortunately there is a mechanism that is clearly and succinctly described in the data sheet.
The UARTn THRE interrupt (UnIIR[3:1] = 001) is a third level interrupt and is activated when the UARTn THR FIFO is empty provided certain initialization conditions have been met. These initialization conditions are intended to give the UARTn THR FIFO a chance to fill up with data to eliminate many THRE interrupts from occurring at system start-up. The initialization conditions implement a one character delay minus the stop bit whenever THRE = 1 and there have not been at least two characters in the UnTHR at one time since the last THRE = 1 event. This delay is provided to give the CPU time to write data to UnTHR without a THRE interrupt to decode and service. A THRE interrupt is set immediately if the UARTn THR FIFO has held two or more characters at one time and currently, the UnTHR is empty. The THRE interrupt is reset when a UnTHR write occurs or a read of the UnIIR occurs and the THRE is the highest interrupt (UnIIR[3:1] = 001). 

Got that? No, I didn't either despite reading it maybe 10 times.

Luckily one of the guys on the LPC forum is smarter than me and he explained it,
So you have only write one byte in the fifo and the THRE interrupt will occur after this byte was sent.  
Still a bit unclear so let me slightly reword it.
If you only place a single byte in the FIFO the THRE interrupt will occur after this byte was sent. 
That's right and worth repeating and rephrasing again in the hope that one of the explanations will make sense: if you only place one byte in the FIFO you get the THRE interrupt after that byte has gone. Yay, that's exactly what we need, and here it is in action.


Note that I have only sent a single byte and that the interrupt pulse now occurs after that byte has completely left the TSR (not counting the stop bit).

We're getting there, trouble is you normally send more than one byte. What happens if we send 10? Well in that case unless you take extra steps you are back where you started. If you just blat 10 bytes into the FIFO the interrupt will fire after the 9th byte has been transmitted, not after the 10th.

You have to get clever and hold off with the last byte. You send 9 bytes straight away and when the last of those is in the TSR you write the 10th byte into the FIFO.


Here we have written 9 bytes ('A' thru 'I') into the FIFO, when 'I' goes into the TSR the interrupt is fired. At this point the FIFO is empty and we write the 10th byte ('J') into it, thus satisfying the "only one byte in the FIFO" criterion.

The next time we see the THRE interrupt is after the 'J' has gone. We can now set a global flag somewhere to tell the rest of the program that the data has been completely transmitted.

And here is the pseudo code for the interrupt function (actually this is my real code with a lot of unrelated stuff deleted for clarity)


Note the hwFifoCount variable, this is my workaround for the FIFOLVL bug in the hardware as documented over the last couple of days, it keeps track of the number of bytes in the FIFO.
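
The hold-back logic boils down to something like the following. Treat it as a minimal sketch rather than the LARD code: apart from `hwFifoCount`, every name (`uartWriteThr()`, `txFill()`, `txStart()`, `uartThreIsr()`, `txDone`) is invented for illustration, the 16-byte FIFO depth is an assumption, and the write to the UART is replaced by a log so it runs on a PC.

```c
#include <stdbool.h>
#include <stddef.h>

#define HW_FIFO_DEPTH 16          /* assumed TX FIFO depth */

/* Stand-in for writing to the UART's THR: just log what was written. */
static char   thrLog[64];
static size_t thrLogLen = 0;
static void uartWriteThr(char c) {
    if (thrLogLen < sizeof thrLog) thrLog[thrLogLen++] = c;
}

static const char *txData;        /* message being sent */
static size_t txLen, txPos;       /* total length, next byte to write */
static size_t hwFifoCount;        /* software count of bytes in the FIFO */
static volatile bool txDone;      /* set once the last byte has gone */

/* Top the FIFO up, but never let the final byte share it with another byte. */
static void txFill(void) {
    while (txLen - txPos > 1 && hwFifoCount < HW_FIFO_DEPTH) {
        uartWriteThr(txData[txPos++]);
        hwFifoCount++;
    }
    if (txLen - txPos == 1 && hwFifoCount == 0) {
        uartWriteThr(txData[txPos++]);   /* last byte goes in alone */
        hwFifoCount = 1;
    }
}

/* Begin sending: everything except the last byte goes in straight away. */
void txStart(const char *data, size_t len) {
    txData = data;
    txLen = len;
    txPos = 0;
    hwFifoCount = 0;
    txDone = (len == 0);
    txFill();
}

/* Called on a THRE interrupt, i.e. the FIFO has just emptied. */
void uartThreIsr(void) {
    hwFifoCount = 0;
    if (txPos < txLen) txFill();   /* more to send, possibly just the last byte */
    else txDone = true;            /* the lone last byte has gone */
}
```

For the 10-byte case above, `txStart()` writes 'A' thru 'I', the first THRE interrupt writes 'J' on its own, and the second one sets `txDone`.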

Phew, what a marathon, it probably took longer to document than to do :)



Monday, 10 December 2012

Well waddaya know?

You know that problem I had yesterday with the FIFOLVL register returning 0 no matter how many bytes there are in the FIFO?

Well it seems it's actually a bug in the chip. And there I was starting to doubt my brilliance, it shook my confidence to the core I don't mind telling you.

So now all is right with the world, NXP stuffed up not me, and my workaround will stand as the way to do this until further notice.