Already completely rebuilt

02 September 2022
Back in March I built a new personal workstation that I realistically expected to last up to a decade, but in reality it barely chalked up four months, and part of me wonders whether any major component within it will see out the year. Of all the new hardware that was part of the original build, 36% by value is already gone, and the two memory DIMMs account for about half of what is left. Having the system out of commission twice already has also been disruptive, and I have never had so much trouble with a single system.

System in bits

Yet another RMA'd CPU

It had been barely a month since I had the CPU replaced because Linux was throwing up errors that should not appear at all on a brand-new system, and with the replacement installed the system was already giving the same error. When the first CPU was sent back, the hardware vendor took one look at it and decided it was seriously faulty, to the point that I was surprised it had worked at all. They supposedly tested the CPU using Prime95, so the first thing I did on getting the replacement was run the Linux equivalent, and everything came up fine. When I saw the errors on the new CPU I ran the whole suite of “torture tests” overnight, but this failed to trigger any errors. Nevertheless I decided to trust my instinct that Linux only flags up errors like this when something is seriously wrong, and sent the replacement CPU back.

[Hardware Error]: Deferred error, no action required.
[Hardware Error]: CPU:1 (19:21:2) MC9_STATUS[Over|-|MiscV|AddrV|PCC|SyndV|CECC|Deferred|-|Scrub]: 0xdf7bd305c74800fe
[Hardware Error]: Error Addr: 0x0000000000000000
[Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000
[Hardware Error]: L3 Cache Ext. Error Code: 8
[Hardware Error]: cache level: L2, tx: RESV
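The bracketed flags in the MC9_STATUS line are just a decoding of the raw status value. As a rough illustration, here is a minimal Python sketch that rebuilds the same flag string from 0xdf7bd305c74800fe. The bit positions are assumed from my reading of the Linux kernel's machine-check definitions (arch/x86/include/asm/mce.h) and its AMD decoder; `decode_status` is a hypothetical helper rather than any kernel or library API, and some kernels print extra fields (such as TCC) that this sketch ignores.

```python
# Sketch: rebuild the flag list Linux's AMD MCE decoder prints for MCi_STATUS.
# ASSUMPTION: bit positions follow arch/x86/include/asm/mce.h as I understand
# it; check against your own kernel source before relying on them.

MCI_STATUS_OVER = 1 << 62      # an earlier error was overwritten
MCI_STATUS_UC = 1 << 61        # uncorrected error
MCI_STATUS_MISCV = 1 << 59     # MCi_MISC register contents valid
MCI_STATUS_ADDRV = 1 << 58     # MCi_ADDR register contents valid
MCI_STATUS_PCC = 1 << 57       # processor context corrupt
MCI_STATUS_SYNDV = 1 << 53     # AMD: syndrome register valid
MCI_STATUS_DEFERRED = 1 << 44  # AMD: deferred error
MCI_STATUS_POISON = 1 << 43    # AMD: poisoned data consumed
MCI_STATUS_SCRUB = 1 << 40     # AMD: error found during scrubbing


def decode_status(status: int) -> str:
    """Return the '|'-separated flag list, e.g. 'Over|-|MiscV|...'."""
    fields = [
        "Over" if status & MCI_STATUS_OVER else "-",
        # Uncorrected takes priority; deferred errors print '-'; else corrected.
        "UE" if status & MCI_STATUS_UC
        else ("-" if status & MCI_STATUS_DEFERRED else "CE"),
        "MiscV" if status & MCI_STATUS_MISCV else "-",
        "AddrV" if status & MCI_STATUS_ADDRV else "-",
        "PCC" if status & MCI_STATUS_PCC else "-",
        "SyndV" if status & MCI_STATUS_SYNDV else "-",
    ]
    ecc = (status >> 45) & 0x3  # bits 46:45 form the CECC/UECC pair
    if ecc:
        fields.append("CECC" if ecc == 2 else "UECC")
    fields.append("Deferred" if status & MCI_STATUS_DEFERRED else "-")
    fields.append("Poison" if status & MCI_STATUS_POISON else "-")
    fields.append("Scrub" if status & MCI_STATUS_SCRUB else "-")
    return "|".join(fields)


if __name__ == "__main__":
    # Value taken from the log line above.
    print(decode_status(0xdf7bd305c74800fe))
```

Running it prints the same flag list as the log, which at least suggests the assumed bit layout is right: the error was flagged as deferred and found during a scrub, rather than reported as an uncorrected (UE) error.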

This time round the vendor put the CPU through a weekend of the sort of testing I could only dream of doing myself, found no fault, and concluded that the source of the errors was a different component. To be fair, the vendor did remark that having multiple faulty components is a nasty coincidence, and that I had returned the replacement CPU in good faith given my earlier experience. At least I now know that this CPU is as close to confirmed good as one can be, and they sent it back to me free of charge. Since I had already checked the memory with Memtest86+, my suspicion now fell on the motherboard as the cause of the latest glitches, possibly due to something on the device DMA channels.

A rebuilt system

Because the Noctua NH-L9a-AM4 is held in place by bolts accessed from the underside of the motherboard, removing the CPU involves almost completely dismantling the system, so replacing it is not far off a major rebuild. With two of the three major components replaced (the motherboard and CPU, the remaining one being the RAM modules), it is now practically an entirely new core system, even though a lot of non-core components were recycled and it still uses the same Slackware 15.0 software install.

Underside bolts

A clean (mother-)board

I was never really happy with the motherboard because of its lack of PCI-Express slots, which meant putting the graphics card in a slot other than the one it would ideally go into, and I had already ordered a replacement even before I was told the CPU test results. Its micro-ATX form factor was chosen when I was still considering using the more compact case my previous personal workstation was in. Once I decided to use the tower case instead, I regretted choosing the smaller form factor, since it was very likely I would want to add other cards over the system's expected lifetime.

The 'old' motherboard

Aside from choosing one that was ATX, I also looked for a motherboard that used passive cooling for the chipset rather than a small fan that would likely get noisy over the years. I don't know where the ‘old’ motherboard's packaging went, so I decided to write it off rather than RMA it. One irritation with the new motherboard is that it uses LED headers that are incompatible with my case; it would be easy enough to make an adaptor, but for now I have decided to leave the LEDs disconnected.

Incompatible connector

Bye-bye SoundBlaster

The original purpose of installing a Sound Blaster Audigy Fx was to avoid a repeat of the problems I had with conference calls on my professional workstation, but I have since discovered that the Audigy Fx uses a sound chip from the same range of Realtek chips typically found on motherboards. The ALC889 on the motherboard of that workstation may have been a cheap component compared to the ALC898 on the Audigy; its integration onto the motherboard might have been the problem; or it could simply have been that Creative's drivers are of much higher quality. Whatever the cause of the problems with my work system, given how often I have already assembled and disassembled this personal workstation, I felt it appropriate to give on-board sound a second chance. The new motherboard has an ALC897, which might be very similar to the ALC898 the Audigy Fx uses.

A lot of the audio issues I had with the previous build disappeared, and I suspect the Audigy Fx itself was at least a contributor to some of them. Previously, plugging and unplugging headphones at the front panel was not always handled properly, and there was audio distortion whenever a USB device was plugged in. I suspected the latter might be down to electrical balancing issues between the audio and USB, since I recall a similar problem with my professional workstation. Laggy playback, which I previously resolved by restarting PulseAudio whenever it occurred, also went away. To be fair, all this could have been down to a fault in, or incompatibility with, the old motherboard itself, since there were distortion issues with the Audigy Rx as well.

An additional network interface

With extra PCI-Express slots available I was able to install a two-port network interface card that I originally bought as part of the August 2019 upgrade of my previous personal workstation, but never got round to using for its intended purpose of DPDK development. Working on the DPDK project became the foundation of my career, and given that I had to rebuild my personal workstation anyway, it felt right to at least have hardware set up ready for testing purposes. I had thought about installing the card in my professional workstation but never got round to it, and in any case that is best avoided since any DPDK work would likely be personal projects.

System reconfigurations

Because of the motherboard change some Linux settings needed adjusting to get things working again. The most notable was that there was no eth0 network interface, and hence no network access, because the interface name was tied to a now non-existent MAC address. Rectifying this involves changing the udev configuration file /etc/udev/rules.d/70-persistent-net.rules to reference the new MAC address:

# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d8:5e:d3:31:c4:f1", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
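The edit is easy enough to do by hand, but as a sketch of exactly what changes, the following Python snippet swaps the address in the rule that names a given interface. `replace_mac` is a hypothetical helper, not part of udev or any tool, and since the old MAC address is not recorded here, the example uses a made-up placeholder for it.

```python
import re

# Matches the ATTR{address}=="xx:xx:xx:xx:xx:xx" clause of a persistent-net rule.
ADDR_RE = re.compile(r'ATTR\{address\}=="([0-9a-fA-F:]{17})"')


def replace_mac(rules_text: str, ifname: str, new_mac: str) -> str:
    """Return rules_text with the MAC updated on the rule naming ifname.

    Only non-comment lines containing NAME="<ifname>" are touched; rules for
    other interfaces are passed through unchanged.
    """
    if not re.fullmatch(r"([0-9a-f]{2}:){5}[0-9a-f]{2}", new_mac):
        raise ValueError(f"not a lower-case colon-separated MAC: {new_mac!r}")
    out = []
    for line in rules_text.splitlines(keepends=True):
        if not line.lstrip().startswith("#") and f'NAME="{ifname}"' in line:
            line = ADDR_RE.sub(f'ATTR{{address}}=="{new_mac}"', line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # HYPOTHETICAL old address; the original one is not recorded.
    old_rules = (
        "# PCI device 0x10ec:0x8168 (r8169)\n"
        'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        'ATTR{address}=="00:11:22:33:44:55", ATTR{dev_id}=="0x0", '
        'ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"\n'
    )
    print(replace_mac(old_rules, "eth0", "d8:5e:d3:31:c4:f1"), end="")
```

Matching on the NAME clause rather than the old address means the rule for eth0 can be fixed without knowing which MAC it previously held, while rules for any other network cards are left alone.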

Conclusion

Having this workstation out of commission has been very disruptive, since a lot of personal stuff had been off-loaded onto it and I was unable to access any of it. To make matters worse, it meant using my professional workstation for non-work things, which was something I was desperate to avoid. Having the computer in bits also put both the desk and my electronics workbench out of use, so a lot of other things I wanted to get done were also blocked. This is exactly the sort of setback I can do without; the only good thing is that I was able to make additions that I had wanted in the first place.