Welcome to the Slackware Documentation Project

Differences

This shows you the differences between two versions of the page.

howtos:hardware:hardware_diagnostics [2013/10/16 23:43 (UTC)] – Corrected prior review note to make it hidden mfillpot
howtos:hardware:hardware_diagnostics [2014/07/31 22:43 (UTC)] (current) – [Solutions] Modified mouse section metaschima
Line 34: Line 34:
   * The system may not boot after pressing the power button, and you may need to press it more than once.
   * Rarely it dies completely, like after a power surge, and the fans will not be running and it will not POST or boot and there may be motherboard damage.
 +=== Power Button ===
 +  * System powers on without wake-on-LAN or wake-on-timer or other such wake methods.
 +  * Reboot loop where the system shuts down and powers up on its own continuously.
 +  * The system may not boot after pressing the power button, and you may need to press it more than once.
 +  * You may get a "boot failure" message during system POST.
 +  * You may hear beep codes.
 === CMOS battery ===
   * The time is reset on each boot.
   * The BIOS settings reset on boot.
   * There may be beep codes.
 +  * The screen may be black without any beep codes.
 === HDD ===
   * I/O errors in the logs.
Line 47: Line 54:
 === GPU ===
   * Graphical glitches on screen.
-  * If it is really dead the screen will be black.
 +  * The screen may be black.
   * May cause system hangs, and audio may be looping during the hang.
   * You may notice a high-pitched (squealing/screeching) noise coming from it, due to failing capacitors.
Line 53: Line 60:
 === CPU ===
   * May or may not POST.
-  * Kernel panics are possible with multi-core machines, when only one core is affected. 
   * There may be beep codes.
   * The fans are typically running at 100%.
-  * If it is overheating it may trigger a MCE (Machine-check exception) which will cause a kernel panic and forced shutdown, or it may throttle itself down and the system will seem slower.
 +  * Kernel panics may occur on multi-core machines when only one core is affected. These are clearly listed as ''Hardware Error'' MCE (Machine-check exception).
 +  * If it is overheating it may also trigger an MCE, which will cause a kernel panic and forced shutdown, or it may throttle itself down and the system will seem slower.
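Since MCEs are listed as ''Hardware Error'' in the kernel log, a quick way to look for them is to filter the log text. The following is only a minimal sketch: the function name ''filter_mce'' is made up for illustration, and the pattern is deliberately coarse, so it may also match unrelated lines that happen to contain "mce".

```shell
#!/bin/sh
# Filter kernel log text for machine-check (MCE) reports.
# Reads log text on stdin and prints only the matching lines.
# filter_mce is a hypothetical helper name, not a standard tool.
filter_mce() {
    grep -i -E 'mce|machine check|hardware error'
}

# Typical use on a live system (reading dmesg may require root):
#   dmesg | filter_mce
```

If nothing is printed, no line in the current log matched the pattern. This does not rule out a CPU problem, since the log may have been rotated or the machine may have halted before anything was written.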
 === Motherboard ===
   * Check for swollen capacitors.
Line 66: Line 73:
   * May keep spinning up and down forever while trying to read a disk.
   * May not open when you press the button, but instead make some clunking sounds and open only after many button presses.
 +=== Mouse ===
 +  * Problems with click-and-drag, drag-and-drop, drag-to-select, double-click.
 +  * Mouse pointer jumps.
 +  * Acceleration issues.
 <note important>Being able to pinpoint the source of the high-pitched squealing/screeching noise caused by failing capacitors may be important in finding the bad hardware component, and is almost always a sign of hardware issues.</note>
 ==== Diagnostics ====
Line 85: Line 96:
 You should **ALWAYS** use a surge protector on **ALL** electronic devices **ALL** the time. This prevents damage to the PSU, to the motherboard, and it saves you lots of time and money wasted on replacing electronics damaged by power surges. Many surge protectors come with a warranty that will refund a certain amount of money if your equipment is damaged while using the surge protector properly. The surge protector is cheap, the equipment is expensive, and the refund usually more than covers it.
 </note>
 +=== Power Button ===
 +  * A bad power button is surprisingly difficult to diagnose, and there are no software tests that can help. Looking for characteristic symptoms such as a reboot loop and swapping out hardware may be the only way to diagnose it.
 === CMOS battery ===
   * You could take the battery out, making sure to use the special tab to remove it rather than trying to pry it off, and measure its voltage. Or you could just throw it away in the proper battery disposal container and replace it anyway just to be sure, and also because you may have had to remove a graphics card or PCI/PCIE card to reach the battery and you may not have a battery tester for these types of batteries.
Line 170: Line 183:
 </code>
 If it does not say that, then the DVD/CD was not burned properly. However, it may also be because of bad media or high burn speed. If the drive keeps spinning up and down while reading a disk it could be failing or it could be that you are trying to play a commercial DVD whose region code is not supported by the drive, which limits it to 1x read speed.
 +=== Mouse ===
 +The first step in diagnosing mouse problems is to rule out driver and software issues. Try restarting Xorg, reloading mouse drivers, and restarting the system. If these have no significant effect, then it is likely a hardware issue.
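Reloading the mouse driver can be sketched as follows. This is only an illustration under assumptions: ''mouse_reload_plan'' is a made-up helper that prints the commands instead of running them, and the default module name ''psmouse'' is a guess that fits PS/2 mice and many touchpads. USB mice usually go through ''usbhid'' instead, and unloading that module also disables USB keyboards until it is loaded again.

```shell
#!/bin/sh
# Print (not run) the commands to reload a mouse kernel module.
# The module name is an assumption: psmouse is typical for PS/2 mice,
# while USB mice usually go through usbhid.
mouse_reload_plan() {
    module=${1:-psmouse}
    # Unload the module, then load it again.
    echo "modprobe -r $module"
    echo "modprobe $module"
}

mouse_reload_plan
```

Run the printed commands as root once you have confirmed, for example with ''lsmod'', which module your mouse actually uses.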
 ==== Solutions ====
 <note important>An overheating system can cause instability and may mimic failing hardware. Before doing anything else it may be worthwhile to get a compressed air canister and clean the dust out from all the fans, heat sinks, and all the hard to reach places where dust accumulates inside the case. Using a brush is not as effective and may damage components, so it should only be used outside the case. It is also important to make sure there is proper airflow inside the case:
Line 187: Line 202:
 === PSU ===
 Replace the PSU with a new PSU. If the symptoms disappear, you can be sure it was the PSU. However, note that a bad PSU may damage the motherboard or other components, or maybe it was a power surge that damaged everything.
 +=== Power Button ===
 +Try to clean the power button using compressed air. If that doesn't work then replace the power button if possible, otherwise replace the case along with the button or take it to a repair shop and see what they can do.
 === CMOS battery ===
 Replace the battery carefully using the special tab. Do **NOT** pry it off using a screwdriver because then it will break and it won't go back in.
 +<note important>
 +Replacing the CMOS battery will reset your BIOS/UEFI settings, so be prepared for this.
 +</note>
 === HDD ===
 You should first get your important data off of it. If you feel it is failing fast, use ddrescue to image the drive to another drive. Once you have the image, be it complete or incomplete, you can use [[http://www.cgsecurity.org/wiki/TestDisk|TestDisk]] and/or [[http://foremost.sourceforge.net/|foremost]] to carve data off the image. The data will not have the same file name it used to, but at least you will get the data. You can find all these utilities and more on the [[http://www.sysresccd.org/SystemRescueCd_Homepage|SystemRescueCD]]. Now just replace the drive.
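Imaging a failing drive with ddrescue might look like the sketch below. It assumes GNU ddrescue, and it uses a made-up helper, ''ddrescue_plan'', that only prints the commands so nothing is touched by accident. The device and file names are placeholders, not values from this page.

```shell
#!/bin/sh
# Print (not run) a two-pass GNU ddrescue plan for imaging a failing drive.
# Arguments: source device, image file, map file (all placeholders here).
ddrescue_plan() {
    src=$1
    img=$2
    map=$3
    # Pass 1: copy the easy areas first, skipping the slow scraping phase (-n).
    echo "ddrescue -n $src $img $map"
    # Pass 2: retry the bad areas up to 3 times (-r3) with direct disc access (-d).
    echo "ddrescue -d -r3 $src $img $map"
}

ddrescue_plan /dev/sdX /mnt/rescue/disk.img /mnt/rescue/disk.map
```

Write the image and map file to a different, healthy drive with enough free space. The map file lets ddrescue resume where it left off if the run is interrupted. Once the image exists, something like ''foremost -i disk.img -o recovered'' can carve files out of it.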
Line 198: Line 218:
 === CPU ===
 If you suspect the CPU is overheating, then you can try to remove the heatsink-fan block according to your CPU or heatsink manual, [[http://www.howtocleananything.com/home-interior-tips/computer/how-to-clean-thermal-paste-off-cpu/|clean off the old thermal paste]], [[http://www.maximumpc.com/article/howtos/howto_install_cpu_and_apply_thermal_paste|apply new thermal paste]], or let an expert do it. Otherwise, replace the CPU. Make sure it is actually the CPU and not the RAM that is the problem, as a CPU is very expensive compared to RAM.
 +<note>
 +If you cannot replace the CPU and it is a multi-core CPU with only one core that has failed, [[http://www.linuxquestions.org/questions/showthread.php?p=5162136|for example the L1 cache of CPU core #2]], then you can try to disable the failed core. One way to do this is to use the ''maxcpus='' kernel boot option. In the example you would use:
 +<code>
 +maxcpus=2
 +</code>
 +This works because CPU cores are numbered starting with 0, so 4 cores would be numbered 0, 1, 2, 3, and ''maxcpus=2'' brings up only cores 0 and 1 at boot, leaving cores 2 and 3 offline. Note that if CPU #0 is failing, you may be out of luck. A complementary step is to first make sure you have ''CONFIG_HOTPLUG_CPU'' built into the kernel, and then run:
 +<code>
 +echo 1 > /sys/devices/system/cpu/cpu3/online
 +</code>
 +on every boot. In this example, this turns ON CPU #3, so the 3 good cores (0, 1, and 3) are running while the failed core #2 stays offline.
 +</note>
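To confirm that the failed core really stayed offline, you can read ''/sys/devices/system/cpu/online'', which lists the online cores in a compact form such as ''0-1,3''. The helper below is a minimal sketch; ''expand_cpu_list'' is a made-up name, and it only expands that compact list into individual core numbers.

```shell
#!/bin/sh
# Expand a kernel CPU list such as "0-1,3" into "0 1 3".
# expand_cpu_list is a hypothetical helper name, not a standard tool.
expand_cpu_list() {
    result=""
    # Split the list on commas, then expand each "start-end" range.
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*)
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    result="$result $i"
                    i=$((i + 1))
                done
                ;;
            *)
                result="$result $part"
                ;;
        esac
    done
    # Trim the leading space before printing.
    echo "${result# }"
}

# On a live system:
#   expand_cpu_list "$(cat /sys/devices/system/cpu/online)"
```

With the example from the note above, the expected list is ''0 1 3''. If core 2 shows up, the ''maxcpus='' option was probably not applied at boot.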
 === Motherboard ===
 Replace the motherboard, and make sure the PSU is not damaged, as it can damage your new motherboard.
 === DVD+-RW ===
 Replace the drive.
 +=== Mouse ===
 +Replacing the mouse is the definitive solution. However, some fixes may help temporarily. 
 +  * For click-related problems, opening the mouse and investigating the cause of the problem is the first step. On most mice, there are 4 screws that hold the upper and lower mouse parts together and are located underneath stick-on slippery pads. Note that after removing the stick-on pads they may be difficult or impossible to replace, so only open the mouse if the problem is truly bothersome or if you have extra stick-on pads provided with some mice. 
 +    * It may be that plastic parts inside the mouse have worn down with repetitive use. A possible fix for this is to first identify areas that have been worn down and then carefully add plastic in the form of plastic cement or superglue to replace the lost plastic. This is best done with a toothpick and must be very carefully controlled. Remove excess glue immediately and allow the rest to fully dry overnight before using the mouse. Do NOT apply glue to any area that might interfere with mouse function. This is a last resort solution, so do NOT try it until you are ready to throw the mouse out. Duct tape may be used instead for a temporary or trial solution.
 +    * It may also be that the plastic button arch has fatigued over time and doesn't provide enough resistance to prevent inadvertent activation of the switch. A possible fix for this is adding a plastic piece under the plastic button arch to improve its resistance. Depending on the mouse, either gluing a piece of old cut credit card under the plastic button arch or inserting the credit card piece with a piece of foam sandwiched underneath into the pocket under the plastic button arch may work.
 +  * For wandering / jumping pointer problems, cleaning the optical / laser window on the underside of the mouse with alcohol and a q-tip may help. If it is a wireless mouse consider infrared or microwave interference depending on the wireless mouse type. Removing or isolating sources of interference or distancing oneself from the sources may help. Wireless mouse manuals recommend placing the base station away from electronic devices or other sources of electromagnetic interference.
 +  * For acceleration issues one can try [[https://wiki.archlinux.org/index.php/Mouse_acceleration|adjusting acceleration settings]]. However, in some cases the acceleration issues are intermittent, and may be driver issues that are much harder to solve. Trying different driver and software versions may help, or one can try to contact the developers of the drivers / software.
 ====== Sources ======
 <!-- If you are copying information from another source, then specify that source -->
Line 209: Line 247:
 <!-- * Contributions by [[wiki:user:yyy | User Y]] -->
   * Written by [[wiki:user:htexmexh|H_TeXMeX_H]]
-  * Contributions from: [[wiki:user:tobisgd|tobisgd]], onebuck
 +  * Contributions from: [[wiki:user:tobisgd|tobisgd]], onebuck, [[wiki:user:metaschima]]
   * I cite the man page of smartctl.
 <!-- Please do not modify anything below, except adding new tags.-->
 <!-- You must remove the tag-word "template" below before saving your new page -->
 {{tag>howtos hardware software author_htexmexh}}