Resolved
Hi,

I noticed a drop in write speeds on all my ClearOS installations! (down to about 1/4!)
First was my home installation:
(not exact numbers, only averages rounded down)
Single SSD: r/w 400/400 MB/s -> 400/100 MB/s
I thought the SSD was nearing its EOL, but smartctl tells me everything is OK. The CPU
was only at 3% during the tests, so that seems fine too.
I started testing my other installations:
RAID5 with 4SSD: r/w 900/900 -> 900/250 MB/s
RAID5 with 4SSD: r/w 850/850 -> 850/200 MB/s
RAID1 with 2SSD: r/w 500/500 -> 500/100 MB/s

- All different brands of SSDs, different motherboards, different RAID controllers
- I don't know when exactly this started. These are only file-servers, so as long as the
writing speed is greater than the network speed, I won't notice ;)
- The benchmarks were made with fio
- I checked other installations with CentOS 7, Fedora 32: everything seems fine

Any idea what is going on?
Friday, October 02 2020, 09:55 AM
Responses (9)
  • Accepted Answer

    Wednesday, October 07 2020, 02:39 AM - #Permalink
    Gabriel, Nothing to be sorry about - you had a problem, posted and solved it - Well done.

    The only waste is the forum search facility being so crappy that future posters with similar problems will likely not find this post and the help it might provide them...
  • Accepted Answer

    Tuesday, October 06 2020, 02:42 PM - #Permalink
    Sorry! Sorry and sorry again for wasting your time :(

    I found it (at least I think so).
    The LSI controller had the wrong settings! I don't remember when I changed them.
    Somewhere in the past I read an article which mentioned the correct settings
    for an SSD Raid:
    Read Policy: No read ahead
    Write Policy: Write Through
    Obviously I changed that and never tested it :(
    I have now set it back to:
    Read Policy: Always Read Ahead
    Write Policy: Always Write Back (the server is connected to a UPS)
    And now on System 1: r/w 897/801 MB/s :)))
    The other systems will have the same problem, I think. And my home server probably needs
    just a new SSD.

    Again sorry. And thanks for your time and helpful links.
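
For anyone landing here with the same symptom: a minimal sketch of how the controller cache policy could be checked and set from the command line. It assumes Broadcom's StorCLI utility is installed as `storcli64` and that the array is virtual drive 0 on controller 0; both are assumptions and not taken from this thread. On a XenServer box this would presumably have to run on the host, not inside the ClearOS VM. Setting `STORCLI=echo` only prints what would be run.

```shell
#!/bin/sh
# Sketch only: the binary name, controller index and virtual-drive index
# are assumptions; adjust them for the actual system.

# Show all properties (including cache policy) of one virtual drive. Read-only.
show_cache_policy() {
    ctrl=${1:-0}
    vd=${2:-0}
    "${STORCLI:-storcli64}" "/c${ctrl}/v${vd}" show all
}

# Set "Always Write Back" and "Read Ahead" on one virtual drive.
# Write-back without a battery unit risks data loss on power failure,
# which is why the poster mentions the UPS.
set_cache_policy() {
    ctrl=${1:-0}
    vd=${2:-0}
    "${STORCLI:-storcli64}" "/c${ctrl}/v${vd}" set wrcache=AWB
    "${STORCLI:-storcli64}" "/c${ctrl}/v${vd}" set rdcache=RA
}
```

A dry run such as `STORCLI=echo set_cache_policy 0 0` is a cheap way to review the commands before touching a production array, and a fio run before and after the change confirms whether the policy was really the cause.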
  • Accepted Answer

    Tuesday, October 06 2020, 02:07 PM - #Permalink
    Gabriel - way past my bedtime - but one thing for you to research re. your RAID.
    With multiple layers and parity RAID you must get the alignment, stripe size and block size correct, otherwise writes that cross boundaries cause split (extra) writes etc.
    This sort of thing
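
To make the alignment point concrete, here is a small sketch that computes the full-stripe width of a RAID 5 array and checks whether an I/O size is a whole multiple of it. The 64 KiB chunk size in the usage note is a made-up value for illustration; the real value has to be read from the controller or mdadm configuration.

```shell
#!/bin/sh
# Full-stripe width of a RAID 5 array in KiB:
# chunk size times the number of data disks (one disk's worth holds parity).
full_stripe_kib() {
    chunk_kib=$1
    disks=$2
    echo $(( chunk_kib * (disks - 1) ))
}

# Succeeds (exit status 0) if an I/O size is a whole multiple of the full stripe.
is_full_stripe_multiple() {
    io_kib=$1
    stripe_kib=$2
    [ $(( io_kib % stripe_kib )) -eq 0 ]
}
```

With a hypothetical 64 KiB chunk on 4 disks, `full_stripe_kib 64 4` gives 192, and `is_full_stripe_multiple 1024 192` fails, so a 1 MiB write (the `--bs=1m` used in this thread) would not map onto whole stripes in that layout.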
  • Accepted Answer

    Tuesday, October 06 2020, 12:41 PM - #Permalink
    So. I found a spare Samsung 850EVO and did a fresh ClearOS Install.
    I used LVM for the test.
    r/w 478/455 MB/s
    LVM is in the clear :|

    Raid:
    I am aware that a Raid 5 will have lower write performance. But as you showed
    me, it should not be 4 times lower but more like 15% lower with a 1GB file
    (r/w 234/205 using your fio script).
    Anyway, let's assume my home-server SSD is just bad and Raid 5 has a 4 times lower write
    speed: what about the Raid 1?
    System 3:
    2xSSD: r/w 530/105 MB/s
    The system also has a Raid 1 with 2x8TB disks:
    2xHDD: r/w 187/183 MB/s

    There must be something wrong with my installations. I just can't see what :(
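
Since "4 times" and "15%" are easy to mix up, a tiny helper for expressing a write speed as a percentage below the read speed may help. This is just arithmetic on the numbers quoted in this thread; the function name is made up.

```shell
#!/bin/sh
# Print how many percent the write speed is below the read speed (one decimal).
pct_lower() {
    LC_ALL=C awk -v r="$1" -v w="$2" 'BEGIN { printf "%.1f\n", (r - w) * 100 / r }'
}
```

`pct_lower 234 205` prints 12.4 (the HDD RAID 5 result), while `pct_lower 530 105` prints 80.2 (the SSD Raid 1 on System 3), which shows how far outside the expected range the Raid 1 is.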
  • Accepted Answer

    Tuesday, October 06 2020, 09:45 AM - #Permalink
    Thanks a lot for the input :) I will dive into it.

    And yes, I only test the filesystem. On a file server I only care about the underlying hardware layers when
    performance is not where I expect it to be ;) As is the case here :) But I don't think the Raid plays
    a role. I see the problem on single disk, Raid 1 and 2x Raid 5....

    AND:
    I found another System I could test:
    System 4: ClearOS 7 server
    Shuttle Barebone XPC slim DH370, Intel Core i7-9700, Samsung SSD 970 EVO Plus NVMe
    /dev/nvme0n1p1 112G 4.8G 107G 5% /
    Single Disk, No LVM, Last measure: r/w 2153/2308 MB/s
    Build 202003

    This seems to run fine. Also your ClearOS 7 Server is OK! Both systems don't have LVM!!
    Maybe LVM is really the culprit?
    I will try and reinstall my home server without LVM (don't need it there anyway) and see what
    happens. But this will have to wait at least a week.

    Remark:
    I tested System 1 with file sizes of 500MB, 1GB, 5GB and 10GB:
    r/w 1051/232, 929/253, 845/240, 833/249
    The file size doesn't seem to impact the write speed....
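
A sketch of how the file-size sweep could be scripted, reusing the fio options from the batch posted earlier in this thread. The function names and the `FIO` override are made up for illustration; setting `FIO=echo` prints the fio arguments without running anything.

```shell
#!/bin/sh
# Run the sequential read/write test from this thread for one file size.
run_fio() {
    size=$1
    file=$2
    "${FIO:-fio}" --loops=5 --size="$size" --filename="$file" \
        --stonewall --ioengine=libaio --direct=1 \
        --name=Seqread --bs=1m --rw=read \
        --name=Seqwrite --bs=1m --rw=write
}

# Repeat the test for each given size, removing the test file after each run.
run_sizes() {
    file=$1
    shift
    for size in "$@"; do
        echo "== file size: $size =="
        run_fio "$size" "$file"
        rm -f "$file"
    done
}
```

For example `run_sizes fiotest.tmp 500m 1g 5g 10g` would repeat the four sizes listed above. Running it from a directory on the filesystem under test matters, because the test file is created at the given path.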
  • Accepted Answer

    Tuesday, October 06 2020, 04:44 AM - #Permalink
    Forgot to mention this...

    Discussing your claim of a 4x drop, citing a RAID 5, e.g. "RAID5 with 4SSD: r/w 900/900 -> 900/250 MB/s" in your initial post: on your RAID 5 the read speed is thus about four times the write speed (250 x 4 in your case). With parity RAID, e.g. RAID 5, parity has to be verified and re-written with every write that goes to disk. For a write smaller than a full stripe, this means that a RAID 5 array will have to read the data, read the parity, write the data, and finally write the parity. Four operations for each effective one. This gives a write penalty on RAID 5 of four, i.e. a quarter of the raw write rate. This is at the disk hardware level with simple I/O. Thus to attain about the same write as read speed, writes have to be optimized using techniques such as caching, aggregating writes, queue depth, a time limit for forced flushes to disk, asynchronous versus synchronous writes, whole-block writes etc.
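
The write-penalty arithmetic can be sketched as follows. This is the usual rule-of-thumb estimate for small writes without any caching; the per-disk IOPS figure in the usage note is a made-up round number, not a measurement from this thread.

```shell
#!/bin/sh
# Rule-of-thumb effective write IOPS of an array without caching:
# (number of disks * write IOPS per disk) / write penalty.
# The penalty is 4 for RAID 5 (read data, read parity, write data, write parity)
# and 2 for RAID 1 (every write goes to both mirrors).
raid_write_iops() {
    disks=$1
    iops_per_disk=$2
    penalty=$3
    echo $(( disks * iops_per_disk / penalty ))
}
```

With a hypothetical 10000 write IOPS per disk, `raid_write_iops 4 10000 4` gives 10000 for a 4-disk RAID 5: the whole array writes no faster than a single disk unless cache and full-stripe writes hide the penalty.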

    Don't have an SSD RAID here, but this should not matter for the purpose of this discussion... Using a 3 disk RAID 5 with WD "RED" drives (non-SMR) we get:
    r/w 234/205 using your fio script
    r/w 189/38 increasing file-size 10x

    Here we increased the file-size so all the software enhancements to limit the effect of writes on parity RAID have run out of resources. This shows the importance of these software enhancements and thus also the focus of where your performance may have changed such as a parameter change. Reads have also suffered, but by a much smaller percentage. There is much of interest on the web here, for example Understanding RAID
    Repeated the above test on an SSD, not RAID. Not such a big discrepancy
    r/w 523/504 per your script
    r/w 527/348 using fio large file

    One of the reasons so many stats are run here is to pick up when something goes awry. Being able to notice a change very quickly makes problem solving so much easier and has been very useful in the past. E.g. Main Server Stats
  • Accepted Answer

    Tuesday, October 06 2020, 03:12 AM - #Permalink
    OK - Checking the only two systems here comparable to those you listed, but a bit older :) hardware...

    Main ClearOS 7 Server
    Gigabyte GA-Z77MX-D3H, Intel i7 3770 Samsung SSD 850 EVO 250GB
    /dev/sda3 12G 6.0G 6.1G 50% /
    OS Disk, r/w 521/505 MB/s
    OS installed Aug 2018

    Fedora 32 Workstation
    Gigabyte GA-Z77N-WiFi, Intel i3 3220, Crucial MX500 500GB
    /dev/sda7 78G 52G 27G 67% /
    OS Disk, r/w 524/470 MB/s
    OS Installed Jul 2019 - rpm upgrade to 32 Jun 2020

    Don't run Xen or LVM - which is interesting looking at your list of systems. All of your tests, including dd, are filesystem tests... So looking at your System 1 for example, we seem to have these hardware devices and software layers (not necessarily in this order): CPU, disk drive, PCI bus, disk controller, driver, Xen, LVM, RAID, fs (xfs, ext4 etc). A problem with any one, or more, of these layers could cause a slow-down.

    Looking at your "Home" system, it appears that there is no Xen - just LVM? - and the write performance is very poor. One factor here is the choice of SSD. The WD Green SSD has no RAM cache - it dynamically allocates some of the TLC NAND to run in SLC mode as a cache. Once that cache is exhausted by large enough writes at any one time, writes go straight to TLC with a corresponding plummet in write speeds. No idea if your fio test is hitting this cliff somewhere during the run. That said, there are two of these SSDs here - but on Raspberry Pis and thus unlikely to ever be a bottleneck.

    Since ClearOS 7 is based on CentOS 7, any slowdown caused by the OS probably results from a CentOS component. I would be surprised if ClearOS were changing any disk access/fs component software. One has to wonder if LVM is the culprit as it seems to be the common item, but since any fs complexity is avoided here to make troubleshooting easier, I cannot help with or test LVM. You might be interested in the conclusions drawn here: LVM performance examination. Your yum install history should show if there have been any LVM updates, and when. A cursory Google search didn't find any CentOS-based LVM performance complaints.

    Raid 5 requires a large number of disk write operations to accommodate the parity information. You might want to investigate whether these drives are being "trimmed", as they are behind both LVM and RAID (are you using the LSI hardware RAID or software mdadm?); otherwise you are relying on the normal inbuilt SSD housekeeping. This would only be a concern if you have a large number of constant writes that exhaust the supply of free NAND blocks. When this happens, write speeds plummet. Again you might find this interesting, if a bit dated: TRIM Support in SSD RAIDs
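
One way to see whether discard (TRIM) requests can reach a device at all is the discard columns of `lsblk`. The sketch below is a small filter for that output; the function name is made up, and it only reports what the kernel advertises for the block device, which behind a hardware RAID controller or inside a VM may well be zero.

```shell
#!/bin/sh
# Read "NAME DISC-GRAN DISC-MAX" lines on stdin and report discard support.
# A granularity or maximum of zero means the device does not advertise discard.
discard_report() {
    while read -r name gran max; do
        case "$gran:$max" in
            0:*|0B:*|*:0|*:0B)
                echo "$name: no discard (TRIM) support reported" ;;
            *)
                echo "$name: discard supported (granularity $gran, max $max)" ;;
        esac
    done
}
```

Typical use would be `lsblk -dn -o NAME,DISC-GRAN,DISC-MAX | discard_report`. Where discard is supported, `fstrim -v /` trims the mounted filesystem once and reports how much was trimmed.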

    You might also want to research tuning Linux disk write performance. There are many 'knobs' here and a change in one or more can greatly affect your results. Google has quite a lot to help here. Simply use nothing more than the tuned-adm throughput-performance profile here.
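
To compare servers quickly, the active tuned profile can be pulled out of the `tuned-adm active` output. A minimal sketch, assuming the usual "Current active profile: NAME" output line; the function name is made up.

```shell
#!/bin/sh
# Extract the profile name from the output of "tuned-adm active" (read on stdin).
active_profile() {
    sed -n 's/^Current active profile: //p'
}
```

Usage would be `tuned-adm active | active_profile`, and `tuned-adm profile throughput-performance` switches to the profile mentioned above. The kernel writeback settings can be compared between machines with `sysctl vm.dirty_ratio vm.dirty_background_ratio`.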
  • Accepted Answer

    Monday, October 05 2020, 02:03 PM - #Permalink
    Glad to share the information:

    Home:
    Shuttle XH110V, Intel Pentium G4600, WesternDigital WDGreen 120GB M2
    /dev/mapper/clearos-root 104G 13G 91G 13% /
    Single Disk, Install directly on the Disk, Last measure: r/w 456/91 MB/s
    Hosting a Fedora 32 Server VM. I tested w/o the VM running -> same result
    Build: 2018

    System 1:
    Asus B150M-C, Intel Core i7 6700K, 4x Samsung SSD 850 PRO 512GB
    /dev/xvda1 83G 17G 62G 21% /
    Raid 5, The installation runs in Xenserver VM, Last measure: r/w 841/231 MB/s
    Raid Controller: LSI SAS 9260-4i
    Other VMs: 2x Windows 10
    Build: 2016

    System 2:
    Asus TUF B450M-PG, AMD Ryzen 5 2600G, 4x Samsung SSD 860 EVO 1TB
    /dev/xvda1 50G 40G 7.3G 85% /
    Raid 5, The installation runs in Xenserver VM, Last measure: r/w 797/256 MB/s
    Raid Controller: LSI SAS 9341-4i
    Other VMs: Windows 7, CentOS 7
    Build: 2019

    System 3:
    Asus TUF B450M-PG, AMD Ryzen 7 2700, 2x Samsung SSD 860 EVO 500GB
    /dev/xvda1 50G 5.0G 42G 11% /
    Raid 1, The installation runs in Xenserver VM, Last measure: r/w 530/105
    Raid Controller: LSI SAS 9341-4i
    Other VMs: Windows 10
    Build: 2018

    The only common thing I can see is the ClearOS 7 installation. At one point in the
    past they were running fine. As I said: I don't know when this behaviour started :(
    Different SSDs, Different MoBo, Different CPUs, Different Installations, Different Age...

    My fio batch:
    FILE=${1:-fiotest.tmp}
    fio --loops=5 --size=1000m --filename="${FILE}" \
    --stonewall --ioengine=libaio --direct=1 \
    --name=Seqread --bs=1m --rw=read \
    --name=Seqwrite --bs=1m --rw=write
    [ -f "${FILE}" ] && rm "${FILE}"

    But I also did a Test with dd on System 1:
    dd if=/dev/zero of=benchfile bs=2G count=1
    -> 2147479552 Bytes (2.1 GB) copied, 8.28738 s, 259 MB/s
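
A note on the dd run above: it reports 2147479552 bytes rather than the full 2 GiB (2147483648) because a single read or write call on Linux transfers at most that many bytes, so `bs=2G count=1` ends up as one short block. Without a sync or direct flag the result can also include the page cache. A sketch of a variant with smaller blocks that flushes before reporting; the function name is made up.

```shell
#!/bin/sh
# Write <mib> MiB of zeros to <file> in 1 MiB blocks and flush the data to
# disk before dd reports its throughput (conv=fdatasync).
# oflag=direct would be the closer match to fio's --direct=1, where the
# filesystem supports it.
bench_write() {
    file=$1
    mib=$2
    dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fdatasync
}
```

`bench_write benchfile 2048` writes the same 2 GiB as the original command; remove the file afterwards with `rm benchfile`.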

    When I find the time, I will install a fresh ClearOS 7 and repeat the tests....

    ----------------------------------------------------------------
    For comparison:
    Fedora 32 Workstation
    Asus B150M-C, Intel Core i7 6700K, Samsung SSD 850 PRO 512GB
    /dev/sda2 52G 36G 14G 74% /
    Single Disk, Last measure: r/w 498/457 MB/s
    Build: 2016

    CentOS 8 Workstation
    Asus mini-pc, Intel Celeron N4000, Kingston SA400S37 120GB
    /dev/sda2 93G 9.7G 84G 11% /
    Single Disk, Last measure: r/w 381/321 MB/s
    Build: 2019

    CentOS 7 WS
    Asus B150M-C, Intel Core i5 6600, Samsung SSD 850 EVO Pro 128GB
    /dev/sdb3 19G 13G 4.3G 76% /
    Single Disk, Last measure: r/w 489/485 MB/s
    Build: 2016
  • Accepted Answer

    Saturday, October 03 2020, 07:20 AM - #Permalink
    SLC, MLC, TLC or QLC SSD?
    How full are they?
    Can you give us an idea of the hardware? SSD, motherboard, CPU, raid controller...
    Are you able to share the fio command line you are using?