IRC log of #maemo-ssu for Sunday, 2016-07-10

kerioand the ludicrous intel DC P3608 4TB gets 3GB/s of sequential writes00:00
kerio(PCIe 3.0 x8)00:00
DocScrutinizer05highly irrelevant00:01
kerioit's only 8959.99 USD on amazon.com00:01
DocScrutinizer05the point is that writing zeroes isn't a working replacement for TRIM00:03
kerioindeed00:03
DocScrutinizer05the controller needs to receive a hint that the block doesn't contain valid data00:04
kerioit should be "easy" to test if that MMC ERASE command works00:04
keriowrite random data over the whole thing00:04
keriothen write random data again, measuring the speed00:05
keriothen ERASE everything00:05
keriothen write random data again and measure the speed00:05
DocScrutinizer05exactly00:05
keriodo we have a debug utility for the eMMC?00:05
DocScrutinizer05also exactly what I had in mind, for swap00:05
DocScrutinizer05no00:05
DocScrutinizer05afaik00:05
keriowould it need a more recent kernel?00:05
DocScrutinizer05prolly needs backport of the ERASE (or TRIM) ioctl command to mmc_core00:06
DocScrutinizer05or a more recent kernel, freemangordon could test it00:06
DocScrutinizer05http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L219800:07
DocScrutinizer05(wildly guessing there, no kernel developer)00:09
DocScrutinizer05modinfo mmc_core00:11
DocScrutinizer05objdump -t /lib/modules/2.6.28-omap1/mmc_core.ko  dunno00:13
DocScrutinizer05freemangordon: could you test fstrim on emmc?00:37
*** DrCode has quit IRC00:38
DocScrutinizer05(make sure eMMC volume isn't mounted -o discard)00:38
keriowhy shouldn't it00:39
keriofstrim should still work, right00:39
DocScrutinizer05otherwise we have unsolicited TRIM in between00:39
*** trx has quit IRC00:40
DocScrutinizer05so any such test would be rather meaningless with -o discard, no?00:40
keriooh, performance tests00:41
kerioyeah, if it worked00:41
DocScrutinizer05<kerio> write random data over the whole thing  then write random data again, measuring the speed  then ERASE [rm -r *; fstrim] everything  then write random data again and measure the speed00:44
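Note: the test procedure quoted above (fill, refill and measure, delete plus fstrim, refill and measure again) could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not from the channel: the file name trimtest.bin, the 1 MiB chunk size and the helper names are hypothetical, fstrim from util-linux is assumed to be installed, and the volume is assumed to be mounted without -o discard. Nobody in the log ran this code.

```python
import os
import subprocess
import time

CHUNK = 1024 * 1024  # 1 MiB per write; an arbitrary choice for this sketch


def timed_fill(path, total_bytes, chunk=CHUNK):
    """Write total_bytes of random data to path, return throughput in bytes/s."""
    # One random buffer is reused so the cost of the RNG is not measured.
    # A controller that deduplicates or compresses could skew the result.
    buf = os.urandom(chunk)
    written = 0
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            written += os.write(fd, buf[:n])  # os.write may write fewer bytes
        os.fsync(fd)  # make sure the data actually reached the device
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start
    return written / elapsed if elapsed > 0 else float("inf")


def run_fstrim(mountpoint):
    """Ask the filesystem to TRIM its free space (needs root and kernel support)."""
    subprocess.run(["fstrim", "-v", mountpoint], check=True)


def trim_benchmark(mountpoint, total_bytes):
    """Return (rate before TRIM, rate after TRIM) for a file of total_bytes."""
    target = os.path.join(mountpoint, "trimtest.bin")  # hypothetical file name
    timed_fill(target, total_bytes)               # 1. dirty every block once
    baseline = timed_fill(target, total_bytes)    # 2. rewrite and measure
    os.remove(target)                             # 3. free blocks at fs level
    run_fstrim(mountpoint)                        #    and tell the controller
    after_trim = timed_fill(target, total_bytes)  # 4. rewrite and measure again
    return baseline, after_trim
```

If TRIM does anything useful on the device, after_trim should come out noticeably higher than baseline; if both rates are about equal, the command is probably a no-op or unsupported. total_bytes has to be close to the volume size for the comparison to mean anything.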
*** trx has joined #maemo-ssu00:44
*** DrCode has joined #maemo-ssu00:53
*** handaxe has joined #maemo-ssu00:57
*** handaxe has quit IRC01:01
*** freemangordon has quit IRC01:05
*** freemangordon has joined #maemo-ssu01:12
ShadowJKI have the impression that there isn't all that much sophistication that can be squeezed into emmc, that trim is mostly a NOOP unless you give it a full 8MB block properly aligned that it can erase01:33
ShadowJKOr however big it gets reported as in /sys/block/.../preferred_erase_size or something like that01:33
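Note: a small sketch of how the erase size hints ShadowJK mentions might be read. The attribute names used here (device/preferred_erase_size and device/erase_size for MMC cards, queue/discard_granularity and queue/discard_max_bytes for block devices in general) are the ones exported by reasonably recent mainline kernels; whether they exist at all on the 2.6.28 kernel discussed here is an assumption that would need checking, so missing files are simply reported as None. The function names are hypothetical.

```python
import os


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def erase_hints(dev, sysfs_root="/sys/block"):
    """Collect erase/discard size hints for a block device such as 'mmcblk0'."""
    base = os.path.join(sysfs_root, dev)
    return {
        # MMC/SD specific attributes of the card
        "preferred_erase_size": read_int(
            os.path.join(base, "device", "preferred_erase_size")),
        "erase_size": read_int(os.path.join(base, "device", "erase_size")),
        # generic block layer discard limits; 0 usually means "no discard"
        "discard_granularity": read_int(
            os.path.join(base, "queue", "discard_granularity")),
        "discard_max_bytes": read_int(
            os.path.join(base, "queue", "discard_max_bytes")),
    }
```

For example, erase_hints("mmcblk0") returns a dict in which every value is either a byte count or None when the kernel does not export that file.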
DocScrutinizer05ShadowJK: TRIM is not about erase01:40
DocScrutinizer05https://www.youtube.com/watch?v=x6lqYU4j7no01:42
DocScrutinizer05when controller copies an erase page to change one block in it, it can leave out resp skip copying of the blocks tagged as TRIMed01:43
DocScrutinizer05so those are fresh unused blocks on the new page, ready to take new data01:44
DocScrutinizer05worst case, when all blocks have been used to write some (possibly already obsolete) data to them, each write of one block (to overwrite the obsolete old content) involves a copy of one complete erase page just to replace that one block01:46
DocScrutinizer05if all the blocks of the page have been tagged as TRIMed, the copy would result in just one used and many free blocks in the new erase page01:47
DocScrutinizer05when you fill the complete MMC with one file and then delete that file on fs level, subsequent writes to the device to fill it again completely with data would either cause $number-of-blocks page copies without TRIM, or only $number-of-erasepages copies with TRIM01:49
DocScrutinizer05to accomplish that on controller level, you need just one bit per block in metadata01:51
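Note: the page copy arithmetic DocScrutinizer05 describes can be checked with a toy model. This is a deliberately naive sketch, assuming a controller with no RAM buffering and no remapping, which keeps one small state tag per block; the state names and the function refill_copies are hypothetical and not taken from any real firmware.

```python
VALID, TRIMMED, ERASED = "valid", "trimmed", "erased"


def refill_copies(num_pages, blocks_per_page, trimmed):
    """Count erase page copies needed to rewrite every block once, in order.

    trimmed=False: the old file was only deleted at fs level, so the
                   controller still believes every block holds valid data.
    trimmed=True:  every block was tagged as TRIMed before the refill.
    """
    state = [TRIMMED if trimmed else VALID] * (num_pages * blocks_per_page)
    copies = 0
    for lba in range(len(state)):
        if state[lba] != ERASED:
            # The cell is not physically erased: copy the whole erase page
            # to a fresh one. VALID blocks are carried over, TRIMMED blocks
            # are skipped and therefore end up erased in the new page.
            copies += 1
            first = (lba // blocks_per_page) * blocks_per_page
            for i in range(first, first + blocks_per_page):
                if state[i] == TRIMMED:
                    state[i] = ERASED
        state[lba] = VALID  # the new data now occupies this block
    return copies
```

With 4 erase pages of 16 blocks each, refill_copies(4, 16, trimmed=False) counts 64 copies (one per block) while refill_copies(4, 16, trimmed=True) counts 4 (one per erase page), matching the $number-of-blocks versus $number-of-erasepages figures above. Real controllers buffer and remap, so actual numbers will differ.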
keriothat's how things should go01:58
kerioon the other hand, hardware manufacturers will likely do the absolute bare minimum for anything01:59
kerioi mean01:59
kerioactual SSDs that *do* advertise ATA TRIM support actually fuck it up01:59
keriobecause of firmware bugs01:59
DocScrutinizer05that's a completely different story02:00
keriodo you really expect a MMC firmware to handle a barely used feature correctly and in a way that enhances performance02:00
keriomaybe you can ask about it for the neo90002:01
DocScrutinizer05hardware manufacturers try to create as good a product as possible from a given amount of resources. Two bits per block, used to tag free blocks with either 00 or 11 while used blocks are 01, don't cost them anything and will provide a selling point in the datasheet02:02
DocScrutinizer05barely used is nonsense02:02
keriowell, is it a point in the n900 emmc datasheet?02:02
DocScrutinizer05obviously all android phones use that02:02
kerioi mean, i'd pay more for it02:02
keriobut nokia probably didn't02:02
DocScrutinizer05the point is you won't have to pay more for it02:03
DocScrutinizer05it's a mere one-shot effort to implement it in controller firmware02:03
DocScrutinizer05so the cost per chip ~= zilch02:04
DocScrutinizer05and microsoft obviously even specifies a max duration a single block write may take, or something along that line, which is only achievable with proper TRIM support02:08
DocScrutinizer05the datasheet for eMMC in N900 says it's >>Full compliance with JEDEC/MMCA Ver. 4.3<<, so all you have to do is to find the specs JEDEC only publishes for registered users02:12
Palinormal trim command cannot be sent queued for ATA disks02:23
Paliso before sending trim, you need to wait until the queue of commands is empty02:23
Paliand so using trim can slow down read/write operations of disks02:24
Paliyes, there is also a queued trim ATA command, but it is not supported by Microsoft and Apple systems02:25
Paliand so if something advertises that it supports it, it is buggy02:26
Palibefore playing with discard on linux, look at this loooong table: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n427002:26
keriosamsung controller botched queued trim => queued trim is broken for every SSD forever02:31
kerioseems good02:31
DocScrutinizer05I'm not interested in discard. TRIM however will be mad useful02:31
DocScrutinizer05-o discard is arguably not the best way to do TRIM anyway02:32
Palidiscard is just linux API for trim for FS02:32
keriohe wants to use trim or something like it for the swap partition02:32
DocScrutinizer05please read about batch trim vs online trim02:32
kerio(i'm not entirely sure it's a thing on linux tbh)02:32
DocScrutinizer05refer to fstrim02:33
DocScrutinizer05for example02:33
Palidiscard is a good idea, but only useful when all layers support queued trim && queued trim is implemented correctly in FW02:33
kerioi'm pretty sure that both me and Pali understand the difference, doc02:33
DocScrutinizer05no, queued trim is only needed for -o discard aka online trim02:34
kerioPali: the real best way to "do TRIM" is to aggressively reuse sectors, anyway02:34
DocScrutinizer05huh?02:34
kerioso you don't need separate commands except when you absolutely have to02:34
DocScrutinizer05sorry that's absolute nonsense02:34
kerioDocScrutinizer05: on a SSD, TRIM means "i don't need this LBA address anymore"02:35
DocScrutinizer05the whole point about trim is that you _cannot_ 'reuse sectors'02:35
keriowut02:35
keriothe controller will remap your writes all over the place anyway02:35
DocScrutinizer05each "reuse sector" means you need to do a erase page copy02:36
kerio...no it doesn't02:36
keriounless there's no more space02:36
DocScrutinizer05which is exactly what TRIM accomplishes: free space02:36
*** dafox has quit IRC02:37
kerioyes, but if you're deleting a file and creating a new one02:37
kerioyou can just put the new one on the same logical address of the old one02:37
DocScrutinizer05so what?02:37
kerioand you'll have the same effect without having to issue a separate command02:37
DocScrutinizer05no you can't use the same physical address02:37
kerioyes02:37
keriowhich is why i said logical address02:38
DocScrutinizer05please, read e.g. http://www.thessdreview.com/daily-news/latest-buzz/garbage-collection-and-trim-in-ssds-explained-an-ssd-primer/02:38
DocScrutinizer05you using same logical address means the complete physical erase page needs to get copied to change your one sector you write to02:39
Palianyway, isn't it better to have direct access to NAND erase blocks and use e.g. ubifs on NAND directly?02:39
kerioDocScrutinizer05: that's for the ssd controller to decide02:39
DocScrutinizer05so "aggressively reuse sectors" is meaningless at best, worst case more likely02:39
DocScrutinizer05Pali: we talk about eMMC02:40
DocScrutinizer05not NAND aka mtd02:40
kerioPali: in theory, sure02:40
kerioin practice, separation of concerns has proven to be more successful02:40
DocScrutinizer05actually ubifs implements pretty much exactly the same scheme on application processor which the controller of emmc uses for TRIM and wear leveling02:41
kerioi'd trust a SSD plus ZFS over UBIFS on raw flash, if only because the tools are better02:41
Paliis not eMMC some flash or nand memory with own software on it?02:42
DocScrutinizer05on MMC your only way to have some control over page erases is to use ERASE/TRIM02:42
kerioyeah, and in theory more control should yield better results02:42
keriowhich is the same argument for software raid over hardware raid02:42
keriohowever that hasn't become the case in modern computer hardware02:43
kerioprobably because SSD controllers that take the raw flash and turn it into a perfect block device are Good Enough02:43
DocScrutinizer05only as long as they can keep the erase pages for all concurrent write(pointer)s in buffer RAM02:49
DocScrutinizer05as soon as the buffer RAM gets filled they need to write back one erase page sized chunk of data to make space for reading in another page so the next sector/block write can modify it02:51
DocScrutinizer05and depending on several other system parameters you don't want to keep large amounts of dirty buffers all the time, since... powerfail02:52
DocScrutinizer05I guess some SSDs even have their own battery to write back dirty buffers on powerfail (incl regular power down powerfail)02:53
DocScrutinizer05generally speaking you have little to no problems with SSD and TRIM and performance impact therefrom as long as you do a single sequential write, since all controllers can keep a single erasepage in RAM buffer02:55
DocScrutinizer05and they usually won't write it back until another erase page gets accessed or a certain timeout expired between interface write() commands02:56
DocScrutinizer05so the trivial controller can always read an erase page into RAM1, wait until all blocks of that page got modified by sequential write() commands from system, then on next write() after that read in the next erasepage into RAM2 with the sector to modify and write back RAM1_dirty to flash while RAM2 gets modified by further sequential write() cmds03:00
DocScrutinizer05for random access writes stuff soon starts to get cluttered03:01
DocScrutinizer05so to really test TRIM, you'd probably want 500 files of size 1/500 of SSD capacity each, then truncate them to 1 byte length each, do trim, and then append to each of them concurrently again03:05
DocScrutinizer05while this is the 100% worst case scenario, in normal operation similar scenarios will happen more often than not, while strictly sequential write is the rather unlikely usecase03:06
DocScrutinizer05a pretty everyday scenario for almost worst case: swap03:08
DocScrutinizer05unless you configure kswapd in a way so it always writes complete aligned erasepage-sized chunks03:10
DocScrutinizer05as soon as you have one byte misalignment, optimum case turns into worst case where every swapped out page involves two erasepage read modify erase write cycles03:12
DocScrutinizer05btw on HDD you see similar effects when your drive always read/modify/writes a complete track instead of on-the-fly insert-writing a single sector03:15
DocScrutinizer05just a page erase on SSD takes much longer than one platter stack spin in a HDD03:16
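Note: the alignment point made above (an erasepage-sized write that is off by one byte touches two erase pages instead of one) is simple arithmetic. The sketch below just counts how many erase pages a write spans; the 8 MiB erase size is the figure ShadowJK guessed earlier in the log and is only an assumption here, as is the function name.

```python
ERASE_SIZE = 8 * 1024 * 1024  # assumed 8 MiB erase page, not a measured value


def erase_pages_touched(offset, length, erase_size=ERASE_SIZE):
    """Number of erase pages that a write of length bytes at offset spans."""
    if length <= 0:
        return 0
    first_page = offset // erase_size
    last_page = (offset + length - 1) // erase_size
    return last_page - first_page + 1
```

An aligned chunk, erase_pages_touched(0, ERASE_SIZE), spans one page; the same chunk shifted by one byte, erase_pages_touched(1, ERASE_SIZE), spans two, which in the naive model from this discussion means two read-modify-erase-write cycles per swapped-out chunk.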
DocScrutinizer05((strictly sequential write is the rather unlikely usecase)) also think fragmentation which happens on a FS level and is not visible/understandable by SSD/MMC controller03:41
DocScrutinizer05writing a file sequentially into a fragmented filesystem also is pretty much random access write03:42
DocScrutinizer05note that some SD-card controllers even are known to understand FAT fs and do shadow trim by locating and observing the used blocks table(s)03:44
DocScrutinizer05I don't know how risky that is, I guess they must have implemented quite some heuristics and safeguard monitors to stop this as soon as the slightest doubt about the FS used comes up03:46
*** merlin1991 has quit IRC03:53
*** Pali has quit IRC04:21
*** DrCode has quit IRC05:32
*** DrCode has joined #maemo-ssu05:52
*** DocScrutinizer05 has quit IRC07:00
*** DocScrutinizer05 has joined #maemo-ssu07:00
*** Sicelo has quit IRC07:59
*** NIN101 has quit IRC07:59
*** dos1 has quit IRC07:59
*** handaxe has joined #maemo-ssu11:02
*** handaxe has quit IRC11:06
*** LauRoman|Alt has joined #maemo-ssu11:13
*** dos1 has joined #maemo-ssu11:53
*** Sicelo has joined #maemo-ssu11:54
*** NIN101 has joined #maemo-ssu11:54
ShadowJKI've always wanted to have this on my phones: https://lwn.net/Articles/518988/12:23
*** handaxe has joined #maemo-ssu12:28
*** handaxe has quit IRC12:29
*** LauRoman|Alt has quit IRC12:45
*** Pali has joined #maemo-ssu13:18
*** merlin1991 has joined #maemo-ssu14:01
*** dafox has joined #maemo-ssu14:44
*** dafox has quit IRC15:16
*** dafox has joined #maemo-ssu15:52
DocScrutinizer05hehe > It seems that as hardware gets smarter, we need to make even more clever software to manage that "smartness"<<16:07
DocScrutinizer05hmmm   >>One area of difficulty is that the shape of an f2fs (such as section and zone size) needs to be tuned to the particular flash device and its FTL; vendors are notoriously secretive about exactly how their FTL works. f2fs also requires that the flash device is comfortable having six or more concurrently "open" write areas.<<16:10
DocScrutinizer056 years old?16:31
DocScrutinizer05oh nope, only 416:32
ShadowJKI have some cards that are comfortable with 12 open areas18:38
ShadowJKIn any case, 6 open areas is always better than random18:38
keriohm18:40
keriocan we replace the internal eMMC with something based on ram?18:40
keriosurely making things non-volatile is harder18:40
bencoh?18:57
DocScrutinizer05kerio: I've been searching for mixed RAM/flash chips with a single storage interface (aka ramdisk) for ages, nothing found so far20:40
DocScrutinizer05I mean, how hard could it be to implement one-plus GB RAM buffer in FTL and operate it in a dedicated overlay mode where you define a start address where buffer is used instead of ever writing back stuff to flash?20:42
kerioand then you remove the flash20:43
DocScrutinizer05FTLs already implement write protect afaik20:43
DocScrutinizer05I would prefer still keeping flash in same chip behind same interface/bus20:44
keriobut who cares about flash20:45
DocScrutinizer05everybody?20:45
keriowe have microSDs for that20:45
*** LauRoman|Alt has joined #maemo-ssu22:31
