IRC log of #maemo-ssu for Sunday, 2016-07-10

kerio	and the ludicrous intel DC P3608 4TB gets 3GB/s of sequential writes	00:00
kerio	(PCIe 3.0 x8)	00:00
DocScrutinizer05	highly irrelevant	00:01
kerio	it's only 8959.99 USD on amazon.com	00:01
DocScrutinizer05	the point is that writing zeroes isn't a working replacement for TRIM	00:03
kerio	indeed	00:03
DocScrutinizer05	the controller needs to receive a hint that the block doesn't contain valid data	00:04
kerio	it should be "easy" to test if that MMC ERASE command works	00:04
kerio	write random data over the whole thing	00:04
kerio	then write random data again, measuring the speed	00:05
kerio	then ERASE everything	00:05
kerio	then write random data again and measure the speed	00:05
DocScrutinizer05	exactly	00:05
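kerio's procedure above comes down to timing a full-device random write before and after an ERASE. Below is a minimal Python sketch of the timing step, assuming a Linux host. The function name `measure_write_throughput` is hypothetical. If you point it at the raw eMMC block device (for example `/dev/mmcblk0`, an assumed name), it overwrites everything on that device, so only use it on a scratch device. It reuses one random chunk so that generating random data does not eat into the measurement. This assumes the controller does no deduplication or compression.

```python
import os
import time

def measure_write_throughput(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of random data to path and return MB/s.

    fsync() runs before the clock stops, so the page cache cannot hide
    the device's real write speed. DESTRUCTIVE if path is a block device.
    """
    chunk = os.urandom(chunk_bytes)  # one random chunk, reused (assumes no dedup)
    written = 0
    # O_CREAT lets the same helper target a plain file for a dry run
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            written += os.write(fd, chunk[:n])
        os.fsync(fd)  # wait until the device has acknowledged the data
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
    return written / elapsed / 1e6
```

The procedure is to run it once to precondition the device, then again to get the "before" number. Next, erase or discard the device, and run it a third time to get the "after" number. A clear speed-up on the last run would suggest that the ERASE really freed blocks inside the controller.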
kerio	do we have a debug utility for the eMMC?	00:05
DocScrutinizer05	also exactly what I had in mind, for swap	00:05
DocScrutinizer05	no	00:05
DocScrutinizer05	afaik	00:05
kerio	would it need a more recent kernel?	00:05
DocScrutinizer05	prolly needs backport of the ERASE (or TRIM) ioctl command to mmc_core	00:06
DocScrutinizer05	or a more recent kernel, freemangordon coult test it	00:06
DocScrutinizer05	could*	00:06
DocScrutinizer05	http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L2198	00:07
DocScrutinizer05	(wildly guessing there, no kernel developer)	00:09
DocScrutinizer05	modinfo mmc_core	00:11
DocScrutinizer05	objdump -t /lib/modules/2.6.28-omap1/mmc_core.ko dunno	00:13
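On newer kernels, the generic block-layer way to ask a device to ERASE/TRIM a range is the `BLKDISCARD` ioctl from `<linux/fs.h>`. For eMMC, the mmc block driver maps it onto the card's erase commands. The sketch below is a Python illustration; the helper name `discard_range` is hypothetical. It assumes the kernel and driver support discard. Whether the stock 2.6.28 N900 kernel does is exactly the open question here. Where support is missing, expect an `OSError` such as EOPNOTSUPP or ENOTTY.

```python
import fcntl
import os
import struct

# <linux/fs.h>: #define BLKDISCARD _IO(0x12, 119)
BLKDISCARD = (0x12 << 8) | 119  # == 0x1277

def discard_range(device, start, length):
    """Discard bytes [start, start + length) of a block device.

    WARNING: the data in that range is gone afterwards. Needs write access
    to the device node, and both values must be 512-byte aligned.
    """
    if start % 512 or length % 512:
        raise ValueError("start and length must be multiples of 512 bytes")
    rng = struct.pack("QQ", start, length)  # uint64_t range[2] = {start, len}
    fd = os.open(device, os.O_WRONLY)  # the kernel rejects read-only fds
    try:
        fcntl.ioctl(fd, BLKDISCARD, rng)
    finally:
        os.close(fd)
```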
DocScrutinizer05	freemangordon: could you test fstrim on emmc?	00:37
*** DrCode has quit IRC		00:38
DocScrutinizer05	(make sure eMMC volume isn't mounted -o discard)	00:38
kerio	why shouldn't it	00:39
kerio	fstrim should still work, right	00:39
DocScrutinizer05	otherwise we have unsolicited TRIM in between	00:39
*** trx has quit IRC		00:40
DocScrutinizer05	so any such test would be rather meaningless with -o discard, no?	00:40
kerio	oh, performance tests	00:41
kerio	yeah, if it worked	00:41
DocScrutinizer05	<kerio> write random data over the whole thing then write random data again, measuring the speed then ERASE [rm -r *; fstrim] everything then write random data again and measure the speed	00:44
*** trx has joined #maemo-ssu		00:44
*** DrCode has joined #maemo-ssu		00:53
*** handaxe has joined #maemo-ssu		00:57
*** handaxe has quit IRC		01:01
*** freemangordon has quit IRC		01:05
*** freemangordon has joined #maemo-ssu		01:12
ShadowJK	I have the impression that there isn't all that much sophistication that can be squeezed into emmc, that trim is mostly a NOOP unless you give it a full 8MB block properly aligned that it can erase	01:33
ShadowJK	Or however big it gets reported as in /sys/block/.../preferred_erase_size or something like that	01:33
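ShadowJK's hypothesis is that only whole, aligned erase blocks inside a TRIM range can actually be erased. The Python sketch below shows which part of a range would qualify under that hypothesis. Both helper names are hypothetical. The sysfs path `/sys/block/<dev>/device/preferred_erase_size` is the MMC card attribute that newer kernels expose. The 8 MiB fallback is only the figure mentioned above, and older kernels may not have the file at all.

```python
def preferred_erase_size(dev="mmcblk0", default=8 << 20):
    """Read the card's preferred erase size in bytes; fall back to default."""
    path = f"/sys/block/{dev}/device/preferred_erase_size"
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return default

def erasable_subrange(start, length, erase_size):
    """(offset, length) of the whole aligned erase blocks inside a TRIM range.

    A length of 0 means that, under the "only full aligned blocks count"
    hypothesis, the TRIM would effectively be a no-op.
    """
    first = -(-start // erase_size) * erase_size          # round start up
    last = ((start + length) // erase_size) * erase_size  # round end down
    return first, max(0, last - first)
```

For example, a 16 MiB TRIM starting at 1 MiB with an 8 MiB erase size yields only the single block at 8 MiB.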
DocScrutinizer05	ShadowJK: TRIM is not about erase	01:40
DocScrutinizer05	https://www.youtube.com/watch?v=x6lqYU4j7no	01:42
DocScrutinizer05	when the controller copies an erase page to change one block in it, it can leave out, i.e. skip copying, the blocks tagged as TRIMed	01:43
DocScrutinizer05	so those are fresh unused blocks on the new page, ready to take new data	01:44
DocScrutinizer05	worst case, when all blocks have been used to write some (possibly already obsolete) data to them, each write of one block (to overwrite the obsolete old content) involves a copy of one complete erase page just to replace that one block	01:46
DocScrutinizer05	if all the blocks of the page have been tagged as TRIMed, the copy would result in just one used and many free blocks in the new erase page	01:47
DocScrutinizer05	when you fill the complete MMC with one file and then delete that file on fs level, subsequent writes to the device to fill it again completely with data would either cause $number-of-blocks page copies without TRIM, or only $number-of-erasepages copies with TRIM	01:49
DocScrutinizer05	to accomplish that on controller level, you need just one bit per block in metadata	01:51
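The page-copy argument above can be checked with a toy model. The sketch below is not how any real eMMC firmware works, and all names are hypothetical. Each block carries a state tag, which plays the role of the per-block metadata bit. Writing to an erased block programs it in place. Writing to a block that holds data forces a copy of the whole erase page, and TRIMed blocks are skipped during that copy, so they come out erased.

```python
VALID, TRIMMED, ERASED = "valid", "trimmed", "erased"

class ToyFTL:
    """Toy flash translation layer: erase pages of blocks_per_page blocks."""

    def __init__(self, pages, blocks_per_page):
        self.bpp = blocks_per_page
        self.state = [VALID] * (pages * blocks_per_page)  # device starts full
        self.page_copies = 0

    def trim(self, block):
        if self.state[block] == VALID:
            self.state[block] = TRIMMED  # host says: contents no longer needed

    def write(self, block):
        if self.state[block] == ERASED:
            self.state[block] = VALID    # fresh block: program in place
            return
        # Block holds data: copy its erase page to a fresh one. Only VALID
        # blocks are carried over; TRIMMED ones are skipped and end up erased.
        self.page_copies += 1
        first = (block // self.bpp) * self.bpp
        for b in range(first, first + self.bpp):
            if self.state[b] != VALID:
                self.state[b] = ERASED
        self.state[block] = VALID

def refill(pages, blocks_per_page, use_trim):
    """Fill the device, optionally TRIM everything, fill it again."""
    ftl = ToyFTL(pages, blocks_per_page)
    n = pages * blocks_per_page
    if use_trim:
        for b in range(n):
            ftl.trim(b)
    for b in range(n):
        ftl.write(b)
    return ftl.page_copies

print(refill(4, 8, use_trim=False))  # 32 copies: one per block
print(refill(4, 8, use_trim=True))   # 4 copies: one per erase page
```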
kerio	that's how things should go	01:58
kerio	on the other hand, hardware manufacturers will likely do the absolute bare minimum for anything	01:59
kerio	i mean	01:59
kerio	actual SSDs that do advertise ATA TRIM support actually fuck it up	01:59
kerio	because of firmware bugs	01:59
DocScrutinizer05	that's a completely different story	02:00
kerio	do you really expect a MMC firmware to handle a barely used feature correctly and in a way that enhances performance	02:00
kerio	maybe you can ask about it for the neo900	02:01
DocScrutinizer05	hardware manufacturers try to create as good a product as possible from a given amount of resources. Two bits per block used to tag free blocks with either 00 or 11 while used blocks are 01 doesn't cost them anything and will provide a selling point in datasheet	02:02
DocScrutinizer05	barely used is nonsense	02:02
kerio	well, is it a point in the n900 emmc datasheet?	02:02
DocScrutinizer05	obviously all android phones use that	02:02
kerio	i mean, i'd pay more for it	02:02
kerio	but nokia probably didn't	02:02
DocScrutinizer05	the point is you won't have to pay more for it	02:03
DocScrutinizer05	it's a mere one-shot effort to implement it in controller firmware	02:03
DocScrutinizer05	so the cost per chip ~= zilch	02:04
DocScrutinizer05	and microsoft obviously even specifies a max duration a single block write may take, or something along that line, which is only achievable with proper TRIM support	02:08
DocScrutinizer05	the datasheet for eMMC in N900 says it's >>Full compliance with JEDEC/MMCA Ver. 4.3<<, so all you have to do is to find the specs JEDEC only publishes for registered users	02:12
Pali	normal trim command cannot be send in queue for ATA disks	02:23
Pali	so before sending trim, you need to wait until the command queue is empty	02:23
Pali	and so using trim can slow down read/write operations of disks	02:24
Pali	yes, there is also queued trim ATA command, but it is not supported by Microsoft and Apple systems	02:25
Pali	and so if something advertise that supports it, it is buggy	02:26
Pali	before playing with discard on linux, look at this loooong table: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4270	02:26
kerio	samsung controller botched queued trim => queued trim is broken for every SSD forever	02:31
kerio	seems good	02:31
DocScrutinizer05	I'm not interested in discard. TRIM however will be mad useful	02:31
DocScrutinizer05	-o discard is arguably not the best way to do TRIM anyway	02:32
Pali	discard is just linux API for trim for FS	02:32
kerio	he wants to use trim or something like it for the swap partition	02:32
DocScrutinizer05	please read about batch trim vs online trim	02:32
kerio	(i'm not entirely sure it's a thing on linux tbh)	02:32
DocScrutinizer05	refer to fstrim	02:33
DocScrutinizer05	for example	02:33
Pali	discard is good idea, but only useful when all layers supports queued trim && queued trim is implemented correctly in FW	02:33
kerio	i'm pretty sure that both me and Pali understand the difference, doc	02:33
DocScrutinizer05	no, queued trim is only needed for -o discard aka online trim	02:34
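Batch trim of the kind fstrim(8) performs goes through the `FITRIM` ioctl on a mounted filesystem. The filesystem then discards its currently free ranges in one pass, instead of issuing a discard on every delete as `-o discard` does. Below is a Python sketch; the helper name `batch_trim` is hypothetical. The ioctl number is computed with the generic Linux `_IOWR` encoding used on x86 and ARM. Some other architectures encode ioctls differently. FITRIM needs root and a kernel and filesystem that support it (mainline added it in 2.6.37, so not the stock N900 kernel).

```python
import fcntl
import os
import struct

def _iowr(type_char, nr, size):
    # Generic _IOWR encoding: dir (2 bits) | size (14) | type (8) | nr (8)
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr

# <linux/fs.h>: struct fstrim_range { __u64 start; __u64 len; __u64 minlen; };
#               #define FITRIM _IOWR('X', 121, struct fstrim_range)
FSTRIM_RANGE = struct.Struct("QQQ")
FITRIM = _iowr("X", 121, FSTRIM_RANGE.size)

def batch_trim(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """TRIM the free space of a mounted fs; return bytes reported trimmed."""
    buf = bytearray(FSTRIM_RANGE.pack(start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)  # a directory fd is fine here
    try:
        fcntl.ioctl(fd, FITRIM, buf, True)  # kernel writes the result back
    finally:
        os.close(fd)
    return FSTRIM_RANGE.unpack(buf)[1]  # len now holds the trimmed byte count
```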
kerio	Pali: the real best way to "do TRIM" is to aggressively reuse sectors, anyway	02:34
DocScrutinizer05	huh?	02:34
kerio	so you don't need separate commands except when you absolutely have to	02:34
DocScrutinizer05	sorry that's absolute nonsense	02:34
kerio	DocScrutinizer05: on a SSD, TRIM means "i don't need this LBA address anymore"	02:35
DocScrutinizer05	the whole point about trim is that you _cannot_ 'reuse sectors'	02:35
kerio	wut	02:35
kerio	the controller will remap your writes all over the place anyway	02:35
DocScrutinizer05	each "reuse sector" means you need to do an erase page copy	02:36
kerio	...no it doesn't	02:36
kerio	unless there's no more space	02:36
DocScrutinizer05	which is exactly what TRIM accomplishes: free space	02:36
*** dafox has quit IRC		02:37
kerio	yes, but if you're deleting a file and creating a new one	02:37
kerio	you can just put the new one on the same logical address of the old one	02:37
DocScrutinizer05	so what?	02:37
kerio	and you'll have the same effect without having to issue a separate command	02:37
DocScrutinizer05	no you can't use the same physical address	02:37
kerio	yes	02:37
kerio	which is why i said logical address	02:38
DocScrutinizer05	please, read e.g. http://www.thessdreview.com/daily-news/latest-buzz/garbage-collection-and-trim-in-ssds-explained-an-ssd-primer/	02:38
DocScrutinizer05	you using same logical address means the complete physical erase page needs to get copied to change your one sector you write to	02:39
Pali	anyway, isn't it better to have direct access to NAND erase blocks and use e.g. ubifs on NAND directly?	02:39
kerio	DocScrutinizer05: that's for the ssd controller to decide	02:39
DocScrutinizer05	so "agressively reuse sectors" is meaningless at best, worst case more likely	02:39
DocScrutinizer05	Pali: we talk about eMMC	02:40
DocScrutinizer05	not NAND aka mtd	02:40
kerio	Pali: in theory, sure	02:40
kerio	in practice, separation of concerns has proven to be more successful	02:40
DocScrutinizer05	actually ubifs implements pretty much exactly the same scheme on application processor which the controller of emmc uses for TRIM and wear leveling	02:41
kerio	i'd trust a SSD plus ZFS over UBIFS on raw flash, if only because the tools are better	02:41
Pali	isn't eMMC just some flash or nand memory with its own software on it?	02:42
DocScrutinizer05	on MMC your only way to have some control over page erases is to use ERASE/TRIM	02:42
kerio	yeah, and in theory more control should yield better results	02:42
kerio	which is the same argument for software raid over hardware raid	02:42
kerio	however that hasn't become the case in modern computer hardware	02:43
kerio	probably because SSD controllers that take the raw flash and turn it into a perfect block device are Good Enough	02:43
DocScrutinizer05	only as long as they can keep the erase pages for all concurrent write(pointer)s in buffer RAM	02:49
DocScrutinizer05	as soon as the buffer RAM gets filled they need to write back one erase page sized chunk of data to make space for reading in another page so the next sector/block write can modify it	02:51
DocScrutinizer05	and depending on several other system parameters you don't want to keep large amounts of dirty buffers all the time, since... powerfail	02:52
DocScrutinizer05	I guess some SSDs even have their own battery to write back dirty buffers on powerfail (incl regular power down powerfail)	02:53
DocScrutinizer05	generally speaking you have little to no problems with SSD and TRIM and performance impact therefrom as long as you do a single sequential write, since all controllers can keep a single erasepage in RAM buffer	02:55
DocScrutinizer05	and they usually won't write it back until another erase page gets accessed or a certain timeout expired between interface write() commands	02:56
DocScrutinizer05	so the trivial controller can always read an erase page into RAM1, wait until all blocks of that page got modified by sequential write() commands from system, then on next write() after that read in the next erasepage into RAM2 with the sector to modify and write back RAM1_dirty to flash while RAM2 gets modified by further sequential write() cmds	03:00
DocScrutinizer05	for random access writes stuff soon starts to get cluttered	03:01
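The buffering argument above can be illustrated with a toy controller model in Python. This is an illustrative assumption, not any vendor's firmware, and the function name is hypothetical. The controller holds up to `buffer_pages` erase pages in RAM with LRU eviction. Every eviction, and the final flush, costs one erase and program cycle. A sequential write costs one cycle per erase page, while scattered writes to the same amount of data cost close to one cycle per block.

```python
from collections import OrderedDict

def count_page_writebacks(block_writes, blocks_per_page, buffer_pages):
    """Erase/program cycles for a toy controller with an LRU page buffer."""
    cache = OrderedDict()  # erase page numbers held in RAM, oldest first
    writebacks = 0
    for block in block_writes:
        page = block // blocks_per_page
        if page in cache:
            cache.move_to_end(page)    # hit: modify the buffered copy
            continue
        if len(cache) >= buffer_pages:
            cache.popitem(last=False)  # evict least recently used page
            writebacks += 1            # one erase + program cycle
        cache[page] = None             # read the page into a free RAM slot
    return writebacks + len(cache)     # timeout/power-down flush of the rest

print(count_page_writebacks(range(64 * 16), 16, 2))  # 64: one per erase page
```

Feeding it the same 1024 block numbers in random order, with two buffer slots, drives the count toward one cycle per write.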
DocScrutinizer05	so to really test TRIM, you'd probably want 500 files size 1/500 of SSD capacity each, then truncate them to 1 byte length each, do trim, and then append to each of them concurrently again	03:05
DocScrutinizer05	while this is the 100% worst case scenario, in normal operation similar scenarios will happen more often than not, while strictly sequential write is the rather unlikely usecase	03:06
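The 500-file test proposed above could be scripted roughly as follows. This is a Python sketch with a hypothetical function name and file naming scheme. It replaces truly concurrent appends with round-robin appends of one chunk per file, which assumes the filesystem then scatters the writes across the device. The trim step calls the real `fstrim -v <mountpoint>` command and needs root. Pass a `capacity_bytes` somewhat below the free space to leave room for filesystem metadata.

```python
import os
import subprocess
import time

def trim_stress_test(directory, capacity_bytes, n_files=500,
                     chunk=64 * 1024, mountpoint=None):
    """Fill, shrink, optionally fstrim, then refill; return refill seconds."""
    size = capacity_bytes // n_files
    paths = [os.path.join(directory, f"fill{i:03d}.bin") for i in range(n_files)]
    # 1. fill the volume with n_files equally sized files
    for p in paths:
        with open(p, "wb") as f:
            f.write(os.urandom(size))
    os.sync()  # make sure the data really reached the device
    # 2. truncate every file to 1 byte, freeing almost all of its blocks
    for p in paths:
        os.truncate(p, 1)
    os.sync()
    # 3. batch TRIM the now-free space (skipped when mountpoint is None)
    if mountpoint is not None:
        subprocess.run(["fstrim", "-v", mountpoint], check=True)
    # 4. grow all files again, one chunk per file in turn, and time it
    handles = [open(p, "ab") for p in paths]
    try:
        start = time.monotonic()
        appended = 1  # each file still holds its 1 byte
        while appended < size:
            n = min(chunk, size - appended)
            data = os.urandom(n)
            for h in handles:
                h.write(data)
            appended += n
        for h in handles:
            h.flush()
            os.fsync(h.fileno())
        elapsed = time.monotonic() - start
    finally:
        for h in handles:
            h.close()
    return elapsed
```

Compare the returned time from a run with `mountpoint` set against a run without it, on a volume mounted without `-o discard`.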
DocScrutinizer05	a pretty everyday scenario for almost worst case: swap	03:08
DocScrutinizer05	unless you configure kswapd in a way so it always writes complete aligned erasepage-sized chunks	03:10
DocScrutinizer05	as soon as you have one byte misalignment, optimum case turns into worst case where every swapped out page involves two erasepage read modify erase write cycles	03:12
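The alignment point is simple arithmetic: a write of `length` bytes at `offset` spans every erase page from `offset // E` to `(offset + length - 1) // E`. Below is a small Python sketch. The function name is hypothetical, and the 8 MiB erase size is just the figure mentioned earlier in the channel.

```python
def erase_pages_touched(offset, length, erase_size):
    """Number of erase pages a write of `length` bytes at `offset` spans."""
    if length <= 0:
        return 0
    first = offset // erase_size
    last = (offset + length - 1) // erase_size
    return last - first + 1

E = 8 << 20  # assumed erase page size: 8 MiB
print(erase_pages_touched(0, E, E))      # 1: aligned, one page
print(erase_pages_touched(1, E, E))      # 2: one byte off, two pages
print(erase_pages_touched(3 * E, E, E))  # 1
```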
DocScrutinizer05	btw on HDD you see similar effects when your drive always read/modify/writes a complete track instead of on-the-fly insert-writing a single sector	03:15
DocScrutinizer05	just a page erase on SSD takes much longer than one platter stack spin in a HDD	03:16
DocScrutinizer05	((strictly sequential write is the rather unlikely usecase)) also think fragmentation which happens on a FS level and is not visible/understandable by SSD/MMC controller	03:41
DocScrutinizer05	writing a file sequentially into a fragmented filesystem also is pretty much random access write	03:42
DocScrutinizer05	note that some SD-card controllers even are known to understand FAT fs and do shadow trim by locating and observing the used blocks table(s)	03:44
DocScrutinizer05	I don't know how risky that is, I guess they must have implemented quite some heuristics and safeguard monitors to stop this as soon as the slightest doubt about the FS used comes up	03:46
*** merlin1991 has quit IRC		03:53
*** Pali has quit IRC		04:21
*** DrCode has quit IRC		05:32
*** DrCode has joined #maemo-ssu		05:52
*** DocScrutinizer05 has quit IRC		07:00
*** DocScrutinizer05 has joined #maemo-ssu		07:00
*** Sicelo has quit IRC		07:59
*** NIN101 has quit IRC		07:59
*** dos1 has quit IRC		07:59
*** handaxe has joined #maemo-ssu		11:02
*** handaxe has quit IRC		11:06
*** LauRoman\|Alt has joined #maemo-ssu		11:13
*** dos1 has joined #maemo-ssu		11:53
*** Sicelo has joined #maemo-ssu		11:54
*** NIN101 has joined #maemo-ssu		11:54
ShadowJK	I've always wanted to have this on my phones: https://lwn.net/Articles/518988/	12:23
*** handaxe has joined #maemo-ssu		12:28
*** handaxe has quit IRC		12:29
*** LauRoman\|Alt has quit IRC		12:45
*** Pali has joined #maemo-ssu		13:18
*** merlin1991 has joined #maemo-ssu		14:01
*** dafox has joined #maemo-ssu		14:44
*** dafox has quit IRC		15:16
*** dafox has joined #maemo-ssu		15:52
DocScrutinizer05	hehe > It seems that as hardware gets smarter, we need to make even more clever software to manage that "smartness"<<	16:07
DocScrutinizer05	hmmm >>One area of difficulty is that the shape of an f2fs (such as section and zone size) needs to be tuned to the particular flash device and its FTL; vendors are notoriously secretive about exactly how their FTL works. f2fs also requires that the flash device is comfortable having six or more concurrently "open" write areas.<<	16:10
DocScrutinizer05	6 yeafrs old?	16:31
DocScrutinizer05	years even	16:31
DocScrutinizer05	oh nope, only 4	16:32
ShadowJK	I have some cards that are comfortable with 12 open areas	18:38
ShadowJK	In any case, 6 open areas is always better than random	18:38
kerio	hm	18:40
kerio	can we replace the internal eMMC with something based on ram?	18:40
kerio	surely making things non-volatile is harder	18:40
bencoh	?	18:57
DocScrutinizer05	kerio: I'm searching for mixed RAM/flash chips with a sincle storage interface (aka ramdisk) since ages, nothing found so far	20:40
DocScrutinizer05	single*	20:40
DocScrutinizer05	I mean, how hard could it be to implement one-plus GB RAM buffer in FTL and operate it in a dedicated overlay mode where you define a start address where buffer is used instead of ever writing back stuff to flash?	20:42
kerio	and then you remove the flash	20:43
DocScrutinizer05	FTLs already implement write protect afaik	20:43
DocScrutinizer05	I would prefer still keeping flash in same chip behind same interface/bus	20:44
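The proposed overlay mode keeps one device and one interface. Addresses below a configurable boundary live in flash, and addresses at or above it are served from RAM and never written back. The Python model below is purely illustrative; no such chip is known from the discussion. The class name, the `boundary` parameter and the 512-byte block size are all assumptions.

```python
class OverlayDevice:
    """Hypothetical storage: LBAs < boundary in flash, the rest in RAM only."""

    def __init__(self, total_blocks, boundary, block_size=512):
        if not 0 <= boundary <= total_blocks:
            raise ValueError("boundary must lie within the device")
        self.total = total_blocks
        self.boundary = boundary
        self.bs = block_size
        self.flash = bytearray(boundary * block_size)  # persistent part
        self.ram = {}                                  # lba -> block, volatile

    def _check(self, lba):
        if not 0 <= lba < self.total:
            raise IndexError(f"LBA {lba} out of range")

    def write(self, lba, data):
        self._check(lba)
        if len(data) != self.bs:
            raise ValueError("writes must be exactly one block")
        if lba >= self.boundary:
            self.ram[lba] = bytes(data)  # never written back to flash
        else:
            off = lba * self.bs
            self.flash[off:off + self.bs] = data

    def read(self, lba):
        self._check(lba)
        if lba >= self.boundary:
            return self.ram.get(lba, bytes(self.bs))  # unwritten reads as zeros
        off = lba * self.bs
        return bytes(self.flash[off:off + self.bs])

    def power_cycle(self):
        self.ram.clear()  # overlay contents are lost; flash contents survive
```

A swap or scratch partition placed above the boundary would then never cost a flash erase. The price is that its contents do not survive a power loss.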
kerio	but who cares about flash	20:45
DocScrutinizer05	everybody?	20:45
kerio	we have microSDs for that	20:45
*** LauRoman\|Alt has joined #maemo-ssu		22:31

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!