Difference between input range and forward range
Jonathan M Davis via Digitalmars-d
digitalmars-d at puremagic.com
Thu Nov 12 01:11:20 PST 2015
On Wednesday, 11 November 2015 at 22:34:32 UTC, Jesse Phillips
wrote:
> On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis
> wrote:
>> generic code, you have to consider it to be consumed, because
>> the state of range you passed to foreach is now undefined,
>> since what happens when copying the range is undefined. This
>> is true even if you put a break out of the loop, because the
>> range was copied, and you simply cannot rely on the state of
>> the range you passed to foreach after that copy.
>
> The problem I find with generic code is when the desire is to
> consume the data. Take this example of parsing a data stream.
>
> auto osmData = datastream.take(size).array;
> datastream.popFrontN(size);
> auto header = BlobHeader(osmData);
>
> http://he-the-great.livejournal.com/49636.html
>
> How do I know if popFrontN is needed? If I was given a value
> base range then it is. If I was given a reference range (in its
> many forms) the extra call to popFrontN will result in an
> unneeded data jump. I could require that a forward range is
> passed in, then I can save() before calling .array and thus
> always require popFrontN.
>
> The best option is probably to use the RefRange wrapper, but it
> does create an annoying element of surprise.
Well, if we're talking forward ranges, then the only way to be
100% consistent with this is to basically consider datastream
unusable after the call to take, because its state is undefined.
So, the correct way to handle this would be to do something like
auto osmData = datastream.save.take(size).array;
datastream.popFrontN(size);
Now, that's ugly, but it does ensure that the code will work
correctly regardless of whether the range is a reference type,
value type, or pseudo-reference type. And if you wanted to
guaranteed reference semantics, as you say, you could use
RefRange, though you probably do have to be careful about that.
Regardless, it highlights how save needs to be called all over
the place if you want ranges which are reference types to work
consistently with value types.
The bigger problem is pure input ranges. If you do
auto osmData = datastream.take(size).array;
then the state of datastream is undefined (at least in generic
code), and you can't do _anything_ with it. If datastream is
reference type, then using it would work just fine, since the
first size elements would have been consumed, and we shouldn't
have to worry about value types (because they can always be
forward ranges - though it's technically possible for someone to
not make them forward ranges even when they should be). However,
we _do_ have a problem with pseudo-reference types. A pure input
range pretty much has to be a reference type with regards to its
elements (otherwise it could be a forward range), but stuff like
caching front can turn it into a pseudo-reference type and
totally break code like this. With a full-on reference type
auto osmData = datastream.take(size).array;
results in datastream being the same as it would have been had
you called datastream.popFrontN(size) instead. But with a cached
front, while the subsequent elements would be correct, front
would be wrong.
So, really, you're highlighting a really nasty aspect of this
problem. As long as we allow pseudo-reference types for pure
input ranges (and as I understand it, existing stuff like vibe.d
has them), I don't see how we can make this code work correctly
and access any of the elements in the range that were after the
elements that were accessed via take.
Actually, even with full-on reference types, we're kind of
screwed with take and input ranges. take is lazy, so if you do
auto osmData = datastream.take(size);
datastream.popFrontN(size);
and datastream is a reference type, then take will end up
referring to to the second n elements, not the first.
*sigh* Pure input ranges suck. It's ugly as all get out with
forward ranges, but liberal use of save can guarantee consistent
semantics. But with pure input ranges...
I suspect that there's a whole pile of algorithms that
technically should never be used with pure input ranges or which
require that you be _very_ careful with them. It would be
ludicrous to not have take work with input ranges, but it's quite
clear that with a pure input range, calling take _and_ accessing
elements beyond the ones taken isn't going to work unless you
know that the range is a reference type, and you make sure that
you iterate through the result of take _before_ doing anything
with the rest of the range.
I think that it's pretty clear that we need to re-examine how
pure input ranges should work and either make a change to them or
have some very clear guidelines on how to use them (which is not
likely going to be easy to do correctly) and probably disallow
them for most algorithms.
Yuck. I'm definitely going to have to stew on this one. I've
always thought that pure input ranges were too restrictive to be
useful in most cases (though for some stuff you're pretty much
stuck with them without doing a lot of buffering), but they seem
to get worse every time we examine them in depth.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list