Reading text in bibleworks 10


Reading text files backwards is really tricky unless you're using a fixed-size encoding. When you've got a variable-size encoding (such as UTF-8) you will keep having to check whether you're in the middle of a character or not when you fetch data.

There's nothing built into the framework, and I suspect you'd have to do separate hard coding for each variable-width encoding.

EDIT: This has been somewhat tested - but that's not to say it doesn't still have some subtle bugs around. It uses StreamUtil from MiscUtil, but I've included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring - there's one pretty hefty method, as you'll see:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    /// <summary>
    /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
    /// (or a filename for convenience) and yields lines from the end of the stream backwards.
    /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
    /// returned by the function must be seekable.
    /// </summary>
    public sealed class ReverseLineReader : IEnumerable<string>
    {
        /// <summary>
        /// Default buffer size. Internal callers can specify
        /// a different buffer size - this is useful for testing.
        /// </summary>
        private const int DefaultBufferSize = 4096;

        /// <summary>Means of creating a Stream to read from.</summary>
        private readonly Func<Stream> streamSource;

        /// <summary>Encoding to use when converting bytes to text</summary>
        private readonly Encoding encoding;

        /// <summary>
        /// Size of buffer (in bytes) to read each time we read from the
        /// stream. This must be at least as big as the maximum number of
        /// bytes for a single character.
        /// </summary>
        private readonly int bufferSize;

        /// <summary>
        /// Function which, when given a position within a file and a byte, states whether
        /// or not the byte represents the start of a character.
        /// </summary>
        private readonly Func<long, byte, bool> characterStartDetector;

        /// <summary>
        /// Creates a LineReader from a stream source. The source is only
        /// called when the enumerator is fetched. UTF-8 is used to decode
        /// the stream into text.
        /// </summary>
        public ReverseLineReader(Func<Stream> streamSource)
            : this(streamSource, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// UTF8 is used to decode the file into text.
        /// </summary>
        public ReverseLineReader(string filename)
            : this(filename, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// </summary>
        /// <param name="encoding">Encoding to use to decode the file into text</param>
        public ReverseLineReader(string filename, Encoding encoding)
            : this(() => File.OpenRead(filename), encoding)
        {
        }

        /// <summary>
        /// Creates a LineReader from a stream source. The source is only
        /// called when the enumerator is fetched.
        /// </summary>
        /// <param name="encoding">Encoding to use to decode the stream into text</param>
        public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
            : this(streamSource, encoding, DefaultBufferSize)
        {
        }

        internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
        {
            this.streamSource = streamSource;
            this.encoding = encoding;
            this.bufferSize = bufferSize;
            if (encoding.IsSingleByte)
            {
                // For a single byte encoding, every byte is the start (and end) of a character
                characterStartDetector = (pos, data) => true;
            }
            else if (encoding is UnicodeEncoding)
            {
                // For UTF-16, even-numbered positions are the start of a character.
                // TODO: This assumes no surrogate pairs. More work required
                characterStartDetector = (pos, data) => (pos & 1) == 0;
            }
            else if (encoding is UTF8Encoding)
            {
                // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
            }
            else
            {
                throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
            }
        }

        /// <summary>
        /// Returns the enumerator reading strings backwards. If
        /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
        /// </summary>
        public IEnumerator<string> GetEnumerator()
        {
            Stream stream = streamSource();
            if (!stream.CanSeek)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to seek within stream");
            }
            if (!stream.CanRead)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to read within stream");
            }
            return GetEnumeratorImpl(stream);
        }

        private IEnumerator<string> GetEnumeratorImpl(Stream stream)
        {
            long position = stream.Length;

            if (encoding is UnicodeEncoding && (position & 1) != 0)
            {
                throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
            }

            // Allow up to two bytes for data from the start of the previous
            // read which didn't quite make it as full characters
            byte[] buffer = new byte[bufferSize + 2];
            char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
            int leftOverData = 0;

            // TextReader doesn't return an empty string if there's line break at the end
            // of the data. Therefore we don't return an empty string if it's our *first*
            // return.
            bool firstYield = true;

            // A line-feed at the start of the previous buffer means we need to swallow
            // the carriage-return at the end of this buffer - hence this needs declaring
            // up here.
            bool swallowCarriageReturn = false;

            while (position > 0)
            {
                int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                position -= bytesToRead;
                stream.Position = position;
                StreamUtil.ReadExactly(stream, buffer, bytesToRead);

                // If we haven't read a full buffer, but we had bytes left
                // over from before, copy them to the end of the buffer
                if (leftOverData > 0 && bytesToRead != bufferSize)
                {
                    // Buffer.BlockCopy doesn't document its behaviour with respect
                    // to overlapping data: we *might* just have read 7 bytes instead of
                    // a full buffer.
                    Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                }

                // We've now *effectively* read this much data.
                bytesToRead += leftOverData;
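To make the "am I in the middle of a character?" check concrete, here is a minimal standalone sketch of the UTF-8 rule used by the character start detector (top bit clear, or second bit set). The class `Utf8Boundaries` and the helper `FindCharacterStart` are hypothetical names made up for this illustration; they are not part of ReverseLineReader or MiscUtil.

```csharp
using System;
using System.Text;

static class Utf8Boundaries
{
    // Same rule as the UTF-8 detector: a byte starts a character if its top bit
    // is clear (ASCII) or its second-highest bit is set (lead byte of a sequence).
    public static bool IsCharacterStart(byte data) =>
        (data & 0x80) == 0 || (data & 0x40) != 0;

    // Hypothetical helper: step backwards from index until a character start is found.
    public static int FindCharacterStart(byte[] data, int index)
    {
        while (index > 0 && !IsCharacterStart(data[index]))
        {
            index--;
        }
        return index;
    }

    static void Main()
    {
        // In UTF-8, "a" takes 1 byte, U+00E9 takes 2 bytes and U+20AC takes 3 bytes.
        byte[] bytes = Encoding.UTF8.GetBytes("a\u00e9\u20ac");
        for (int i = 0; i < bytes.Length; i++)
        {
            Console.WriteLine($"{i}: 0x{bytes[i]:X2} start={IsCharacterStart(bytes[i])}");
        }

        // Index 5 is the last continuation byte of U+20AC, whose lead byte is at index 3.
        Console.WriteLine(FindCharacterStart(bytes, 5));
    }
}
```

If a buffer read backwards from the end of a file begins on a continuation byte, those leading bytes can't be decoded yet. They have to be carried over and joined to the next (earlier) read, which is why the reader keeps track of left-over bytes between reads.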