Talk:Delta encoding

C code is wrong[edit]

I think the C code is not correct

You are correct.

If you run the example in the text trough this algorightm (2, 4, 6, 9, 7), the encoded becomes:

2, 2, 4, 5, 2

The decode then becomes:

2, 4, 8, 13, 15

The decoder is correct (checked with encoded data 2, 2, 2, 3, -2):

2, 4, 6, 9, 7 (which is the right awnser).

I will correct the encoder after I add this comment. ShadowLord 09:14, 7 Apr 2005 (UTC)

The C code is utter rubbish. Delta encoding requires solving the longest common substring problem. Many approximate solutions exist, but exact solution requires O(n**2) operations. — Preceding unsigned comment added by 107.16.107.127 (talk) 22:57, 24 April 2013 (UTC)[reply]

Sample Java code[edit]

The following Java code performs a simple form of delta encoding and decoding:

public class Delta {
	public static int[] deltaEncode(int[] buffer) {
		int original, delta = 0;
		for (int i = 0; i < buffer.length; ++i) {
			original = buffer[i];
			buffer[i] -= delta;
			delta = original;
		}
		return buffer;
	}

	public static int[] deltaDecode(int[] buffer) {
		int delta = 0;
		for (int i = 0; i < buffer.length; ++i) {
			buffer[i] += delta;
			delta = buffer[i];
		}
		return buffer;
	}

	public static void main(String[] args) {
		int[] buffer = { 2, 4, 6, 9, 7 };
		buffer = deltaEncode(buffer);
		for (int c : buffer) {
			System.out.print(c + " ");
		}
		System.out.println();

		buffer = deltaDecode(buffer);
		for (int c : buffer) {
			System.out.print(c + " ");
		}
	}
}

Generalisation of delta encoding[edit]

Could this article be expanded to a more general view of delta encoding as encoding only the differences between things? A mention of diff and similar might be quite good. porges 09:38, Apr 7, 2005 (UTC)

I've changed the text to link to delta compression article, which have more direct relation to revision control in general and diff in particular. 217.26.163.26 07:23, 4 May 2005 (UTC)[reply]

Error control[edit]

Error control has different meaning in this article - it's used to find differences from other version of the same file without transfering it, not to detect corrupted data. This is the way how rsync/zsync use it. Please, don't change this meaning! 217.26.163.26 07:06, 4 May 2005 (UTC)[reply]

symmetric delta[edit]

The article currently states

A symmetric delta can be expressed as

\Delta (v_{1},v_{2})=(v_{1}\backslash v_{2})\cup (v_{2}\backslash v_{1})

, where

v_{1}

and

v_{2}

represent two successive versions.

OK, that is a pretty formula, but what does it mean? (What does the "\" mean in this context?)

I'm guessing it means something like

Given two versions

v_{1}

and

v_{2}

, we can compute "the delta" between them.

A directed delta tells us, once we know

v_{1}

, how to generate

v_{2}

-- but if we only know

v_{2}

and the directed delta, it's generally impossible to recover

v_{1}

.

A symmetric delta can go both ways -- if we know

v_{1}

, it tells us how to generate

v_{2}

. But if we only know

v_{2}

, that same symmetric delta tells us how to generate

v_{1}

.

Is that all it means ? Is it OK if I replace the mysterious formula with this English-language description? --DavidCary 03:19, 6 October 2005 (UTC)[reply]

If I read the formula correctly, it means the same as $\Delta _{s}(v_{1},v_{2})=(v_{1}\cup v_{2})\backslash (v_{1}\cap v_{1})$ , which I can read easier in English: the symmetric delta of two versions is the elements of each version without the elements common to both.

I'm guessing that directed delta could be defined by $\Delta _{d}(v_{1},v_{2})=v_{2}\backslash v_{1}$ , and would read: the elements of second version without the elements of the first.

I don't feel that the formula adds much to the article. --Eruionnyron 19:10, 9 October 2005 (UTC)[reply]

When dealing with set (mathematics) (an unordered collection of elements, no two elements the same), "\" means complement (set theory), and "

\cup

" means set union. However, a computer file is an an ordered series of bytes, with the same byte often occurring multiple times. I still don't understand: What do those operators mean when applied to two versions of a computer file? --DavidCary (talk) 15:59, 22 May 2015 (UTC)[reply]

Potential sources for new article[edit]

This article is crappy. I'll rewrite it, but having a new list of sources would be quite helpful. If anyone wants to chip in I'd be very pleased.

Delta algorithms: an empirical analysis goofyheadedpunk 03:54, 29 July 2006 (UTC)[reply]

What??[edit]

"Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, -2."

This wouldn't save any space at all. Is this actually used to save computer data file revisions? I highly doubt it. We already have an article for delta modulation, as used in audio (and similar to what is used in video). This article is supposed to be about saving revisions of large chunks of data, right? Involving the unix diff utility? Not about the difference between subsequent samples within a chunk of data. — Omegatron 20:55, 18 May 2007 (UTC)[reply]

So what if it doesn't save space, it is still delta encoding. Delta encoding is used for more than file revisions. Delta modulation isn't exactly the same thing. Nikola (talk) 06:41, 19 October 2008 (UTC)[reply]

Actually such delta encoding MAY save space if there is other algorithm runs AFTER delta algorithm. For example, LZ or RLE algorithms can't do anything with sequence like 2,4,6,9,7. But 2,2,2 surely can be compressed further to something like instruction to repeat '2' digit for 3 times and this could be more compact than storing 2, 2, 2 directly. This is a well-known method of improving compression ratio of archivers on certain kinds of data (like audio and other certain sorts of data as well) and this IS kind of delta encoding. Not to mention ADPCM and similar algos. —Preceding unsigned comment added by 91.78.236.168 (talk) 15:12, 5 December 2008 (UTC)[reply]

Yes, the point of delta coding is not compression but to enable other effective compression techniques. In particular, universal codes are most efficient on small values. Dcoetzee 15:41, 5 December 2008 (UTC)[reply]

Varint encoding benefits a LOT from delta encoding, as it can use from 1 to 4 bytes to represent each integer, and a delta-encoded stream will increase the number of small values @nobre November 7, 2011 — Preceding unsigned comment added by 200.251.63.3 (talk) 15:55, 7 November 2011 (UTC)[reply]

I agree that "subtracting" each sample from the consecutive next sample is a common and useful technique when compressing a *single* audio file, a technique that should be described somewhere in the Wikipedia.

However, I agree with Omegatron that such differential pulse-code modulation is conceptually different from delta copying, which always involves 2 different files (or at least 2 different versions of a file).

How can we avoid confusing our readers? --DavidCary (talk) 19:15, 2 February 2015 (UTC)[reply]

"skip deltas" variant in Subversion[edit]

apparently Apache Subversion uses a variant of delta encoding with "revision N is stored as a delta against N-f(N), where f(N) is the greatest power of two that divides N". Some discussion here http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas . I guess this should go into Delta encoding#Variants section. 76.119.30.87 (talk) 00:21, 11 February 2015 (UTC)[reply]

The question is, if it really uses skip-deltas. The explanatory doc on https://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas explains the algorithm, but if you follow its rules straightly (flip the rightmost 1-bit to 0) then (in the FSFS backend) every revision being a power of 2 should be huge in size, because it is a delta against the zero-revision, if I have understood it correctly. - But I've used SVN myself and have never seen such a behaviour, even on a repository with large and often changing binary files (e.g. like images): Revisions ..., 512, 1024, 2048, 4096, etc. had all sizes comparable to those of the neighboring "normal" revisions where skip-deltas would not have such a large size-effect. And yes, it was a FSFS repo.

Also, this https://dev.subversion.apache.narkive.com/DGt7Z6RY/do-svn-actually-uses-skip-delta thread concludes that meanwhile the algorithm could've grown more complex or get not used at all, or maybe gets at least not used for low revision numbers.

Finally, it seems there is little scientific citable literature on the topic, if you believe https://svn.haxx.se/users/archive-2005-09/0100.shtml and this https://softwareengineering.stackexchange.com/questions/247139/are-skip-deltas-unique-to-svn Stackexchange posting, so you would probably have problems to fortify a WP-article with sources.

Nonetheless I would of course also be glad if someone could shed some light on the "cryptic skip-delta topic of SVN" and finally resolve the mystery around it ;). --77.190.182.210 (talk) 13:02, 14 June 2019 (UTC)[reply]