Check if all chars in a string are unique

Question: Write a function that determines if each characters in a String appear no more than once.

Related Tags:

Questions

  • What type of encoding is used?
  • How long, on average are the strings we are checking?

There are three main types of encoding - UTF and ASCII.

Unicode is used by almost all operating systems to define the internal text coding system. Unicode assigns each character a unique number (its code point). One of the mapping methods used is UTF. UTF-8 takes a minimum of 8 bits and a maximum of 32 bits per character. This means there are 1,111,998 possible characters.

ASCII is more limited towards English language, and has 128 possible characters. Each char is assigned a code point from 0 to 127.

Let's assume ASCII for simplicity.

1) Using a boolean array

If our string is much longer than 128 characters long, we can use a boolean array of size 128. Each character encountered in our string we will flag as true for that index. If we encounter an array cell that is already set to true, then this means that the character has already appeared, so we return false. This way, the maximum number iterations will be 128.

Algorithm Pseudocode

  1. If the length of the string is greater than the number of possible characters, then return false.
  2. Set up an array with size however large the encoding set is.
  3. For each character encountered, convert it to its code point and check its value in the array at that index.
  4. If it's set to false, set it to true.
  5. If it's true, that means we have already encountered it. Thus, return false.

Complexity

The time complexity is O(n), but the space complexity is O(k) where k is the size of number of characters the encoding may generate.

Implementation in Java

static boolean isUniqueArray(String str) {

	// string can't be longer than num encoding chars
	if (str.length() > NUMBER_OF_CHARS) {
		return false;
	}

	boolean[] array = new boolean[NUMBER_OF_CHARS];
	int curr;

	for (int i = 0; i < str.length(); i++) {
		curr = (int) str.charAt(i);

		if (array[curr]) {
			return false;
		} else {
			array[curr] = true;
		}
	}

	return true;
}

2) Using a bag or set

If the string is much shorter than the amount of characters in the character encoding, it would be good to use a bag or set. This way, the number of iterations is kept, at most, the length of the String.

Algorithm Pseudocode

Here's the algorithm pseudocode:

  1. Iterate through the String, selecting each character c.
  2. Check if c is in our set. If not, insert. If it is, return false.

Complexity

The time complexity remains the same as above O(n), but the space complexity is lowered to O(l), where l is the length of our string.

Implementation in Java

static boolean isUniqueSet(String str) {

	// string can't be longer than num encoding chars
	if (str.length() > NUMBER_OF_CHARS) {
		return false;
	}

	HashSet<Character> charSet = new HashSet<>();

	Character c;

	for (int i = 0; i < str.length(); i++) {
		c = str.charAt(i);
		if (charSet.contains(c)) {
			return false;
		} else {
			charSet.add(c);
		}
	}

	return true;
}

3) Checking each character

In case you aren't allowed to use additional space, we can implement an algorithm that compares each character to all other characters in the array. We can do this easily with two for-loops.

Complexity

Although this approach takes quadratic time (O(n2)), there is no extra auxillary space needed.

Implementation in Java

static boolean isUniqueCheck(String str) {

	// string can't be longer than num encoding chars
	if (str.length() > NUMBER_OF_CHARS) {
		return false;
	}

	char c;

	for (int i = 0; i < str.length(); i++) {

		c = str.charAt(i);

		for (int j = i+1; j < str.length(); j++) {
			if (c == str.charAt(j))
				return false;
		}

	}

	return true;
}

4) Sorting

You may also sort the array, then see if any character repeats itself. Depending on the algorithm used to sort, you may or may not use auxillary space.

Complexity

The time and space complexity depends on the algorithm, but we can bring the time complexity down to O(n log n) and space down to O(1).

Implementation in Java

static boolean isUniqueSort(String str) {

	// string can't be longer than num encoding chars
	if (str.length() > NUMBER_OF_CHARS) {
		return false;
	}

	char[] charArray = str.toCharArray();

	Arrays.sort(charArray);

	for (int i = 0; i < charArray.length - 1; i++) {
		if (charArray[i] == charArray[i+1]) 
			return false;
	}

	return true;
}

Came up with a better solution or have a question? Comment below!

Next Challenge: Cheryl's Birthday