uniq: Filter duplicate lines

Author
  • User
    Linux Bash

Mastering the uniq Command: Filtering Duplicate Lines in Linux Bash

Among the Linux command-line utilities, uniq is the standard tool for filtering repeated lines out of a stream of text. Whether you are a system administrator or a software developer, it simplifies scripts that need to find, count, or remove duplicate lines in text files or data streams. In this article, we'll cover the basics of the uniq command, its common usage scenarios, and how to install it on various Linux distributions.

What is uniq?

The uniq command in Linux is a command-line utility that reads its input line by line, compares each line with the one before it, and writes out either the unique lines or the duplicated ones. Because it only detects duplicates that are adjacent, it is commonly used together with the sort command, which brings identical lines next to each other first.
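
The adjacency rule is easy to see on a small example. The following is a minimal sketch using made-up input; the variable names are only for illustration.

```shell
# Made-up input with a non-adjacent repeat of "apple".
input=$(printf 'apple\nbanana\napple\napple\n')

# Without sorting, only the two adjacent "apple" lines at the end collapse.
unsorted_result=$(printf '%s\n' "$input" | uniq)

# After sorting, all "apple" lines are adjacent, so a single copy remains.
sorted_result=$(printf '%s\n' "$input" | sort | uniq)

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted_result" "$sorted_result"
```

In the unsorted case the first "apple" survives alongside the collapsed pair, which is why sort usually comes first in the pipeline.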

Key Features of uniq

  1. Eliminating Duplicate Lines: By default, uniq removes duplicate adjacent lines.
  2. Counting Occurrences: With the -c option, uniq can count how many times each line appears.
  3. Ignoring Characters: With the -s N option, uniq skips the first N characters of each line when comparing.
  4. Skipping Fields: Similarly, with the -f N option, it skips the first N blank-separated fields when comparing.
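
As a rough illustration of the skipping options, the sketch below runs uniq over a few hypothetical log lines whose timestamps differ while the messages repeat. The log contents and variable names are assumptions made for the example.

```shell
# Hypothetical log lines: a timestamp field followed by a message.
log=$(printf '%s\n' \
  '10:00:01 disk full' \
  '10:00:02 disk full' \
  '10:00:03 login ok')

# -f 1 skips the first blank-separated field (the timestamp) when comparing,
# so the two "disk full" lines count as duplicates and the first one is kept.
by_field=$(printf '%s\n' "$log" | uniq -f 1)

# -s 8 skips the first 8 characters (the width of the timestamp) instead.
by_chars=$(printf '%s\n' "$log" | uniq -s 8)

printf 'by field:\n%s\n\nby characters:\n%s\n' "$by_field" "$by_chars"
```

Both variants keep the whole first line of each group; the skipped part is ignored only for the comparison, not removed from the output.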

Installation Instructions

Ubuntu and Debian Systems

On Ubuntu, Debian, and other distributions that use APT as their package manager, uniq is normally pre-installed as part of the GNU core utilities (the coreutils package). If it is missing for any reason, you can install it with:

sudo apt update
sudo apt install coreutils

Fedora, RHEL, and CentOS

On Fedora and other distributions that use the DNF package manager, including recent RHEL and CentOS releases, uniq is likewise part of the coreutils package. You can make sure it is installed with:

sudo dnf check-update
sudo dnf install coreutils

openSUSE

For openSUSE and other systems using Zypper, the uniq command is again typically part of the default installation, available in the coreutils package:

sudo zypper refresh
sudo zypper install coreutils
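
On any of these distributions, it may be worth checking whether uniq is already available before installing anything. A minimal sketch using the shell's built-in command -v lookup (the variable name is arbitrary):

```shell
# Report where uniq lives on the PATH, or say that it is missing.
if uniq_path=$(command -v uniq); then
  echo "uniq found at: $uniq_path"
else
  echo "uniq not found; install the coreutils package" >&2
fi
```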

How to Use uniq

Assume you have a file named data.txt. Because uniq only compares adjacent lines, the examples below pipe the file through sort first so that identical lines end up next to each other:

  1. Removing Duplicate Lines
sort data.txt | uniq

This prints the sorted contents of data.txt with each distinct line appearing exactly once.
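
To try this without touching real data, the sketch below builds a throwaway data.txt with made-up contents in a temporary directory. It also shows the standard -d and -u options, which print only the repeated lines and only the non-repeated lines respectively.

```shell
# Create a small sample file (made-up contents) in a temporary directory.
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\npear\n' > "$workdir/data.txt"

# One copy of every distinct line.
distinct=$(sort "$workdir/data.txt" | uniq)

# -d prints one copy of each line that occurs more than once.
repeated=$(sort "$workdir/data.txt" | uniq -d)

# -u prints only the lines that occur exactly once.
singletons=$(sort "$workdir/data.txt" | uniq -u)

printf 'distinct:\n%s\n\nrepeated:\n%s\n\nsingletons:\n%s\n' \
  "$distinct" "$repeated" "$singletons"

# Clean up the temporary directory.
rm -r "$workdir"
```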

  2. Counting Line Occurrences
sort data.txt | uniq -c

This will display counts next to the unique lines indicating how many times each appeared.
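
A common follow-up is to rank the lines by frequency by sorting the counts numerically in reverse. The sketch below uses made-up input; the exact padding in front of each count can differ between uniq implementations, so awk is used to pick out the columns.

```shell
# Count each distinct line of some made-up input, most frequent first.
counts=$(printf 'pear\napple\npear\nfig\napple\npear\n' | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# Extract the most frequent line: column 1 is the count, column 2 the text.
top=$(printf '%s\n' "$counts" | awk 'NR == 1 { print $2 }')
echo "most frequent: $top"
```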

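  3. Ignoring case and comparing only the first N characters
sort -f data.txt | uniq -i -w 10

This invocation ignores case (-i) and compares just the first 10 characters of each line (-w 10). The -f option makes sort fold case as well, so lines that differ only in case end up adjacent. Note that -w is a GNU extension and may be missing from other uniq implementations.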
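
The effect of combining the two options can be seen on a few made-up lines. This sketch assumes GNU uniq (for -w) and uses a width of 5 instead of 10 to keep the sample short.

```shell
# Made-up input, already arranged so that similar lines are adjacent.
input=$(printf 'ERROR: disk A\nerror: disk B\nWARN: cpu\n')

# -i ignores case and -w 5 compares only the first 5 characters,
# so "ERROR" and "error" match and the second line is dropped.
collapsed=$(printf '%s\n' "$input" | uniq -i -w 5)
printf '%s\n' "$collapsed"
```

The first line of each matching group is the one that is kept, so the output retains "ERROR: disk A" even though the dropped line had different text after the compared prefix.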

Practical Uses of uniq

  • Log Analysis: Quickly summarizing and counting logged entries.

  • Data Cleanup: Removing unnecessary repetitions in data processing tasks.

  • Survey Analysis: Compiling unique responses from sorted data.
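
As an illustration of the log analysis case, the sketch below counts requests per client in a hypothetical access log. The log format (client address first, then request path) and the variable names are assumptions made for the example.

```shell
# Hypothetical access-log lines: client address, then request path.
log=$(printf '%s\n' \
  '10.0.0.1 /index.html' \
  '10.0.0.2 /about.html' \
  '10.0.0.1 /contact.html' \
  '10.0.0.3 /index.html' \
  '10.0.0.1 /index.html')

# Keep the first field, group identical addresses, count them,
# and show the busiest client first.
top_client=$(printf '%s\n' "$log" | cut -d ' ' -f 1 | sort | uniq -c | sort -rn | head -n 1)
printf '%s\n' "$top_client"
```

The same pipeline shape (extract a field, sort, uniq -c, sort -rn) applies to the data cleanup and survey cases as well, with only the extraction step changing.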

Conclusion

The uniq command is small but nuanced: it handles deduplication and counting with a single short command. Combined with other Unix utilities such as sort, it lets you summarize large text data sets directly from the command line. Give it a try and start incorporating it into your regular data processing workflows!

Fluency with command-line tools like uniq makes handling text data faster and less error-prone, and it is a skill worth adding to your Linux repertoire.