shuf: Shuffle lines randomly

Author
  • User
    Linux Bash

Shuffling Text Lines Efficiently with shuf in Linux

In the world of Linux, efficiency is key. Whether you're a system administrator, a developer, or a data scientist, manipulating text data quickly and effectively can be crucial. One handy tool that deserves more attention is shuf, a command-line utility that randomly shuffles the lines of a file or input stream. This is particularly useful for tasks such as generating random samples, creating randomised lists, or even setting up conditions for simulations.

What is shuf?

shuf is a utility in GNU Coreutils, available by default on most Linux distributions. It reads a sequence of lines from a file (or standard input), randomly permutes them, and writes the result to standard output. With the -i option it can also generate a random permutation of an integer range such as 1 to N, making it a versatile tool for any task that requires a random order.
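
To make the idea of a permutation concrete, here is a minimal sketch. The sample words and the temporary file are only illustrative assumptions; any text file would do. It shows that shuf changes the order of the lines but never adds, drops, or alters them.

```shell
# Create a small sample file in a temporary location.
sample=$(mktemp)
printf '%s\n' apple banana cherry date elderberry > "$sample"

# Print the lines in a random order; the order usually differs between runs.
shuf "$sample"

# A shuffle only reorders lines: sorting the shuffled output
# gives the same text as sorting the original file.
original_sorted=$(sort "$sample")
shuffled_sorted=$(shuf "$sample" | sort)
if [ "$original_sorted" = "$shuffled_sorted" ]; then
    echo "same lines, different order"
fi
```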

Installation of shuf

While shuf is typically installed by default with the core utilities in many Linux distributions, there might be cases where it isn't available or needs to be manually installed. Here's how to ensure it's set up on your system:

On Debian and Ubuntu:

For Debian-based distributions like Ubuntu, you can use apt:

sudo apt update
sudo apt install coreutils

On Fedora:

Fedora and other distributions using dnf can install shuf from the core utilities package:

sudo dnf install coreutils

On openSUSE:

For openSUSE, the zypper package manager is the way to go:

sudo zypper install coreutils
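
Whichever distribution you use, a quick check along these lines can confirm whether shuf is already on your PATH before you install anything. This is a small sketch rather than a required step, and the wording of the messages is arbitrary.

```shell
# command -v succeeds and prints the path if shuf is on the PATH.
if command -v shuf >/dev/null 2>&1; then
    echo "shuf found at: $(command -v shuf)"
    # GNU shuf reports its coreutils version on the first line.
    shuf --version | head -n 1
else
    echo "shuf not found; install the coreutils package" >&2
fi
```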

How to Use shuf

Using shuf is straightforward. Here are some practical examples to get you started:

  • Shuffle the lines of a text file:

    shuf filename.txt
    
  • Shuffle and output only 5 lines: shuf prints 5 lines chosen at random from the file, which can be useful for sampling or testing.

    shuf -n 5 filename.txt
    
  • Shuffle by generating numbers: You might want to shuffle a range of numbers for lottery simulations or for generating test inputs.

    shuf -i 1-100 -n 5
    

This command picks 5 distinct numbers at random from the range 1 to 100 (inclusive) and prints them one per line.
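
A few more variations on the examples above may be helpful. The ranges, counts, and colour names below are illustrative assumptions; the options themselves (-i, -n, -r, and -e) are standard in GNU shuf.

```shell
# Pick 3 distinct numbers from 1 to 10 (no repeats by default).
shuf -i 1-10 -n 3

# Simulate 5 rolls of a six-sided die; -r allows the same value to repeat.
shuf -r -i 1-6 -n 5

# Treat each argument after -e as an input line and pick one at random.
shuf -n 1 -e red green blue
```

Without -r, asking for more lines than the input contains simply returns every line once, in random order.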

Advanced Usage

shuf isn't just for basic shuffling. It can be integrated into scripts and combined with other utilities for more complex tasks:

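  • Combine shuf with sort to randomise ties: Sorting before shuffling has no effect, because shuf produces a uniformly random order regardless of how its input is arranged, so it does not give weighted randomness. The reverse order is useful, though: shuffle first, then apply a stable sort on a column, so that lines with equal keys end up in a random order relative to each other.

    shuf filename.txt | sort -s -n -k3,3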
    
  • Provide a random sample to another process: If other processes or scripts require randomly selected data, you can pipe the output of shuf directly.

    shuf filename.txt | some-other-command
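
As one possible illustration of shuf inside a script, the sketch below splits a file into two random parts, for example a training set and a test set. The file names, the 10-line input, and the 8/2 split are all assumptions made for the example, not a fixed recipe.

```shell
# Build a 10-line example input in a temporary directory.
workdir=$(mktemp -d)
seq 1 10 > "$workdir/data.txt"

# Shuffle once and save the result, so both parts come from the same permutation.
shuf "$workdir/data.txt" > "$workdir/shuffled.txt"

# The first 8 lines become one part, the remaining lines the other.
head -n 8 "$workdir/shuffled.txt" > "$workdir/train.txt"
tail -n +9 "$workdir/shuffled.txt" > "$workdir/test.txt"

# Report the line count of each part.
wc -l "$workdir/train.txt" "$workdir/test.txt"
```

Saving the shuffled output before splitting matters: running shuf twice would produce two different permutations, so the two parts could overlap. If the split needs to be repeatable, GNU shuf also accepts --random-source=FILE, which takes its random bytes from the given file instead of the system default.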
    

Conclusion

shuf is a versatile tool that is often overlooked among more commonly used text processing utilities like awk, sed, and grep. Whether you're handling large datasets or need a random order for your script's input, shuf provides a straightforward and efficient solution. So next time you reach for a Python script or another heavier tool to randomise lines, consider shuf for its simplicity and speed. Happy shuffling!