Skip to main content

Building wc (word count) in Go

·1787 words·9 mins
Author
Rohan Chavan

When I was not able to find a good alternative to the Linux wc (word count) command for Windows, I decided to build my own version. Written in Go, this tool is not only a faithful port of the original wc but also platform-independent, meaning it works seamlessly on any operating system where Go is supported or you can simply cross compile stand alone binaries for different platforms.

In this blog, I’ll walk you through how I built this tool, explaining the design choices, the code structure, and how you can use it.

Project structure
#

│   go.mod
│   readme.md
├───bin
│       wc_linux_x64
│       wc_win_x64.exe
├───cmd
│       main.go
└───internal
    ├───models
    │       models.go
    ├───utils
    │       utils.go
    └───wc
            counter.go

The project is divided into four main components:

  • Main Entry Point (cmd/main.go): Handles the command-line interface and parses user input.
  • Models (internal/models/models.go): Defines the data structures used across the application.
  • Utilities (internal/utils/utils.go): Contains helper functions for flag management and output formatting.
  • Word Counter (internal/wc/counter.go): The core logic for processing input and counting words, lines, characters, and bytes.

Lets Go
#

Lets first start with writing our main function

The cmd/main.go file is where the program starts. Here, I used the flag package to handle command-line arguments, making it easy for users to specify what they want to count (lines, words, bytes, etc.).

cmd/main.go

--- SNIPPED ---
func main() {
	var c, m, l, w bool
	var bytes, chars, lines, words bool

	flag.BoolVar(&c, "c", false, "print the byte counts")
	flag.BoolVar(&bytes, "bytes", false, "print the byte counts")

	flag.BoolVar(&m, "m", false, "print the character counts")
	flag.BoolVar(&chars, "chars", false, "print the character counts")

	flag.BoolVar(&l, "l", false, "print the line counts")
	flag.BoolVar(&lines, "lines", false, "print the line counts")

	flag.BoolVar(&w, "w", false, "print the word counts")
	flag.BoolVar(&words, "words", false, "print the word counts")

	flag.Parse()

	availableCommands := []string{"l", "lines", "w", "words", "c", "bytes", "m", "chars"}
	usedFlags := utils.FlagUsed(availableCommands...)
	input := models.NewInput(usedFlags, flag.Args())
	wc.Process(input, usedFlags)
}

You can notice that I have used a seperate flag for short command and long command. Packages like Cobra does it in more elegant way but i didnt wanted to use any packages for this tool.
utils.FlagUsed is a helper function which returns what flags have been passed by the user in sys args.
Heres how it looks

func FlagUsed(s ...string) []string {
	usedFlags := []string{}

	for _, fn := range s {
		flag.Visit(func(f *flag.Flag) {
			if f.Name == fn {
				usedFlags = append(usedFlags, fn)
				return
			}
		})
	}

	return usedFlags
}

Now that we have the flags passed by the user, lets define a struct which helps us represent data in a better way

type Input struct {
	Bytes bool
	Chars bool
	Lines bool
	Words bool
	Files []string
}

The input struct will hold boolean values for different kind of types for which we need the count, and the Files in which we need to count. However, if no flag is passed by user wc by default counts the bytes, chars, and new lines, so when we initialize the Input we need to check if the user had passed any flags or not, and accordigly initialize the Input struct. Lets create a method to the Input struct to handle it.

func NewInput(usedFlags, files []string) Input {

	if len(usedFlags) == 0 {
		// 	Return Input with all as true
		return Input{true, true, true, true, files}
	}

	input := Input{}
	for _, fn := range usedFlags {
		if fn == "c" || fn == "bytes" {
			input.Bytes = true
		}

		if fn == "m" || fn == "chars" {
			input.Chars = true
		}

		if fn == "l" || fn == "lines" {
			input.Lines = true
		}

		if fn == "w" || fn == "words" {
			input.Words = true
		}
	}
	input.Files = files
	return input
}

We have what we need to count to process, our main function has a single entry points to process the input. It is defined in the

internal/wc/counter.go

func Process(input models.Input, usedFlags []string) {
	// Check if the input is coming from files or stdIn
	if len(input.Files) > 0 {
		ProcessFiles(input, usedFlags)
	} else {
		ProcessStdin(input, usedFlags)
	}

}

wc can also accept input from the STDIN if no files are passed in the input, that is why we have two different functions in the Process. Lets have a deeper dive in ProcessFiles func

func ProcessFiles(input models.Input, usedFlags []string) error {

	fileResults := []models.FileOp{}
	totalResult := models.Output{}

	for _, file := range input.Files {
		fileResult := models.FileOp{File: file}
		// wc does not give line count, but newLine count thus we need to initialize Lines to -1
		op := models.Output{Lines: -1}
		f, err := os.OpenFile(file, os.O_RDONLY, os.ModePerm)
		if err != nil {
			fmt.Println("Failed to process file ", file)
			fmt.Println("Error  ", err.Error())
			continue
		}
		defer f.Close()

		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			if err := ProcessLine(input, scanner.Bytes(), &op); err == nil {
				op.AddCount("lines", 1)
			}
		}

		if err := scanner.Err(); err != nil {
			return err
		}

		fileResult.Output = op
		fileResults = append(fileResults, fileResult)
		totalResult.AddCount("bytes", op.Bytes)
		totalResult.AddCount("chars", op.Chars)
		totalResult.AddCount("lines", op.Lines)
		totalResult.AddCount("words", op.Words)
	}

	for _, op := range fileResults {
		utils.PrintResult(usedFlags, op.Output, op.File)
	}
	if len(fileResults) > 1 {
		utils.PrintResult(usedFlags, totalResult, "total")
	}

	return nil
}

The ProcessFiles function is designed to handle word count operations for multiple files. It begins by initializing an empty slice to store results for each file and an Output structure to accumulate the total counts. The function then iterates over each file provided in the input, opening the file in read-only mode. A bufio.Scanner is used to read the file line by line, and for each line, the ProcessLine function updates the counts for lines, words, characters, and bytes. Since the tool counts newline characters instead of lines, the line count is initialized to -1. If any error occurs while reading a file, the function continues with the next file. After processing each file, the individual results are stored, and the totals are updated. Finally, the results for each file are printed, and if multiple files were processed, the combined totals are also displayed. The function returns nil if no errors occur during the scanning process.

func ProcessStdin(input models.Input, usedFlags []string) error {
	scanner := bufio.NewScanner(os.Stdin)
	output := models.Output{Lines: -1}

	for scanner.Scan() {
		if err := ProcessLine(input, scanner.Bytes(), &output); err == nil {
			output.AddCount("lines", 1)
		}
	}

	if err := scanner.Err(); err != nil {
		return err
	}
	utils.PrintResult(usedFlags, output, "")
	return nil
}

The ProcessStdin function is designed to handle word count operations when input is provided via standard input (stdin), making it useful in situations where the tool is used in a pipeline or interactively. The function begins by creating a bufio.Scanner to read the input line by line from stdin. It also initializes an Output structure with the line count set to -1 to account for the newline-based line counting. As each line is scanned, the ProcessLine function is called to update the counts for lines, words, characters, and bytes. The line count is incremented after each line is successfully processed. Once the scanning is complete, the function checks for any errors that might have occurred during the scanning process. If no errors are found, it prints the results using the PrintResult function, displaying the counts based on the flags used. The function returns any scanning errors it encounters, or nil if the operation completes successfully.

Below is a quick look for Output, this also includes a AddCount method, but we will look into it later on. The FileOp struct embeds the Output struct and also includes a File string field which will hold the file name.

type Output struct {
	Bytes int
	Chars int
	Lines int
	Words int
}

type FileOp struct {
	File   string
	Output Output
}

If you have observed carefully ProcessLine is the function which adds the counts line by line in both the file mode as well as the stdin. lets have a look into it.

func ProcessLine(ip models.Input, line []byte, op *models.Output) error {
	if ip.Bytes {
		op.AddCount("bytes", len(line))
	}

	if ip.Chars {
		op.AddCount("chars", utf8.RuneCount(line))
	}

	if ip.Words {
		// Calculate words
		wSlc := string(line)
		words := 0
		wordStarted := false
		for _, s := range wSlc {
			if string(s) != " " && !wordStarted {
				wordStarted = true
				words++
			} else if string(s) == " " && wordStarted {
				wordStarted = false
			}
		}

		op.AddCount("words", words)
	}

	return nil
}

AddCount implementation is fairly simple, it increments the count of a particular field by the passed amount n

func (o *Output) AddCount(field string, n int) {

	switch field {
	case "bytes":
		o.Bytes += n
		return

	case "chars":
		o.Chars += n
		return

	case "lines":
		o.Lines += n
		return

	case "words":
		o.Words += n
		return

	}

}

Great! till now we have managed to get the counts, but we need to make sure that the way we print/display our data is similiar to wc.

utils/utils.go

func PrintFilesResult(flags []string, o []models.FileOp) {
	for _, fn := range o {
		PrintResult(flags, fn.Output, fn.File)
	}
}

PrintFilesResult will iterate the files and call the PrintResult function

func PrintResult(flags []string, o models.Output, fn string) {
	if len(flags) == 0 {
		fmt.Printf("%d %d %d %s\n", o.Lines, o.Words, o.Bytes, fn)
		return
	}

	op := ""
	for _, f := range flags {
		if f == "l" || f == "lines" {
			op = op + fmt.Sprintf("%d ", o.Lines)
		}

		if f == "w" || f == "words" {
			op = op + fmt.Sprintf("%d ", o.Words)
		}

		if f == "c" || f == "bytes" {
			op = op + fmt.Sprintf("%d ", o.Bytes)
		}

		if f == "m" || f == "chars" {
			op = op + fmt.Sprintf("%d ", o.Chars)
		}

	}

	op = op + fn
	fmt.Println(op)

}

If the user havent passed any flags, the default output needs to be printed. we then check for the flags which have been passed and create the final output string.

Running the code
#

Lets build the binary

go build -o ./bin/wc_win_x64 cmd/main.go

Below is the content of file.txt

hello world
lets go bro

we can also run it like

PS C:\Users\rohan\Desktop\personal\wc> go run .\cmd\main.go file.txt
1 5 22 file.txt
PS C:\Users\rohan\Desktop\personal\wc>

As expected, when we dont pass any flags the default output is printing new lines, number of chars and bytes

Lets pass some flags and test it

PS C:\Users\rohan\Desktop\personal\wc> go run .\cmd\main.go -l -w file.txt
1 5 file.txt
PS C:\Users\rohan\Desktop\personal\wc>

Room for improvement
#

This is a basic implementation, and theres still room for improvement. we could implement go routines etc to make it even faster and would give significant performance benefits when processing large files.

Give me the code
#

If you have followed along till here, you definitely want to have a look at the code. Below is the github repo for this project.

https://github.com/rohanchavan1918/wc
#

Thanks and regards!
Rohan