Laboratory 5: Enumerable, LINQ and Extension Methods #
Starter code #
Student
- output.txt
- student.sln
- tasks/
- tests/
Extension methods for existing types #
Identifier naming rules and conventions
One of the fundamentals of writing readable and consistent code is sticking to conventions. Different programming languages have their own preferences for naming identifiers — variables, methods, properties, or classes.
Common naming styles
PascalCase:
- The first letter of each word is capitalized, without separators (e.g.,
UserProfileId,HttpRequestHeaders).- Commonly used in: C#, Java (for class, method, and property names).
camelCase:
- The first letter of the first word is lowercase and subsequent words start with an uppercase letter (e.g.,
startDateTime,xmlParserSettings).- Popular in: JavaScript (for variables and functions), Java (for variables and methods).
snake_case:
- All letters are lowercase and words are separated by underscores (e.g.,
user_profile_id,json_response_data).- Often used in: Python, Rust, databases, or as JSON field names.
kebab-case:
- Words are separated by hyphens (e.g.,
html-element-id).- Commonly encountered in: URLs, HTML attributes, filenames (e.g., in JavaScript/Node.js projects).
In projects that span multiple technology layers (e.g., C# backend, JavaScript frontend, communication via JSON), you often need to convert names between styles. For example, a C# property
EmailAddressmay be serialized to JSON asemail_address.What you’ll learn
- How to create and use extension methods for the
stringtype inC#.- How to implement conversions between
PascalCaseandsnake_casenaming styles.
Task description #
Your task is to implement two extension methods for the string class:
PascalToSnakeCase, which for a valid identifier inPascalCasereturns itssnake_caseequivalent.SnakeToPascalCase, which for a valid identifier insnake_casereturns itsPascalCaseequivalent.
Example usage:
var pascal = "HtmlElementId";
var snake = pascal.PascalToSnakeCase();
Console.WriteLine(snake); // "html_element_id"Helpful resources:
Example solution #
An example solution can be found in StringExtensions.cs.
Unit tests are available in StringExtensionsTests.cs.
Iterators, yield, and generating prime numbers
#
Iterators in practice
Often we need to produce sequences whose length is unknown in advance or whose creation would be too expensive to build all at once. Instead of constructing the whole collection in memory and returning it, we can use iterators — a mechanism that yields elements on demand.
In C#, iterators are created using two keywords:
yield return— returns a single element and suspends the method’s execution until the next element is requested.
yield break— immediately stops the iterator and ends the sequence.The advantage is that you don’t need to implement
IEnumeratoryourself; the compiler generates the state machine for you. The iterator preserves its state between calls, which results in more concise and readable code.Generating prime numbers
The Sieve of Eratosthenes is a classic algorithm to compute all prime numbers less than a given number
n, i.e., in the range[2, n]. The algorithm works by eliminating composite numbers.What you’ll learn
- Creating iterators in C# using
yield returnandyield break.- Preserving method state between iterator invocations.
- Memory and performance optimizations when processing large datasets.
Task description #
Implement the following method:
public static class PrimeFinder
{
public static IEnumerable<int> SieveOfEratosthenes(int upperBound);
}The method should use yield return and yield break to generate prime numbers on demand.
Example usage:
foreach (var prime in SieveOfEratosthenes(1000))
{
if (prime > 850) break;
Console.WriteLine(prime);
}Helpful resources:
Example solution #
An example solution can be found in PrimeFinder.cs.
Unit tests are available in PrimeFinderTests.cs.
Implementing the IEnumerable<T> interface
#
Array-based tree representation
Binary trees are a fundamental data structure for storing hierarchical data. While they are often implemented using pointers (or references) to child nodes, there is an alternative and efficient approach: an array representation.
In this model the root is stored at index
0of an array. For any parent node at indexi:
- Its left child is at index
2 * i + 1.- Its right child is at index
2 * i + 2.This approach eliminates the need to store references in each node, saving memory. The challenge is dynamically managing the array size when the tree grows and you need to add an element at an index beyond the current bounds.
Example
graph TB A0["10 (index 0)"] A1["5 (index 1)"] A2["20 (index 2)"] A3["3 (index 3)"] A4["7 (index 4)"] A0 -->|left| A1 A0 -->|right| A2 A1 -->|left| A3 A1 -->|right| A4 style A0 fill:#c6e2ff,stroke:#4682b4 style A1 fill:#c6e2ff,stroke:#4682b4 style A2 fill:#c6e2ff,stroke:#4682b4 style A3 fill:#c6e2ff,stroke:#4682b4 style A4 fill:#c6e2ff,stroke:#4682b4Traversal (
IEnumerable)To make the tree class useful, it should provide a way to iterate over its elements. The .NET standard is to implement
IEnumerable<T>. For a binary tree, anIn-Ordertraversal is common — for a BST it yields elements in sorted order. Usingyield returnallows implementing this traversal recursively in an elegant way.What you’ll learn
- Implementing a binary tree structure using a flat array.
- Mapping parent-child relationships to array indices.
- Dynamically resizing the array (
Array.Resize) when needed.- Implementing
IEnumerable<T>for a custom collection.- Creating a recursive
In-Orderiterator usingyield return.
Task description #
Implement the ArrayBinaryTree<T> class that implements the IArrayBinaryTree<T> interface. This class should represent a binary tree of integers using an internal array.
Implement the following interface:
public interface IArrayBinaryTree<T> : IEnumerable<T>
{
int Count { get; }
int RootIndex { get; }
void SetRoot(T value);
(int leftIndex, int rightIndex) GetChildrenIndices(int parentIndex);
void SetLeftChild(int parentIndex, T value);
void SetRightChild(int parentIndex, T value);
T this[int index] { get; }
bool Exists(int index);
void Clear();
}The implementing class should maintain correct Count logic, handle exceptions (e.g., adding a child to a non-existent parent), and dynamically resize the array when a child index exceeds the current array bounds.
Example usage:
var tree = new ArrayBinaryTree<int>();
tree.SetRoot(10);
tree.SetLeftChild(0, 5);
tree.SetRightChild(0, 20);
tree.SetLeftChild(1, 3);
tree.SetRightChild(1, 7);
// Expected output (In-Order traversal): 3, 5, 7, 10, 20
foreach (var value in tree)
{
Console.WriteLine(value);
}Helpful resources:
Example solution #
An example solution can be found in ArrayBinaryTree.cs.
Unit tests are available in ArrayBinaryTreeTests.cs.
IEnumerable, generics and LINQ
#
LINQ in practice
In C#, LINQ (Language Integrated Query) allows you to express queries and operations on sequences (
IEnumerable<T>) in a concise, declarative manner. With extension methods you can add your own operators toIEnumerable<T>, extending LINQ with commonly repeated processing patterns such as:
Fold(seed, func, resultSelector)— generalizes aggregation (reduction), keeping an accumulator across elements and returning a final result.Batch(size)— splits a sequence into successive batches of a givensize, useful for batch uploads or processing.SlidingWindow(size)— produces overlapping sliding windows of a specifiedsize, useful for trend detection.Thanks to LINQ’s lazy evaluation and extension methods, processing can be both readable and efficient — elements are produced and filtered only when needed.
What you’ll learn
- Creating generic LINQ-style extension methods.
- Working with an explicit enumerator object (
MoveNext,Current).- Solving practical data-processing problems using the implemented operators.
Task description #
The task is split into three parts. Each part starts with implementing an operation that resembles existing LINQ methods, and then solving practical problems using that operation.
Fold
Implement a generic extension method Fold for any sequence (IEnumerable<T>) which:
- Accepts an initial accumulator value
seed(possibly of a different type than the sequence elements). - Invokes a provided accumulator function for each element, updating the accumulator state.
- After iterating the whole sequence, invokes a result selector function to transform the final accumulator state into the method result.
The method should return the result produced by the final function. Implement all steps manually (initializing the accumulator, iterating elements, computing the final result) using an explicit enumerator object.
Challenges
Computing statistics for an integer collection
Implement an extension method for integer sequences with the following requirements:
- The method is named
ComputeStatistics. - If the sequence is
nullor contains no elements, it should throwArgumentExceptionwith the messageSource sequence must contain at least one element. - In a single pass (using the previously implemented
Fold) compute:- minimum value,
- maximum value,
- arithmetic mean,
- standard deviation.
- Return a tuple
(min, max, average, standardDeviation).
Example usage:
var source = new[] { 2, 5, 3, 9, 4 };
var (min, max, average, std) = source.ComputeStatistics();
Console.WriteLine($"Min = {min}"); // 2
Console.WriteLine($"Max = {max}"); // 9
Console.WriteLine($"Average = {average:F2}"); // 4.60
Console.WriteLine($"StdDev = {std:F2}"); // 2.42Finding the longest run of identical elements
Implement an extension method for integer sequences with the following requirements:
- The method is named
LongestSequence. - If the sequence is
nullor empty, throwArgumentExceptionwith the messageSource sequence must contain at least one element. - In a single pass (using
Fold) find the longest contiguous subsequence of identical values and return:start— starting index of that subsequence,end— ending index (inclusive),value— the repeated value.
- Indices are zero-based and refer to the original sequence.
Example usage:
var source = new[] { 1, 1, 2, 2, 2, 3, 3, 2 };
// 0 1 2 3 4 5 6 7
var (start, end, value) = source.LongestSequence();
Console.WriteLine($"Start = {start}"); // 2
Console.WriteLine($"End = {end}"); // 4
Console.WriteLine($"Value = {value}"); // 2Batch
Implement a generic extension method Batch for any sequence (IEnumerable<T>) which:
- Splits the input sequence into successive batches with a maximum size
size, returning them lazily asIEnumerable<IEnumerable<T>>. - The last batch may be smaller if the item count is not a multiple of
size. - The
sizeshould be>= 1(otherwiseArgumentOutOfRangeException(nameof(size), "Batch size must be at least 1.")is being thrown).
Use an explicit enumerator in the implementation.
SlidingWindow
Implement a generic extension method SlidingWindow for any sequence (IEnumerable<T>) which:
- Returns successive overlapping windows of fixed size
size. - If
sizeis less than 1, throwArgumentExceptionwith messageWindow size must be at least 1. - Windows slide forward by one element, so for
[a,b,c,d]andsize=3it yields[a,b,c]then[b,c,d].
Example usage:
var source = Enumerable.Range(1, 5); // {1,2,3,4,5}
foreach (var window in source.SlidingWindow(3))
{
Console.WriteLine($"[{string.Join(", ", window)}]");
}
/* Expected output
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
*/Challenges
Windows with rising sum
Implement FindSlidingWindowsWithRisingSum which returns (as IEnumerable<IEnumerable<int>>) all windows of length 5 whose sum is greater than the sum of the immediately preceding window.
Example:
Given the sequence:
var sequence = new [] { 5, 3, 1, 2, 4, 2, 10, -1, 2, 4, 7, -3 }The table below analyzes consecutive windows and their sums:
| Window | Elements | Sum | Returned? |
|---|---|---|---|
| 1 | [5, 3, 1, 2, 4] | 15 | ❌ |
| 2 | [3, 1, 2, 4, 2] | 12 | ❌ |
| 3 | [1, 2, 4, 2, 10] | 19 | ✅ |
| 4 | [2, 4, 2, 10, -1] | 17 | ❌ |
| 5 | [4, 2, 10, -1, 2] | 17 | ❌ |
| 6 | [2, 10, -1, 2, 4] | 17 | ❌ |
| 7 | [10, -1, 2, 4, 7] | 22 | ✅ |
| 8 | [-1, 2, 4, 7, -3] | 9 | ❌ |
the returned collection is:
[
[ 1, 2, 4, 2, 10 ],
[ 10, -1, 2, 4, 7 ]
]Windows with duplicates
Implement FindSlidingWindowsWithDuplicates which returns (as IEnumerable<IEnumerable<int>>) all windows of length 4 that contain at least one duplicated number.
Example:
Given the sequence:
var sequence = new[] { 1, 2, 3, 4, 2, 5, 6, 2, 7, 8 }The table below analyzes consecutive windows and duplicate presence:
| Window | Elements | Duplicates? | Returned? |
|---|---|---|---|
| 1 | [1, 2, 3, 4] | none | ❌ |
| 2 | [2, 3, 4, 2] | 2 | ✅ |
| 3 | [3, 4, 2, 5] | none | ❌ |
| 4 | [4, 2, 5, 6] | none | ❌ |
| 5 | [2, 5, 6, 2] | 2 | ✅ |
| 6 | [5, 6, 2, 7] | none | ❌ |
| 7 | [6, 2, 7, 8] | none | ❌ |
the returned collection is:
[
[ 2, 3, 4, 2 ],
[ 2, 5, 6, 2 ]
]Most common trigrams in text
Implement FindMostCommonTrigrams which finds all most-frequently-occurring 3-letter sequences (trigrams) in the input text.
Assumptions:
- A trigram is any three consecutive letters in the text (non-letter characters are ignored).
- Letter case is ignored (
ABCandabcare the same trigram). - Return an
IEnumerable<string>with all trigrams that occur most frequently (there may be more than one if they have the same count). - If the text contains fewer than 3 letters, return an empty sequence.
Example:
For the text Anna and Antek are analyzing an annual analysis. consider the character sequence: annaandantekareanalyzinganannualanalysis.
The table below lists trigrams that appear more than once:
| Trigram | Count | Returned? |
|---|---|---|
"ana" | 3 | ✅ |
"aly" | 2 | ❌ |
"ann" | 2 | ❌ |
"nal" | 2 | ❌ |
the returned collection is:
[
"ana",
]Example solution #
An example solution can be found in EnumerableExtensions.cs.
Unit tests are available in EnumerableExtensionsTests.cs.
LINQ and movie data analysis #
What is a relational database?
A relational database stores information in tables that are connected by relationships. A typical table has a primary key — a unique identifier for each record (row).
Tables can also include foreign keys, which reference identifiers from other tables.
Joins (
JOIN)To get a fuller picture — for example, “Who acted in which movie?” or “What are the average ratings for fantasy films?” — we often combine data from multiple tables using
JOINoperations.Two joins that are especially relevant here are:
INNER JOIN, which combines two sets but only when matching rows exist in both tables.LEFT JOIN, which returns all rows from the left table even if there is no matching row in the right table.What you’ll learn
- Using LINQ queries to join, filter, and group data.
- Aggregating and sorting collections and extracting statistics (e.g., average ratings, counts).
- Working with related collections (
movies,actors,casts,ratings) usingJoin,GroupJoin, andSelectMany.- Creating nested result structures (for example, movies with their casts).
- Optimizing queries by filtering and projecting only the needed data.
Task description #
The sample in-memory movie database is located in SampleMovieDatabase.cs. It stores collections of records for the different entities (List<Movie>, List<Actor>, etc.).
The data models represent the following records:
public record Movie(
int Id, // primary key
string Title,
int Year,
[property: JsonConverter(typeof(JsonStringEnumConverter))]
Genre Genre,
int DurationMinutes
);
public record Actor(
int Id, // primary key
string Name
);
public record Rating(
int Id, // primary key
int MovieId, // foreign key
int Score,
DateTime CreatedAt
);
public record Cast( // association table
int MovieId, // foreign key
int ActorId, // foreign key
string Role
);Implement LINQ queries in DatabaseQueries.cs to analyze the movie dataset.
To display results we will use a simple helper that serializes the query object to JSON:
public static void DisplayQueryResults<T>(T query)
{
var options = new JsonSerializerOptions
{
WriteIndented = true
};
options.Converters.Add(new JsonStringEnumConverter());
var json = JsonSerializer.Serialize(query, options);
Console.WriteLine(json);
}Queries
Query 1: Actors from Fantasy movies
Find unique actors who acted in fantasy movies.
Query 2: Longest movie in each genre
For each genre, find the movie with the greatest duration.
Query 3: Movies with average rating above 8 and their cast
List movies whose average rating exceeds 8, together with the list of actors who appeared in those movies.
Query 4: Number of distinct roles per actor
For each actor, compute how many distinct roles they played and sort descending by that count.
Query 5: Movies released in the last 5 years with average rating
Show the most recent movies with their average ratings, sorted descending by the average.
Query 6: Average rating per genre
Compute and display the average movie rating for each genre.
Query 7: Actors who never acted in a Thriller
Find actors who never appeared in any movie with the `Thriller` genre.
Query 8: Top 3 movies by number of ratings
Show the three movies that received the most ratings.
Query 9: Movies without any ratings
List movies that have no ratings assigned.
Query 10: Most versatile actors
Find actors who performed in the largest number of different movie genres.
Implementation notes
- For queries 2, 4, and 6 the result may differ depending on the type of join used. As an exercise, try both
INNER JOINandLEFT JOINand compare results.- Both approaches are acceptable for this exercise as long as you understand why results differ 😉.
- For query 2 there might be genres with no movies in the dataset.
- In query 3, some actors may not have played in any movie present in the database.
- For query 6 some genres may have no ratings.
- For simplicity, the tasks do not mandate exact columns to return from each record.
Example solution #
An example solution is available in DatabaseQueries.cs. This code is executed from Program.cs.
Solution for download #
Solution
- solution.sln
- tasks/
- tests/