Skip to content

Indexing

Indexing is the use of the df[..., ...] syntax to extract values from the data frame. Two arguments are always required, the row index followed by the column index. The returned type depends on which arguments are scalars or slices. This operator ignores any group levels that may be present.

df[row_index: int, column_index: str]

Indexing with a scalar row index and a scalar column index returns a scalar value. The returned type is a Python type, bool, int, float, or str.

from tabeline import DataFrame

df = DataFrame(
    book=["The Hobbit", "The Fellowship of the Ring", "The Two Towers", "The Return of the King"],
    year=[1937, 1954, 1954, 1955],
    word_count=[95356, 187790, 156198, 137115],
)

assert df[2, "book"] == "The Two Towers"

df[row_index: int, column_index: slice | Sequence[str]]

If only the row index is a scalar, indexing returns a tabeline.Record. Functionally, a Record is an ordered dict.

A Record can be iterated over or indexed further. Each key is a string, and each value is a Python type, bool, int, float, str, or None. The underlying data types are lost when a Record is created.

A slice(None) (i.e. :, all columns) is the only slice allowed. Sliced str ranges are not supported. To select a subset of columns, list them as a sequence of strings.

from tabeline import DataFrame

df = DataFrame(
    book=["The Hobbit", "The Fellowship of the Ring", "The Two Towers", "The Return of the King"],
    year=[1937, 1954, 1954, 1955],
    word_count=[95356, 187790, 156198, 137115],
)

assert df[2, :] == Record(book="The Two Towers", year=1954, word_count=156198)
assert df[2, :]["book"] == "The Two Towers"

df[row_index: slice | Sequence[int], column_index: str]

If only the column index is a scalar, indexing returns a tabeline.Array. Functionally, an Array is a list.

Normal slices are permitted on the row index and behave as expected. Selecting specific rows with a sequennce of integers is also allowed.

from tabeline import DataFrame

df = DataFrame(
    book=["The Hobbit", "The Fellowship of the Ring", "The Two Towers", "The Return of the King"],
    year=[1937, 1954, 1954, 1955],
    word_count=[95356, 187790, 156198, 137115],
)

assert df[:, "word_count"] == Array(95356, 187790, 156198, 137115)
assert df[:, "word_count"][2] == 156198

Array

Each Array has a given data type that restricts the possible values that the elements may have. The data type can be accessed via the Array.data_type attribute, which returns a member of the tabeline.DataType enum. All the elements in a given Array are instances of that type or are null. Tabeline currently has no scalar values so getting a lone instance of the data type in Python is not possible. Extracting a single element from an Array or DataFrame returns an object with one of these basic Python types: bool, int, float, str, or None.

The possible data types are listed below. The in column in the table below indicate which Python type is converted to that data type. The out column indicates with Python type is returned when extracting a value from that data type.

DataType in out
Boolean bool bool
Integer8 int
Integer16 int
Integer32 int
Integer64 int int
Whole8 int
Whole16 int
Whole32 int
Whole64 int
Float32 float
Float64 float float
String str str
Nothing

df[row_index: slice | Sequence[int], column_index: slice | Sequence[str]]

Slicing both the row index and the column index returns another tabeline.DataFrame.

The resulting DataFrame has no group levels regardless of the parent.

from tabeline import DataFrame

df = DataFrame(
    book=["The Hobbit", "The Fellowship of the Ring", "The Two Towers", "The Return of the King"],
    year=[1937, 1954, 1954, 1955],
    word_count=[95356, 187790, 156198, 137115],
)

assert df[1:3, ["book", "year"]] == DataFrame(
    book=["The Fellowship of the Ring", "The Two Towers"],
    year=[1954, 1954],
)