Make Your Tabular Data Stand Out via CLI With These Suggestions and Tricks
Creating tabular data
The issue and its solution
Conclusions

Image by Dorothe on Pixabay

A couple of days ago I wanted to help my father solve a problem. He needed to aggregate, filter, and display some data as fast as possible. Well… the reality is that he printed the data (something like 10 pages every time!) and searched through it by hand! I saw his difficulties and decided to help him immediately.

Nothing too difficult for someone who analyzes data like I do: the data was already in Excel format, so a Jupyter Notebook and Pandas were the right choice.

The problem is that I don’t work with my father. Also, we live in different cities and see each other only every couple of weeks. So, I needed to give him a tool with the following characteristics:

  • Easy to use.
  • It shouldn’t make the PC explode if he types something wrong.

This is why I decided to create a small program that could be managed via CLI. What I wanted to create was simple: the user types something on the command line, and the program shows them all the related data as Pandas would, but in the terminal.

So, in this article, I’ll show you how we can display tabular data via CLI (Command Line Interface. That is: the terminal, in case you didn’t know). We’ll create a simple project to get you immediately hands-on with Python and discuss the libraries I used.

So, first of all, let’s create some tabular data just for the sake of the exercise:

import pandas as pd

# Create data
data = {"fruit": ["banana", "apple", "pear", "orange"],
        "color": ["yellow", "red", "green", "orange"],
        "weight(kg)": [0.3, 0.1, 0.1, 0.2]}

# Transform data to data frame
df = pd.DataFrame(data)

So, these are our data:

The tabular data we created. Image by Author.

We’ve created some tabular data containing information on some fruits, namely: the name of the fruit, the color, and the weight in kilograms.

Now, to make it “more real”, we can save it to an Excel file like so:

# Save data frame to xlsx file
df.to_excel("fruit.xlsx")
NOTE:
This method for saving files that Pandas gives us can be very useful.
For example, we can use it to convert CSV files into XLSX; we did it
in this article here.
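
Just for reference, that CSV-to-XLSX conversion boils down to two lines. Here is a minimal sketch, where "fruit.csv" is only a placeholder file name used for illustration:

import pandas as pd

# Read the CSV file and write it back out as an Excel file
df = pd.read_csv("fruit.csv")
df.to_excel("fruit.xlsx", index=False)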

Now, we’ll create a simple filter that wouldn’t even need Python if we could use Excel a little bit. The problem I actually faced was more complicated, but here we’re keeping things simple on purpose: our goal is not to show that this method is better than another. Here we’re showing how we can display tabular data via CLI, and a simple example will do the job.

So, let’s say this is our problem: we want the user to write the name of a fruit and our program to return all the features of the chosen fruit. We also want the filter to be somewhat “intelligent” so that if the user writes “pea” it will display the features related to “pear”.

To do this, in Pandas we can use the method str.contains(). Let’s try it in our Jupyter Notebook:

import pandas as pd

# Import data
df = pd.read_excel("fruit.xlsx")

# Filter for pear
data_frame = df[df["fruit"].str.contains("pea")]

# Show filtered data
data_frame.head()

And we get:

The filtered data. Image by Author.

Read it carefully: we wrote “pea” as a typo on purpose to make sure Pandas returns the data anyway. And it does, as expected.
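
By the way, str.contains() is case-sensitive by default, so typing “Pear” with a capital P would not match. If we want the filter to be a bit more forgiving, Pandas lets us pass case=False. Here is a small variation of the snippet above:

import pandas as pd

# Import data
df = pd.read_excel("fruit.xlsx")

# Case-insensitive filter: "PEA", "Pea", and "pea" all match "pear"
data_frame = df[df["fruit"].str.contains("pea", case=False)]
print(data_frame)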

So, now we have to face another problem: the intervention of the user via CLI. As far as I know, we can use two different methods in these cases: the input built-in function, or the argparse library.

In case you missed it, I’ve written an article on how we can use argparse in Data Science. Check it out here:

Now, in this case, I decided to use the input built-in function because I believe it’s easier to use, and in simple cases like these it is a very good choice. In fact, it is the right choice if we just need to pass a string as an argument via CLI (you can read the documentation here).
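
For comparison, and just as a rough sketch (this is not the script we’ll build below), the argparse version would look something like this:

import argparse
import pandas as pd

# Build a parser with a single positional argument for the fruit name
parser = argparse.ArgumentParser(description="Filter the fruit data")
parser.add_argument("fruit", help="name (or part of the name) of the fruit")
args = parser.parse_args()

# Import data and filter for the argument passed on the command line
df = pd.read_excel("fruit.xlsx")
print(df[df["fruit"].str.contains(args.fruit)])

With this version we would run something like python fruit.py pea instead of being prompted interactively.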

We are able to use the input function like so:

# User input
fruit = input("filter the data for the kind of fruit: ")

Now, let’s see how this works and how it returns the data. This is the code we can use:

import pandas as pd

# User input
fruit = input("filter the data for the kind of fruit: ")

# Import data
df = pd.read_excel("fruit.xlsx")

# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]

# Print results
print(data_frame)

NOTE:
Look at the difference in how we have passed the arguments to the method
str.contains(). Above we passed "pea" with quotes because we were
searching directly for a string.
In this case, instead, we have passed fruit without quotes because
fruit is the variable storing the result of the input() function, so it
needs to be passed as is (with no quotes).
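
One small, optional tweak (not part of the script above) that fits the “don’t make the PC explode” requirement: if the user types something that matches no fruit at all, the filter simply returns an empty data frame. We can print a friendlier message by replacing the final print(data_frame) with:

# Optional: tell the user when nothing matches their input
if data_frame.empty:
    print("No fruit found for:", fruit)
else:
    print(data_frame)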

Now, let’s save it as fruit.py, move it to the folder where fruit.xlsx is located, and run it via the terminal:

Our code via CLI. GIF by Author.

Well, as we can see, everything works fine. Just one thing: can we improve the visualization? What if we’d like to display the data better, as if we were in Pandas?

Well, the solution I found was to use the tabulate library (here’s the documentation).

So, let’s add tabulate to our code and see what happens:

import pandas as pd
from tabulate import tabulate

# User input
fruit = input("filter the data for the kind of fruit: ")

# Import data
df = pd.read_excel("fruit.xlsx")

# Filter for user input
data_frame = df[df["fruit"].str.contains(fruit)]

# Print results
print(tabulate(data_frame, headers='keys', tablefmt='psql'))

And we get:

Our code via CLI. Image by Author.

As we can see, the data is displayed in a “tabular way”, which is cleaner. Also, as we can see, the code correctly handles typos if we search for “pea” as we did in Jupyter before.
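
Note that tabulate supports several other output formats as well; changing the tablefmt argument is enough. For example, with the data frame from the script above:

# Two other formats tabulate can render
print(tabulate(data_frame, headers="keys", tablefmt="grid"))
print(tabulate(data_frame, headers="keys", tablefmt="github"))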

I hope this helps you when you need to display tabular data via CLI. If you have any other suggestions, please let me know in the comments: I’m always open to improving and learning something new.
