String Data Structure in Python | Complete Case study

Raghav Agrawal 30 Nov, 2021 • 8 min read

This article was published as a part of the Data Science Blogathon.

The string is one of the standard data types in Python. You will find a string data type in almost every application programming language, such as Java, Python, and C++, because applications need to communicate with users, and that communication happens through text. System programming languages such as C are different: C has no dedicated string type and represents text as arrays of characters, since its main job is to build software that talks to hardware rather than to users. Strings are a very important data type in Python, as you will realize when working on real-world projects. Many problem statements in interviews and written tests are also based on strings, because string problems look easy but are often tricky to solve. In this article, we will cover strings from A to Z.

Table of contents

  1. A brief overview of the string data type
  2. Accessing substrings from a string
    • Indexing
    • Slicing
  3. Editing and deleting strings in Python
  4. Operations on strings
  5. String functions in Python

Brief Overview of Python String Data Type

In Python, strings are sequences of Unicode characters. A string is text, the language we use to converse. Machines, however, cannot understand text directly; they only work with binary. So each character is first mapped to a number, and that number is then stored in binary. ASCII was an early standard for this mapping, but it covers only English letters, digits, and a few symbols. As programming spread to other countries and languages, Unicode was introduced, which assigns a unique number (a code point) to characters from virtually every writing system.
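The character-to-number mapping can be inspected directly with the built-in ord() and chr() functions. The following is a small sketch; the variable names and the sample Devanagari letter are chosen only for illustration.

```python
# Illustrative sketch: the names and sample characters are assumptions.
code_a = ord("A")                 # code point of "A" (65, the same value ASCII uses)
char_a = chr(code_a)              # back from the number to the character
bits_a = format(code_a, "08b")    # the same number written in binary

hindi_letter = "अ"                # a Devanagari letter, outside the ASCII range
code_hindi = ord(hindi_letter)
utf8_bytes = hindi_letter.encode("utf-8")  # UTF-8 is one common byte encoding

print(code_a, char_a, bits_a)
print(code_hindi, len(utf8_bytes))
```

A character outside ASCII gets a larger code point and, in UTF-8, occupies more than one byte.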

Creating Strings

There are different ways to create strings in Python: with single quotes, double quotes, or triple quotes. If the string itself contains a quote character, use a different type of quote in the declaration to avoid an error. For a better understanding, try the example below in any Python IDE or code editor.

s = 'Hello'  # single quotes
# s = 'It's raining outside'  # SyntaxError (the apostrophe ends the string early)
s = "It's raining outside"  # correct way
s = '''Good Morning'''  # triple quotes (can span multiple lines)

Triple quotes are used when you are declaring a large, multi-line paragraph. There is also a built-in function, str(), for converting any other data type to a string.

a = 77  #int
print(str(a)) #int to str
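As a short sketch of both ideas, the snippet below builds a multi-line string with triple quotes and uses str() to join a number to text. The sample text and variable names are made up for illustration.

```python
# Illustrative sketch: sample text and names are assumptions.
poem = """Roses are red,
Violets are blue."""
lines = poem.splitlines()        # one list entry per line of the string

price = 99.5
label = "Price: " + str(price)   # str() is needed; "Price: " + 99.5 raises TypeError

print(lines)
print(label)
```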

Accessing Substring from String

Accessing parts of a string is sometimes necessary to solve some particular use case. Under this heading, we will cover Indexing and slicing which help to get a particular part of the string as output.

Indexing

Indexing in a string works the same as in an array: it is zero-based. In simple words, index zero represents the first character of a string, and as the index increases, it moves forward. In Python, indexing is classified as positive and negative indexing.

1) Positive Indexing

Positive indexing is the zero-based indexing described above: indexes run from zero up to the length of the string minus one.

s = "Hello"
print(s[0])  # O/P -> H
# print(s[5])  # IndexError (string index out of range)
print(s[4])  # O/P -> o
print(s[len(s)-1])  # O/P -> o

2) Negative Indexing

Negative indexing means accessing the characters of a string from the end. If we want the last character using positive indexing, it takes longer because we have to find the length of the string and subtract one from it. Instead, we can directly use the index negative one.

s = "Hello"
print(s[-1])  #O/P -> o

Slicing

When we have long strings like sentences or paragraphs and want to access some parts, like complete words, indexing alone is not enough. This is where slicing comes in, which is done with the colon operator. Inside the square brackets, we give a start index and an end index separated by a colon, and the output is the part of the string from the start index up to the end index minus one.

s = "Hello World"
print(s[1 :  4])  #O/P -> ell

In the above example, the slice starts at index one and retrieves the characters up to index three. There are several variations of slicing, so let us look at them practically through the code snippet below.

s = "Hello World"
print(s[2:])  # from index 2 to the end (O/P -> llo World)
print(s[:3])  # from the start up to index 2 (O/P -> Hel)
print(s[:])  # the complete string (O/P -> Hello World)
print(s[0:8:3])  # O/P -> HlW
# in the above expression the third value is the step: start at index zero and take every third character (indexes 0, 3, 6) before index eight
print(s[0:6:-1])  # empty string (with a negative step, start must be greater than stop)
print(s[-5:-1:2])  # from -5 up to, but not including, -1 with a step of 2 (O/P -> Wr)
print(s[::-1])  # reverse the whole string (O/P -> dlroW olleH)
print(s[-1:-5:-1])  # backwards from index -1 to index -4 (O/P -> dlro, the reverse of orld)
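A few everyday uses of slicing can be sketched as follows. The helper name is_palindrome is hypothetical and only serves the illustration; note that, unlike indexing, a slice that runs past the end of the string does not raise an error.

```python
# Illustrative sketch: is_palindrome is a hypothetical helper, not a built-in.
s = "Hello World"

first_word = s[:5]     # characters at indexes 0 to 4
last_word = s[-5:]     # the last five characters
tail = s[6:100]        # the end index is past the string, yet no error is raised
nothing = s[50:]       # a start index past the end gives an empty string


def is_palindrome(text):
    """Return True when text reads the same forwards and backwards."""
    return text == text[::-1]


print(first_word, last_word, tail, repr(nothing))
print(is_palindrome("level"), is_palindrome("hello"))
```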

Editing and Deleting Python Strings

Strings are an immutable data type in Python, and if you try to edit a string in place, Python raises a TypeError. This means you cannot change or add a character in an existing string, but you can build a new string and assign it to a variable (the same one or a different one). Let us try editing and deleting a string practically and see whether it works or raises an error.

s = "Hello World"
# s[0] = "M"  # TypeError: 'str' object does not support item assignment
a = "M" + s[1:]  # works, because it builds a new string
print(a)  # O/P -> Mello World
# Try deleting
# del s[0]  # TypeError
# del s[:3:2]  # TypeError (cannot delete a portion of a string)
del s  # works: it removes the name s; the string object is freed only when nothing else refers to it

So it is very important to understand that strings are immutable and you cannot change anything in an existing string.
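The behaviour can be checked safely by catching the error. This is a minimal sketch; the variable names are chosen only for illustration.

```python
# Illustrative sketch: catch the error instead of letting the script stop.
s = "Hello World"

try:
    s[0] = "M"                  # in-place edit is not allowed
except TypeError as err:
    message = str(err)

new_s = "M" + s[1:]             # build a new string instead

print(message)
print(s)        # the original string is unchanged
print(new_s)
```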

Operations on Python String

Arithmetic Operations

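Python has several arithmetic operators (+, -, *, /, %), but only two of them, addition (+) and multiplication (*), are used to combine strings. (The % operator also accepts a string on its left side, but there it performs old-style formatting rather than arithmetic.)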

1) Addition(+)

It is also known as concatenation: we can join any number of strings by placing the addition operator between them.

s = "Hello" + "-" + "World"
print(s)  #O/P -> Hello-World

2) Multiplication(*)

It is also known as string repetition: the string before the multiplication operator is repeated the given number of times. It is useful when we want to build a pattern.

s="Hello"
print(s*5)  #O/P -> HelloHelloHelloHelloHello
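As one example of such a pattern, the sketch below builds a small triangle of stars and a divider line. The names rows, triangle, and divider are illustrative.

```python
# Illustrative sketch: building simple text patterns with * and +.
rows = []
for n in range(1, 5):
    rows.append("*" * n)        # "*", "**", "***", "****"

triangle = "\n".join(rows)
divider = "-" * 20

print(triangle)
print(divider)
```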

Relational Operations

Relational operators express the relation between two terms and check whether the given condition is True or False, so the output of a relational operator is a boolean.

print("Hello" == "World")  #False
print("Hello" != "World")  #True

Things become more interesting when we use the greater than and less than operators.

print("Mumbai" > "Pune")  #False
print("Goa" < "Indore")  #True

Now you might be wondering how Python compares two strings. It compares them lexicographically, character by character. In the first example, the character P comes after M in the alphabet, so "Mumbai" is smaller than "Pune" and the answer is False. People sometimes also get confused with lowercase and uppercase: lowercase characters come after uppercase characters.
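The ordering follows the numeric code of each character, which can be seen with ord(). The sketch below uses made-up sample words and also shows one common way to compare without regard to case.

```python
# Illustrative sketch: comparisons follow character codes, checked left to right.
print(ord("M"), ord("P"))                       # 77 80, so "M..." sorts before "P..."

upper_before_lower = "Zebra" < "apple"          # True: "Z" (90) comes before "a" (97)
ignore_case = "Zebra".lower() < "apple".lower() # False: "zebra" comes after "apple"
prefix_is_smaller = "Pun" < "Pune"              # True: a prefix sorts before the longer string

print(upper_before_lower, ignore_case, prefix_is_smaller)
```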

Logical Operations

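Whenever you use the logical operators (and, or, not) on strings, Python treats empty strings as False and non-empty strings as True. Note that the first two operators return one of their operands rather than a plain boolean, which is why some of the outputs below are strings.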

print("Hello" and "World") #True and True -> True (O/P -> World)
print("" and "World") #False and True -> False (O/P -> "")
print("" or "World")  #False OR True -> True (O/P -> "World")
print(not "Hello")  #opposite of True -> False
print(not "")  #True
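A common use of this behaviour is supplying a default value when a string is empty. The function greet and the default "Guest" below are hypothetical, chosen only to illustrate the idea.

```python
# Illustrative sketch: greet() and the "Guest" default are assumptions.
def greet(name):
    display = name or "Guest"   # an empty name is falsy, so the default is used
    return "Hello, " + display


print(greet("Rahul"))
print(greet(""))
print(bool(""), bool(" "))      # a string holding one space is not empty
```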

Loops

We can loop over a string from any index using slicing. We can use a while loop as well as a for loop on a string to access its characters or to form a pattern.

s = "Hello world"
for i in s:
    print(i)
#second way
for i in s[2:7]:
    print(i)
#third way
i=0
while i<len(s):
    print(s[i])  # print the character at index i
    i += 1  # move to the next index so the loop ends

Membership Operators

The membership operators in Python are in and not in, which are used to check whether an element is present in any sequence data structure of Python.

s = "Hello world"
if 'o' in s:
    print("True")
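The same operators also work with whole substrings, and the check is case-sensitive. A brief sketch with illustrative variable names:

```python
# Illustrative sketch: membership tests on substrings.
s = "Hello world"

has_word = "world" in s         # substrings work, not only single characters
missing = "z" not in s          # True because "z" does not occur
case_matters = "hello" in s     # False: the string starts with an uppercase "H"

print(has_word, missing, case_matters)
```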

Python String Functions

Now comes the most important part: string functions, which you will use everywhere whenever you build a project. We will study the important string functions that are used most frequently.

Common Functions

These are functions that work on other iterable data types like list, tuple, and set, and are also available for strings.

1) length (len) – It gives you the length of the string.

2) minimum (min) – It gives you the smallest character present in a string, compared by character code (ASCII/Unicode value).

3) maximum (max) – It gives you the largest character present in a string, compared by character code (ASCII/Unicode value).

4) sorted – It sorts the characters of the string in ascending or descending order of character code. The output of this function is always a list of characters.

s = "Hello"
print(len(s)) #5
print(min(s)) #H
print(max(s)) #o
print(sorted(s)) #['H', 'e', 'l', 'l', 'o']
print(sorted(s, reverse=True)) #['o', 'l', 'l', 'e', 'H']
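Because sorted returns a list, it is often combined with join to get a string back. The sketch below also shows the key argument for ordering letters without regard to case; the sample strings are made up.

```python
# Illustrative sketch: turning the sorted list back into a string.
sorted_text = "".join(sorted("Python"))       # uppercase "P" sorts first

default_order = sorted("baC")                 # by character code: uppercase first
ignore_case = sorted("baC", key=str.lower)    # compare letters as lowercase

print(sorted_text)
print(default_order, ignore_case)
```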

Specific Functions

Now the functions we will study are only applicable to strings.

1) Capitalize / title

The capitalize function converts the first letter of a string to uppercase, while the title function converts the first character of each word to uppercase. Both also convert the remaining letters to lowercase.

s = "its raining outside"
print(s.capitalize())  #O/P -> Its raining outside
print(s.title())   #O/P -> Its Raining Outside

2) upper / lower / swap case

The upper function converts each character of a string to uppercase, and the lower function converts each character to lowercase, while swapcase converts lowercase characters to uppercase and vice versa.

s = "Its Raining Outside"
print(s.upper())         #O/P -> ITS RAINING OUTSIDE
print(s.lower())         #O/P -> its raining outside
print(s.swapcase())  #O/P -> iTS rAINING oUTSIDE

3) count

It gives you the number of times a substring occurs in a string. If the substring is not present in the string, it returns zero. It is used to find the frequency of a substring in a string. The match is case-sensitive, so the uppercase I in the example below is not counted.

s = "Its Raining Outside"
print(s.count("i")) #3
print(s.count("ing")) #1
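Two details of count are worth a quick sketch: it is case-sensitive, and it counts non-overlapping occurrences only. The variable names are illustrative.

```python
# Illustrative sketch: case handling and non-overlapping matches in count().
s = "Its Raining Outside"

exact_case = s.count("i")           # lowercase "i" only
any_case = s.lower().count("i")     # lowercase the text first to ignore case
non_overlapping = "aaaa".count("aa")  # matches do not overlap

print(exact_case, any_case, non_overlapping)
```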

4) Find / Index

Both functions work in a very similar way: each finds the index of the first occurrence of a substring in a string. The only difference is what happens when the substring is not present: find returns negative one, while index raises an error (a ValueError).

s = "Its Raining Outside"
print(s.find("ing")) #8
print(s.index("ing")) #8
print(s.find("down")) #-1
# print(s.index("down")) # ValueError (substring not found)
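How the two functions are typically handled in code can be sketched as follows: find is checked against -1, while index is wrapped in try/except. The variable names are illustrative.

```python
# Illustrative sketch: handling a missing substring with find() and index().
s = "Its Raining Outside"

position = s.find("down")
found_with_find = position != -1        # find signals "missing" with -1

try:
    s.index("down")
    found_with_index = True
except ValueError:                      # index signals "missing" with an exception
    found_with_index = False

first_i = s.find("i")                   # first lowercase "i"
next_i = s.find("i", first_i + 1)       # optional second argument: where to start searching

print(found_with_find, found_with_index, first_i, next_i)
```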

5) Ends with / Starts with

These are used to check whether a string starts or ends with a particular substring.

s = "Its Raining Outside"
print(s.startswith("I")) #True
print(s.endswith("f")) #False
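Both functions accept longer substrings and also a tuple of alternatives. A short sketch, with a hypothetical file name:

```python
# Illustrative sketch: "report.csv" is a made-up file name.
filename = "report.csv"
is_data_file = filename.endswith((".csv", ".tsv"))   # a tuple means "any of these"

s = "Its Raining Outside"
exact_case = s.startswith("its")            # False: the check is case-sensitive
ignore_case = s.lower().startswith("its")   # True after lowercasing

print(is_data_file, exact_case, ignore_case)
```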

6) Format

This function is used almost everywhere: whenever a project needs to display a value inside a string, you can use the format function. For example, if you are developing a login page, after login you may display a greeting like "hello username". You do not know in advance who is going to log in, so you leave a placeholder and let the format function fill in the name.

print("Hello my name is {} and i live in {}".format("Rahul","Indore"))
print("your height is {height} feet and weight is {weight} kg".format(weight=57, height=5.8))
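Placeholders can also carry a format specification, and f-strings (available since Python 3.6) offer the same features with the value written inside the braces. The names and values in this sketch are illustrative.

```python
# Illustrative sketch: format specifications and the f-string equivalent.
name = "Rahul"
height = 5.8

with_format = "Hello {}".format(name)
with_fstring = f"Hello {name}"              # same result, value written inline

two_decimals = "{:.2f}".format(3.14159)     # round to two decimal places
padded = f"{42:>5}"                         # right-align in a field of width 5
height_text = f"{height:.1f} feet"

print(with_format, with_fstring)
print(two_decimals, repr(padded), height_text)
```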

7) isalnum / isalpha

These are validators used for checking input. The first function checks that all the characters in a string are alphabetic or numeric. The second function checks that all the characters in a string are alphabetic.

print("hello20".isalnum()) #True
print("hjuGbh".isalpha()) #True
print("hello2".isalpha()) #False

8) isdigit

It checks whether all the characters in the input string are digits.
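A small sketch of isdigit on a few sample inputs (the values are made up): a decimal point, a minus sign, or an empty string all make it return False.

```python
# Illustrative sketch: sample inputs are assumptions.
print("2021".isdigit())   # True
print("20.5".isdigit())   # False: "." is not a digit
print("-5".isdigit())     # False: "-" is not a digit
print("".isdigit())       # False: an empty string has no digits

values = ["42", "4.2", "-7", "", "007"]
only_digits = [v for v in values if v.isdigit()]
print(only_digits)
```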

9) split function

It splits the string into a list of elements. By default, splitting is done on whitespace, but you can also split on any other separator.

sent = "I am playing football"
print(sent.split())      # ['I', 'am', 'playing', 'football']
sent2 = "we.sitting.together"
print(sent2.split("."))   # ['we', 'sitting', 'together']
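Two details are easy to miss: split() with no argument collapses runs of whitespace, while an explicit separator does not, and an optional second argument limits the number of splits. The sample strings in this sketch are made up.

```python
# Illustrative sketch: default whitespace splitting versus an explicit separator.
spaced = "I  am   here"
by_whitespace = spaced.split()          # runs of spaces count as one separator
by_single_space = "a  b".split(" ")     # two spaces produce an empty string in between

pair = "name=Rahul=Indore".split("=", 1)  # split at most once

print(by_whitespace, by_single_space, pair)
```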

10) Join

The join function joins a list of strings into a single string with any separator. For example, it can be used to build a URL path from its parts when handling routing.

sent = ['I', 'am', 'playing', 'football']
print(" ".join(sent))  # O/P -> I am playing football
print("-".join(sent))  # O/P -> I-am-playing-football
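join only accepts strings, so other values have to be converted first. The sketch below also shows a path built from parts; the path segments are hypothetical.

```python
# Illustrative sketch: the path segments are made-up examples.
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)   # convert each number to str first

path = "/".join(["api", "users", "42"])        # hypothetical route parts

round_trip = " ".join("I am playing football".split())

print(csv_line, path, round_trip)
```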

11) Replace

It is used to replace any part of a string with another string. The function is mostly used for text cleaning, for example to replace unwanted substrings with an empty string.

s = "hello20"
print(s.replace("20", "")) #hello
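By default replace changes every occurrence; an optional third argument limits how many are replaced. Since strings are immutable, the original is left untouched. A short sketch with a made-up sample string:

```python
# Illustrative sketch: replacing all occurrences versus only the first.
s = "a-b-c"

all_replaced = s.replace("-", " ")        # every "-" becomes a space
first_only = s.replace("-", " ", 1)       # only the first "-" is replaced

print(all_replaced, "|", first_only, "|", s)
```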

12) Strip

The strip function removes leading and trailing whitespace from a string. It is useful before saving data in a database, because users often type extra spaces by mistake.

s = "   hello   "
print(s.strip()) #hello
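Related variants remove whitespace from one side only, and strip can also be given the characters to remove. Spaces inside the string are never touched. A sketch with made-up sample strings:

```python
# Illustrative sketch: one-sided stripping and stripping specific characters.
s = "   hello   "

left = s.lstrip()                   # removes leading spaces only
right = s.rstrip()                  # removes trailing spaces only
custom = "xxhelloxx".strip("x")     # removes the given characters from both ends
inner = "  he llo  ".strip()        # the space in the middle stays

print(repr(left), repr(right), custom, repr(inner))
```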

Conclusion

This completes our tour of strings, and I hope you enjoyed following this article. Most machine learning practitioners are already familiar with much of what we covered, but I hope you learned something new about strings and now see their use from a fresh perspective. We started with the basic idea of a string and covered how to create, use, and play with strings and string functions in Python.

If you have any doubts or feedback, feel free to share them in the comments section below.

About The Author

I am pursuing a bachelor’s in computer science. I am a data science enthusiast who loves to learn, play and work with data and data science.

Connect with me on LinkedIn

Check out my other articles here and on Blogspot

Thanks for giving your time!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

I am a final-year undergraduate who loves to learn and write about technology. I am a passionate learner and a data science enthusiast. I have been learning and working in the data science field for the past 2 years, and aspire to grow as a Big Data architect.
