Christophe@pallier.org
Sept. 2013
In Python, text can be stored in objects of type ‘str’ (a.k.a. ‘strings’).
String constants are enclosed in single or double quotes; triple quotes let a string span several lines:
```python
'hello'
"hello Paris!"
"""hello
this is a text
on several lines
"""
```
```python
type('123')      # str
type(123)        # int
123 + 456        # 579: + adds numbers
'123' + '456'    # '123456': + concatenates strings
int('123') # converting str into int
str(1 + 1) # converting int into str
```
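When the text is not a valid number, int() raises a ValueError. The sketch below wraps int() in a hypothetical helper, to_int_or_none, which catches that error and returns None.

```python
def to_int_or_none(text):
    """Convert text to int, or return None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(int('123') + int('456'))  # 579: the numbers are added
print(str(123) + str(456))      # 123456: the strings are concatenated
print(to_int_or_none('42'))     # 42
print(to_int_or_none('12a'))    # None
```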
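```python
mystring = 'superman'
len(mystring)    # number of characters: 8
mystring[0]      # first character: 's' (indexing starts at 0)
mystring[1]      # 'u'
mystring[1:5]    # characters 1 to 4: 'uper' (the end index is excluded)
for letter in mystring:
    print(letter)
```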
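A few more indexing forms follow, applied to the same string. The sketch also shows that strings are immutable: assigning to an index fails, so you build a new string instead.

```python
mystring = 'superman'
print(mystring[-1])      # n: negative indices count from the end
print(mystring[:5])      # super: the start index defaults to 0
print(mystring[5:])      # man: the end index defaults to len(mystring)
print(mystring[::-1])    # namrepus: a step of -1 walks backwards

# Strings are immutable: assigning to an index raises TypeError...
try:
    mystring[0] = 'S'
except TypeError as err:
    print('error:', err)
# ...so build a new string instead
capitalized = 'S' + mystring[1:]
print(capitalized)       # Superman
```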
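A set of functions to manipulate strings is available in the module ‘string’ (see https://docs.python.org/2/library/string.html). In Python 3, most of these functions exist only as methods of ‘str’ objects (e.g. mystring.upper()). Among others, you should know about: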
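As an illustration, here are a handful of frequently used operations written as ‘str’ methods, the form they take in Python 3. This selection is only an example and may not match the functions the text has in mind.

```python
sentence = '  Alice was beginning to get very tired  '

stripped = sentence.strip()          # remove leading and trailing whitespace
shout = stripped.upper()             # all letters in upper case
words = stripped.split()             # split on whitespace -> list of words
position = stripped.find('was')      # index of the first match, or -1
changed = stripped.replace('tired', 'sleepy')
joined = '-'.join(['a', 'b', 'c'])   # glue a list of strings with a separator

print(stripped)
print(shout)
print(words)
print(position)
print(changed)
print(joined)
```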
```python
name = input('What is your name? ')   # Python 3; in Python 2 this was raw_input()
print("Hello " + name + '!')
```
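Putting the text-building part in a function lets you test it without typing at the keyboard. In this sketch, greeting and greeting_fstring are hypothetical helper names. The second helper shows the f-string syntax, available since Python 3.6.

```python
def greeting(name):
    """Build the greeting; strip() drops spaces typed around the name."""
    return 'Hello ' + name.strip() + '!'

def greeting_fstring(name):
    # Same result with an f-string (Python 3.6+), often easier to read than +
    return f'Hello {name.strip()}!'

# In a real script, the name would come from the keyboard:
#     print(greeting(input('What is your name? ')))
print(greeting('  Alice '))
print(greeting_fstring('Alice'))
```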
Create a text file ’essa
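Writing:

```python
filename = 'test.txt'
handle = open(filename, 'w')   # 'w' creates the file, or empties it if it exists
# write() does not add line breaks, so '\n' is added explicitly
handle.write('welcome\n')
handle.write('to the wonderful\n')
handle.write('world of Python!\n')
handle.close()                 # flush the data to disk and release the file
```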
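The next sketch does the same round trip with a with block, which closes the file even if an error occurs, and then reads the content back. The file goes in a temporary directory so the example does not overwrite a test.txt in the current folder. That location is an assumption of the sketch.

```python
import os
import tempfile

# Assumption: write into a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w') as handle:       # closed automatically at the end of the block
    handle.write('welcome\n')
    handle.write('to the wonderful\n')
    handle.write('world of Python!\n')

with open(path) as handle:
    content = handle.read()           # the whole file as one string
print(repr(content))

with open(path) as handle:
    lines = [line.rstrip('\n') for line in handle]   # iterate line by line
print(lines)
```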
Download the text of Alice in Wonderland and save it as ‘alice.txt’.
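```python
import string

def remove_punctuation(text):
    # Replace each punctuation character, and the newline chr(10), by a space
    punct = string.punctuation + chr(10)
    return text.translate(str.maketrans(punct, " " * len(punct)))

# Assumes alice.txt is UTF-8 encoded and sits in the current directory
with open('alice.txt', encoding='utf-8') as f:
    textori = f.read().lower()
text = remove_punctuation(textori)
words = text.split()   # split on any whitespace
print(words)
```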
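You can try the cleaning step on a short inline sample instead of the whole book. The sketch below compares two approaches. The first replaces punctuation by spaces, as above. The second deletes punctuation using the third argument of str.maketrans. The results differ for words containing an apostrophe.

```python
import string

def remove_punctuation(text):
    # Map every punctuation character and the newline (chr(10)) to a space
    punct = string.punctuation + chr(10)
    return text.translate(str.maketrans(punct, ' ' * len(punct)))

def delete_punctuation(text):
    # Alternative: the third argument of str.maketrans lists characters to delete
    return text.translate(str.maketrans('', '', string.punctuation))

sample = "'Curiouser and curiouser!' cried Alice.\nShe didn't care."
replaced = remove_punctuation(sample.lower()).split()
deleted = delete_punctuation(sample.lower()).split()
print(replaced)   # ... 'didn', 't', 'care'
print(deleted)    # ... 'didnt', 'care'
```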
Now write a script that counts the number of occurrences of ‘Alice’, ‘Rabbit’ and ‘office’ in the list of words. Because the text was converted to lower case, search for ‘alice’, ‘rabbit’ and ‘office’.
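Here is one possible solution, sketched with hypothetical helper names. The text was lower-cased before splitting, so the targets are searched in lower case. A short inline word list stands in for the book so that the example runs on its own. Also, list.count counts a single word directly.

```python
from collections import Counter

def count_with_loop(words, target):
    """Count occurrences of target with an explicit loop."""
    n = 0
    for w in words:
        if w == target:
            n += 1
    return n

def count_targets(words, targets):
    """Count several targets in one pass using collections.Counter."""
    counts = Counter(words)
    return {t: counts[t] for t in targets}   # Counter gives 0 for missing words

# Small stand-in for the real word list (lower case, like the text above)
words = 'alice saw the white rabbit and alice followed the rabbit'.split()
print(count_with_loop(words, 'alice'))     # 2
print(words.count('rabbit'))               # 2
print(count_targets(words, ['alice', 'rabbit', 'office']))
```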
You can skim through http://matplotlib.org/users/pyplot_tutorial.html.
Remark: the product rank × frequency is roughly constant. This ‘law’ was discovered by Estoup and popularized by Zipf. See http://en.wikipedia.org/wiki/Zipf%27s_law.
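Here is a sketch that computes the quantity in this remark for any word list: sort the word frequencies in decreasing order, number them from rank 1, and multiply rank by frequency. The toy input is only there to make the code runnable, since Zipf's law is only expected to show up on a large real text. Words with equal frequency simply receive consecutive ranks, which is a simplifying choice. The plotting lines are left as comments and use matplotlib's pyplot.loglog.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency, rank * frequency) tuples, most frequent word first."""
    freqs = sorted(Counter(words).values(), reverse=True)
    return [(rank, f, rank * f) for rank, f in enumerate(freqs, start=1)]

# Toy input; on a large real text, Zipf's law predicts that the third
# column stays roughly constant
words = 'the cat and the dog and the bird'.split()
table = rank_frequency(words)
for rank, freq, product in table:
    print(rank, freq, product)

# To plot frequency against rank on log-log axes with matplotlib:
#     import matplotlib.pyplot as plt
#     plt.loglog([r for r, f, p in table], [f for r, f, p in table], 'o')
#     plt.xlabel('rank'); plt.ylabel('frequency'); plt.show()
```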
xI -> xIU
Mx -> Mxx
xIIIy -> xUy
xUUy -> xy
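(Tip: use the ‘replace’ method of strings; for example, s.replace('III', 'U', 1) rewrites only the first III. In Python 2, this was also available as the function string.replace.)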
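The four rewriting rules above are those of Douglas Hofstadter's MIU system, where x and y stand for any (possibly empty) string. The sketch below uses a hypothetical function, successors, to list every string obtainable in one step. Rules 3 and 4 may apply at several positions. A call such as s.replace('III', 'U', 1) yields only one of those results, so the sketch scans the string with slices instead. The derive helper and its starting string MI (the usual axiom of the MU puzzle) are also assumptions for illustration.

```python
def successors(s):
    """Return the set of strings reachable from s by one application of a rule."""
    result = set()
    # Rule 1: xI -> xIU (append U to a string ending in I)
    if s.endswith('I'):
        result.add(s + 'U')
    # Rule 2: Mx -> Mxx (double everything after the leading M)
    if s.startswith('M'):
        result.add(s + s[1:])
    # Rule 3: xIIIy -> xUy (any III, at any position, becomes U)
    for i in range(len(s) - 2):
        if s[i:i + 3] == 'III':
            result.add(s[:i] + 'U' + s[i + 3:])
    # Rule 4: xUUy -> xy (any UU, at any position, is dropped)
    for i in range(len(s) - 1):
        if s[i:i + 2] == 'UU':
            result.add(s[:i] + s[i + 2:])
    return result

def derive(start='MI', steps=3):
    """Collect every string derivable from start in at most `steps` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen

print(sorted(successors('MI')))      # ['MII', 'MIU']
print(sorted(derive('MI', 2)))
```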