
Python Problem with split() and rstrip()

Hello,

I have the following assignment:

Open the file **mbox-short.txt** and read it line by line. When you find a line that starts with 'From ' like the following line:
From [email protected] Sat Jan 5 09:14:16 2008
You will parse the From line using split() and print out the second word in the line (i.e. the entire address of the person who sent the message). Then print out a count at the end.
**Hint:** make sure not to include the lines that start with 'From:'. Also look at the last line of the sample output to see how to print the count.
You can download the sample data at [http://www.py4e.com/code3/mbox-short.txt]
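Here is how the `split()` and `startswith()` calls the assignment relies on behave. The address is a made-up placeholder, not one taken from mbox-short.txt:

```python
# Hypothetical sample lines in the mbox format described above
envelope = "From alice@example.com Sat Jan  5 09:14:16 2008\n"
header = "From: alice@example.com\n"

# split() with no argument splits on any run of whitespace and drops the trailing newline
words = envelope.split()
print(words)     # ['From', 'alice@example.com', 'Sat', 'Jan', '5', '09:14:16', '2008']
print(words[1])  # alice@example.com

# Checking for 'From ' (with the trailing space) skips the 'From:' header lines
print(envelope.startswith('From '))  # True
print(header.startswith('From '))    # False
```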

The output should be:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
There were 27 lines in the file with From as the first word


I tried it with the code below:

```python
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"

fh = open(fname)
count = 0
for line in fh:
    if line.startswith('From:'):
        pass
    elif line.startswith('From'):
        x = line.split('From') and line.rstrip(' SatFriThuJn0123456789:')
        print(x)
        count = count + 1

print("There were", count, "lines in the file with From as the first word")
```
The output I'm getting is the following:


From [email protected] Sat Jan 5 09:14:16 2008 ← Mismatch

From [email protected] Fri Jan 4 18:10:48 2008

From [email protected] Fri Jan 4 16:10:39 2008

From [email protected] Fri Jan 4 15:46:24 2008

From [email protected] Fri Jan 4 15:03:18 2008

From [email protected] Fri Jan 4 14:50:18 2008

From [email protected] Fri Jan 4 11:37:30 2008

From [email protected] Fri Jan 4 11:35:08 2008

From [email protected] Fri Jan 4 11:12:37 2008

From [email protected] Fri Jan 4 11:11:52 2008

From [email protected] Fri Jan 4 11:11:03 2008

From [email protected] Fri Jan 4 11:10:22 2008

From [email protected] Fri Jan 4 10:38:42 2008

From [email protected] Fri Jan 4 10:17:43 2008

From [email protected] Fri Jan 4 10:04:14 2008

From [email protected] Fri Jan 4 09:05:31 2008

From [email protected] Fri Jan 4 07:02:32 2008

From [email protected] Fri Jan 4 06:08:27 2008

From [email protected] Fri Jan 4 04:49:08 2008

From [email protected] Fri Jan 4 04:33:44 2008

From [email protected] Fri Jan 4 04:07:34 2008

From [email protected] Thu Jan 3 19:51:21 2008

From [email protected] Thu Jan 3 17:18:23 2008

From [email protected] Thu Jan 3 17:07:00 2008

From [email protected] Thu Jan 3 16:34:40 2008

From [email protected] Thu Jan 3 16:29:07 2008

From [email protected] Thu Jan 3 16:23:48 2008

There were 27 lines in the file with From as the first word


As you can see, the last line (the count) is correct and the email addresses are in the right order, but I can't manage to remove "From" at the beginning of the lines or the dates at the end. Another problem is that there is an empty line between each output line, and there shouldn't be.


Could someone please help me with this task?

Thanks a lot.
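For anyone debugging the same thing: the `x = line.split('From') and line.rstrip(...)` line is what leaves each line unchanged. An expression `a and b` evaluates to `b` whenever `a` is truthy, and the non-empty list returned by `split('From')` is always truthy, so `x` is simply the result of `rstrip(...)`. `rstrip(chars)` removes characters from the right only while they belong to `chars`; every line read from the file ends with `'\n'`, which is not in that set, so nothing is removed. That kept newline, plus the one `print()` adds, is also where the empty lines come from. A short demonstration using a made-up address:

```python
# Hypothetical line in the same format as the 'From ' lines in mbox-short.txt
line = "From alice@example.com Fri Jan  4 18:10:48 2008\n"

# 'and' returns its second operand when the first is truthy (a non-empty list is truthy)
x = line.split('From') and line.rstrip(' SatFriThuJn0123456789:')
print(x == line)  # True: rstrip removed nothing

# The last character is '\n', which is not in the strip set, so stripping stops at once
print(repr(x[-5:]))  # '2008\n'

# split() with no argument ignores the newline; the address is the second word
print(line.split()[1])  # alice@example.com
```

Using `line.split()` with no argument and taking index 1 avoids both problems, since the trailing newline never makes it into the printed word.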
 
Hi there,
If I may, what is the purpose of this code? Just asking out of curiosity.
 
Give this a try :D

```python
import requests

# Get the page
html = requests.get("https://www.py4e.com/code3/mbox-short.txt").text

# Set counter to 0
counter = 0

# Turn all lines into a list
for line in html.split('\n'):

    # Check that the line starts with "From " (the space skips the "From:" header lines)
    if line.startswith("From "):
        # Now turn the line into a new list and print the second word
        print(line.split(" ")[1])
        print("---------------")

        # Counting
        counter += 1

print("Total Found", counter)
```
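The assignment asks to read the downloaded file line by line rather than fetch it over HTTP. A minimal sketch of that approach; the function name `print_senders` is hypothetical, and the sample lines are made up so the block runs without the file:

```python
def print_senders(lines):
    """Print the sender address of each 'From ' line and return how many were found."""
    count = 0
    for line in lines:
        # 'From ' (with a space) skips the 'From:' header lines, as the hint asks
        if not line.startswith('From '):
            continue
        words = line.split()  # split on whitespace; this also drops the trailing newline
        print(words[1])       # the second word is the sender's address
        count += 1
    return count


# Hypothetical sample lines; in the assignment these come from open("mbox-short.txt")
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: test\n",
    "From bob@example.org Fri Jan  4 18:10:48 2008\n",
    "From: bob@example.org\n",
]
count = print_senders(sample)
print("There were", count, "lines in the file with From as the first word")
```

With the real file, pass the open file handle instead, e.g. `with open("mbox-short.txt") as fh:` followed by `count = print_senders(fh)`. A file object yields its lines one at a time, just like the list above.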
 
Another way, using re.findall:

```python
import requests
import re

counter = 0

# Get the page
html = requests.get("https://www.py4e.com/code3/mbox-short.txt").text

# Use regex to get all emails
emails = re.findall(r'From:(.*)', html)

for email in emails:
    counter += 1
    print(f'{counter}. {email}')


print(f'There were {counter} lines in the file with From as the first word')
```



Forgot the counter. You could use len(emails) to get a count.
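Following up on `len(emails)`: a sketch that counts with `len()` instead of a manual counter and anchors the pattern to the `From ` envelope lines, so the `From:` headers are skipped as the hint asks. `re.MULTILINE` makes `^` match at the start of every line. The text below is a made-up stand-in for the downloaded file:

```python
import re

# Hypothetical stand-in for the downloaded mbox text
text = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "From: alice@example.com\n"
    "Subject: hello\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
    "From: bob@example.org\n"
)

# '^From ' matches only envelope lines; (\S+) captures the address that follows
emails = re.findall(r'^From (\S+)', text, re.MULTILINE)

for email in emails:
    print(email)
print(f'There were {len(emails)} lines in the file with From as the first word')
```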


Oops, I didn't see that split() was required. Sorry.
 
```python
import requests

# Get the page
text = requests.get("https://www.py4e.com/code3/mbox-short.txt").text

# sender is a dictionary where
#   key = address of sender
#   val = number of messages from this sender

sender = {}

for line in text.split('\n'):

    if line.startswith('From'):

        addr = line.split()[1]

        if addr not in sender:
            sender[addr] = 1
        else:
            sender[addr] += 1

total = 0

for addr, count in sender.items():
    print(f'{count:2d} emails sent by {addr}')
    total += count

print(f'\nTotal of {total} emails received')
```

Output:

```
 4 emails sent by [email protected]
 6 emails sent by [email protected]
 8 emails sent by [email protected]
 4 emails sent by [email protected]
10 emails sent by [email protected]
 6 emails sent by [email protected]
 2 emails sent by [email protected]
 2 emails sent by [email protected]
 2 emails sent by [email protected]
 8 emails sent by [email protected]
 2 emails sent by [email protected]

Total of 54 emails received
```
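The counts above are all even and total 54, twice the 27 expected, most likely because `startswith('From')` also matches the `From:` header of every message. A sketch of the same tally using `collections.Counter` and checking for `'From '` (with the space), so each message is counted once. The sample text is made up:

```python
from collections import Counter

# Hypothetical stand-in for the downloaded text
text = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "From: alice@example.com\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
    "From: bob@example.org\n"
    "From alice@example.com Fri Jan  4 16:10:39 2008\n"
    "From: alice@example.com\n"
)

# Count only 'From ' envelope lines so the 'From:' headers are not counted again
sender = Counter(
    line.split()[1] for line in text.splitlines() if line.startswith('From ')
)

for addr, count in sender.items():
    print(f'{count:2d} emails sent by {addr}')
print(f'\nTotal of {sum(sender.values())} emails received')
```

`Counter` does the "add 1 or start at 1" bookkeeping that the `if`/`else` does by hand, and `sender.most_common()` would list the busiest senders first.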
 