credit: Jordan Boyd-Graber
The goal of this assignment is for you to 1) implement stochastic gradient ascent for logistic regression, 2) apply it to the task of determining whether documents are talking about hockey or baseball, 3) discover discriminative covariates that are indicative of categories.
The starter code logreg.py
is available here and the tests (tests.py
) is available here. The data needed can be found in this tar.gz.
You should not use any libraries that implement any of the functionality of logistic regression for this assignment; logistic regression is implemented in scikit learn and many other places, but you should do everything by hand now. You’ll be able to use library implementations of logistic regression in the future.
In your README, make sure to answer the following questions:
Recommendation: I’d highly recommend using numpy
to help answer questions 3 and 4.
Caution: When implementing extra credit, make sure your implementation of the regular algorithms doesn’t change.
In your README, answer the following questions:
Feel free to add additional/optional feedback, e.g. what did you like or dislike about the assignment?
Submit the following files to the assignment called HW04
on Gradescope:
logreg.py
README
(this can be a pdf if you include figures, which are better than text).Make sure to name the python file exactly what we specify here. Otherwise, our autograders might not work and we might have to take points off.
read_dataset
function.Before running your code on the data, make sure your solution passes the unit tests. Below is the result of running the unit test prior to implementing stochastic gradient ascent.
[0. 0. 0. 0. 0.]
[1. 4. 3. 1. 0.]
F
======================================================================
FAIL: test_unreg (__main__.TestKnn)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/apoliak/BMC/teaching/CTA/assignments/logistic_regression/release/tests.py", line 19, in test_unreg
self.assertAlmostEqual(beta[0], .5)
AssertionError: 0.0 != 0.5 within 7 places (0.5 difference)
----------------------------------------------------------------------
Ran 1 test in 0.002s
FAILED (failures=1)
Once the tests run successfully, you should see the following printed out:
[0. 0. 0. 0. 0.]
[1. 4. 3. 1. 0.]
[0.5 2. 1.5 0.5 0. ]
[1. 4. 3. 1. 0.]
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
This is an example of what your runs should look like:
python logreg.py
Read in 1064 train and 133 test
Update 1 TP -728.031877 HP -89.799782 TA 0.497180 HA 0.533835
Update 6 TP -900.538122 HP -115.634421 TA 0.517857 HA 0.488722
Update 11 TP -704.708605 HP -96.961868 TA 0.579887 HA 0.518797
Update 16 TP -679.129712 HP -92.263830 TA 0.588346 HA 0.541353
Update 21 TP -722.469178 HP -89.796534 TA 0.608083 HA 0.639098
Update 26 TP -656.324879 HP -75.516536 TA 0.660714 HA 0.714286
Update 31 TP -658.569795 HP -76.169386 TA 0.654135 HA 0.691729
Update 36 TP -633.161013 HP -72.344206 TA 0.684211 HA 0.706767
Update 41 TP -701.607619 HP -80.218928 TA 0.679511 HA 0.669173
Update 46 TP -588.567683 HP -66.271052 TA 0.722744 HA 0.766917
Update 51 TP -582.332356 HP -67.719479 TA 0.719925 HA 0.714286
Update 56 TP -506.524431 HP -62.377003 TA 0.759398 HA 0.781955
Update 61 TP -513.759593 HP -63.221052 TA 0.761278 HA 0.729323
Update 66 TP -644.013282 HP -76.702718 TA 0.718985 HA 0.706767
Update 71 TP -608.637740 HP -84.982997 TA 0.737782 HA 0.706767
Update 76 TP -567.831518 HP -81.089895 TA 0.759398 HA 0.736842
Update 81 TP -529.522052 HP -77.503906 TA 0.774436 HA 0.729323
Update 86 TP -522.190913 HP -75.391605 TA 0.779135 HA 0.744361
Update 91 TP -545.689829 HP -79.972120 TA 0.763158 HA 0.736842
Update 96 TP -474.988277 HP -70.802779 TA 0.798872 HA 0.759398
Update 101 TP -477.839626 HP -70.785230 TA 0.796992 HA 0.759398
Update 106 TP -440.214535 HP -65.508160 TA 0.809211 HA 0.804511
Update 111 TP -440.076552 HP -64.801832 TA 0.798872 HA 0.729323
Update 116 TP -713.863979 HP -89.852378 TA 0.694549 HA 0.676692
Update 121 TP -535.636455 HP -74.382158 TA 0.769737 HA 0.744361
Update 126 TP -424.582210 HP -64.657032 TA 0.818609 HA 0.774436
Update 131 TP -405.196644 HP -60.364998 TA 0.825188 HA 0.759398
Update 136 TP -408.284788 HP -59.994036 TA 0.812970 HA 0.766917
Update 141 TP -385.104092 HP -55.891346 TA 0.835526 HA 0.819549
Update 146 TP -399.842426 HP -56.306737 TA 0.817669 HA 0.804511
Update 151 TP -382.629258 HP -53.963372 TA 0.836466 HA 0.804511
Update 156 TP -381.663687 HP -52.937715 TA 0.838346 HA 0.804511
Update 161 TP -394.553812 HP -53.425808 TA 0.827068 HA 0.812030
Update 166 TP -413.874440 HP -54.035435 TA 0.818609 HA 0.796992
Update 171 TP -351.797283 HP -46.774927 TA 0.854323 HA 0.864662
Update 176 TP -364.208360 HP -48.756012 TA 0.854323 HA 0.819549
Update 181 TP -344.609405 HP -47.936714 TA 0.858083 HA 0.842105
Update 186 TP -339.372977 HP -48.645582 TA 0.853383 HA 0.834586
Update 191 TP -332.315167 HP -45.289989 TA 0.874060 HA 0.864662
Update 196 TP -335.748269 HP -44.054527 TA 0.867481 HA 0.857143
Update 201 TP -330.232184 HP -43.688826 TA 0.868421 HA 0.857143
Update 206 TP -329.351848 HP -43.176223 TA 0.875000 HA 0.857143
Update 211 TP -338.508390 HP -45.469553 TA 0.869361 HA 0.849624
Update 216 TP -336.950406 HP -44.084169 TA 0.867481 HA 0.849624
Update 221 TP -330.332779 HP -44.034070 TA 0.875000 HA 0.864662
Update 226 TP -342.730943 HP -47.352928 TA 0.866541 HA 0.864662
Update 231 TP -352.033720 HP -49.484280 TA 0.859023 HA 0.842105
Update 236 TP -362.172670 HP -53.178104 TA 0.859023 HA 0.842105
Update 241 TP -342.646803 HP -50.095402 TA 0.866541 HA 0.864662
Update 246 TP -349.859715 HP -51.710507 TA 0.859962 HA 0.849624
Update 251 TP -330.852584 HP -47.375257 TA 0.871241 HA 0.849624
Update 256 TP -337.695348 HP -46.598531 TA 0.868421 HA 0.849624
Update 261 TP -327.696433 HP -47.230397 TA 0.881579 HA 0.842105
Update 266 TP -321.175652 HP -46.978466 TA 0.875940 HA 0.842105
Update 271 TP -316.744286 HP -46.907174 TA 0.875000 HA 0.849624
Update 276 TP -318.625257 HP -47.229407 TA 0.875940 HA 0.849624
Update 281 TP -317.328799 HP -47.629807 TA 0.877820 HA 0.849624
Update 286 TP -313.285825 HP -44.597210 TA 0.881579 HA 0.849624
Update 291 TP -316.658230 HP -44.702502 TA 0.875000 HA 0.849624
Update 296 TP -309.389970 HP -44.404765 TA 0.888158 HA 0.849624
Update 301 TP -312.964923 HP -45.210895 TA 0.886278 HA 0.842105
Update 306 TP -307.419423 HP -43.342098 TA 0.889098 HA 0.864662
Update 311 TP -293.251319 HP -44.257542 TA 0.894737 HA 0.827068
Update 316 TP -283.531862 HP -42.377066 TA 0.896617 HA 0.857143
Update 321 TP -295.098046 HP -43.586514 TA 0.888158 HA 0.857143
Update 326 TP -286.864337 HP -42.486508 TA 0.891917 HA 0.864662
Update 331 TP -288.522475 HP -42.915298 TA 0.893797 HA 0.857143
Update 336 TP -283.714986 HP -41.928494 TA 0.893797 HA 0.857143
Update 341 TP -285.996595 HP -40.765043 TA 0.895677 HA 0.857143
Update 346 TP -271.787967 HP -39.220396 TA 0.896617 HA 0.857143
Update 351 TP -273.281431 HP -40.019914 TA 0.895677 HA 0.849624
Update 356 TP -270.152709 HP -39.683134 TA 0.895677 HA 0.872180
Update 361 TP -270.811528 HP -39.983171 TA 0.897556 HA 0.872180
Update 366 TP -263.316336 HP -39.107378 TA 0.895677 HA 0.864662
Update 371 TP -260.932851 HP -37.912378 TA 0.903195 HA 0.864662
Update 376 TP -264.846119 HP -38.946195 TA 0.898496 HA 0.864662
Update 381 TP -255.685867 HP -37.146460 TA 0.906955 HA 0.857143
Update 386 TP -266.240217 HP -38.572019 TA 0.901316 HA 0.864662
Update 391 TP -252.600101 HP -37.053851 TA 0.906955 HA 0.864662
Update 396 TP -277.531764 HP -40.205220 TA 0.897556 HA 0.849624
Update 401 TP -299.160168 HP -42.171900 TA 0.887218 HA 0.842105
Update 406 TP -255.168044 HP -32.911552 TA 0.907895 HA 0.902256
Update 411 TP -255.919891 HP -32.992732 TA 0.907895 HA 0.909774
Update 416 TP -255.636806 HP -32.776485 TA 0.911654 HA 0.917293
Update 421 TP -249.305313 HP -32.098639 TA 0.906015 HA 0.872180
Update 426 TP -249.095167 HP -31.809436 TA 0.907895 HA 0.872180
Update 431 TP -247.576211 HP -31.154342 TA 0.913534 HA 0.894737
Update 436 TP -244.828296 HP -31.023336 TA 0.913534 HA 0.902256
Update 441 TP -244.647621 HP -31.268950 TA 0.912594 HA 0.902256
Update 446 TP -246.675548 HP -33.054302 TA 0.904135 HA 0.887218
Update 451 TP -237.280527 HP -31.704781 TA 0.917293 HA 0.909774
Update 456 TP -233.663726 HP -31.525557 TA 0.918233 HA 0.887218
Update 461 TP -233.673719 HP -31.507317 TA 0.918233 HA 0.887218
Update 466 TP -237.187700 HP -30.951204 TA 0.923872 HA 0.909774
Update 471 TP -237.968934 HP -31.225037 TA 0.922932 HA 0.909774
Update 476 TP -241.493792 HP -34.315493 TA 0.909774 HA 0.864662
Update 481 TP -234.887174 HP -32.732465 TA 0.914474 HA 0.872180
Update 486 TP -240.785778 HP -30.537650 TA 0.922932 HA 0.924812
Update 491 TP -236.276425 HP -32.770371 TA 0.917293 HA 0.894737
Update 496 TP -235.417560 HP -32.366469 TA 0.918233 HA 0.902256
Update 501 TP -240.166284 HP -33.564283 TA 0.911654 HA 0.909774
Update 506 TP -258.218223 HP -38.384819 TA 0.896617 HA 0.872180
Update 511 TP -229.093576 HP -31.213094 TA 0.916353 HA 0.902256
Update 516 TP -225.074420 HP -31.455123 TA 0.921053 HA 0.909774
Update 521 TP -224.186134 HP -31.001963 TA 0.925752 HA 0.917293
Update 526 TP -224.216961 HP -30.654317 TA 0.922932 HA 0.917293
Update 531 TP -224.760694 HP -30.463913 TA 0.921053 HA 0.917293
Update 536 TP -244.502547 HP -35.925176 TA 0.905075 HA 0.894737
Update 541 TP -216.257792 HP -29.450287 TA 0.925752 HA 0.909774
Update 546 TP -219.859926 HP -29.173745 TA 0.925752 HA 0.917293
Update 551 TP -238.202289 HP -31.547143 TA 0.918233 HA 0.924812
Update 556 TP -223.192968 HP -28.827227 TA 0.932331 HA 0.924812
Update 561 TP -229.294454 HP -30.189520 TA 0.922932 HA 0.909774
Update 566 TP -223.212876 HP -29.143003 TA 0.925752 HA 0.932331
Update 571 TP -220.508124 HP -28.311058 TA 0.930451 HA 0.917293
Update 576 TP -221.648705 HP -28.206051 TA 0.931391 HA 0.924812
Update 581 TP -218.971886 HP -27.944724 TA 0.930451 HA 0.924812
Update 586 TP -213.005446 HP -27.288646 TA 0.935150 HA 0.924812
Update 591 TP -209.871761 HP -27.600190 TA 0.936090 HA 0.924812
Update 596 TP -207.993175 HP -27.908688 TA 0.932331 HA 0.924812
Update 601 TP -219.682294 HP -29.891741 TA 0.928571 HA 0.924812
Update 606 TP -224.083987 HP -30.756422 TA 0.926692 HA 0.909774
Update 611 TP -198.293977 HP -28.263903 TA 0.939850 HA 0.932331
Update 616 TP -210.225359 HP -30.256640 TA 0.934211 HA 0.917293
Update 621 TP -189.285680 HP -26.920397 TA 0.938910 HA 0.932331
Update 626 TP -185.885916 HP -26.693071 TA 0.934211 HA 0.932331
Update 631 TP -184.754707 HP -26.536890 TA 0.936090 HA 0.924812
Update 636 TP -184.401972 HP -26.635137 TA 0.936090 HA 0.924812
Update 641 TP -243.733762 HP -41.992132 TA 0.916353 HA 0.879699
Update 646 TP -202.614916 HP -34.416480 TA 0.930451 HA 0.909774
Update 651 TP -195.117687 HP -32.646085 TA 0.936090 HA 0.909774
Update 656 TP -191.641899 HP -32.069876 TA 0.937030 HA 0.924812
Update 661 TP -197.845320 HP -34.776900 TA 0.936090 HA 0.894737
Update 666 TP -211.962363 HP -37.951163 TA 0.930451 HA 0.872180
Update 671 TP -215.861813 HP -38.958660 TA 0.928571 HA 0.872180
Update 676 TP -209.104598 HP -37.592822 TA 0.929511 HA 0.872180
Update 681 TP -200.823860 HP -35.991893 TA 0.931391 HA 0.887218
Update 686 TP -197.625074 HP -35.499161 TA 0.932331 HA 0.894737
Update 691 TP -198.688203 HP -35.909041 TA 0.934211 HA 0.879699
Update 696 TP -195.728332 HP -35.311607 TA 0.934211 HA 0.887218
Update 701 TP -184.278031 HP -32.210576 TA 0.941729 HA 0.894737
Update 706 TP -183.843677 HP -32.103085 TA 0.941729 HA 0.894737
Update 711 TP -183.431062 HP -31.912981 TA 0.942669 HA 0.894737
Update 716 TP -176.952392 HP -30.010054 TA 0.943609 HA 0.917293
Update 721 TP -177.748702 HP -29.913118 TA 0.945489 HA 0.924812
Update 726 TP -177.481537 HP -30.264986 TA 0.943609 HA 0.917293
Update 731 TP -188.138807 HP -34.021290 TA 0.945489 HA 0.917293
Update 736 TP -280.139552 HP -52.129680 TA 0.894737 HA 0.834586
Update 741 TP -201.545726 HP -38.295314 TA 0.937030 HA 0.894737
Update 746 TP -218.586819 HP -42.233102 TA 0.929511 HA 0.887218
Update 751 TP -202.794987 HP -39.780847 TA 0.938910 HA 0.887218
Update 756 TP -181.017139 HP -34.575664 TA 0.943609 HA 0.909774
Update 761 TP -170.328651 HP -30.073088 TA 0.944549 HA 0.909774
Update 766 TP -169.360699 HP -29.281683 TA 0.943609 HA 0.917293
Update 771 TP -169.122115 HP -28.121091 TA 0.951128 HA 0.924812
Update 776 TP -174.043514 HP -31.168049 TA 0.943609 HA 0.902256
Update 781 TP -168.128413 HP -24.533371 TA 0.945489 HA 0.947368
Update 786 TP -168.382129 HP -24.665370 TA 0.945489 HA 0.947368
Update 791 TP -167.152851 HP -24.577424 TA 0.948308 HA 0.947368
Update 796 TP -177.801247 HP -24.737861 TA 0.944549 HA 0.924812
Update 801 TP -168.469453 HP -23.295863 TA 0.948308 HA 0.947368
Update 806 TP -164.565087 HP -22.868267 TA 0.951128 HA 0.947368
Update 811 TP -163.537213 HP -22.877532 TA 0.953008 HA 0.947368
Update 816 TP -160.231437 HP -22.648307 TA 0.952068 HA 0.947368
Update 821 TP -157.483278 HP -22.685490 TA 0.949248 HA 0.939850
Update 826 TP -157.006715 HP -22.836286 TA 0.953008 HA 0.939850
Update 831 TP -156.877774 HP -22.625474 TA 0.953008 HA 0.939850
Update 836 TP -154.113836 HP -21.634553 TA 0.955827 HA 0.947368
Update 841 TP -153.803587 HP -22.432015 TA 0.954887 HA 0.939850
Update 846 TP -154.842451 HP -22.816448 TA 0.950188 HA 0.939850
Update 851 TP -151.456616 HP -25.947794 TA 0.960526 HA 0.917293
Update 856 TP -152.898779 HP -27.128185 TA 0.958647 HA 0.909774
Update 861 TP -151.672465 HP -26.257596 TA 0.960526 HA 0.924812
Update 866 TP -153.140256 HP -26.597070 TA 0.960526 HA 0.909774
Update 871 TP -152.001947 HP -27.514782 TA 0.960526 HA 0.924812
Update 876 TP -150.523445 HP -27.637190 TA 0.961466 HA 0.924812
Update 881 TP -149.950688 HP -27.628529 TA 0.959586 HA 0.932331
Update 886 TP -149.398130 HP -27.826662 TA 0.958647 HA 0.924812
Update 891 TP -157.314101 HP -28.664242 TA 0.953947 HA 0.932331
Update 896 TP -150.317655 HP -27.502255 TA 0.960526 HA 0.924812
Update 901 TP -147.562471 HP -28.082296 TA 0.962406 HA 0.917293
Update 906 TP -147.225982 HP -28.005794 TA 0.963346 HA 0.917293
Update 911 TP -147.406911 HP -27.964395 TA 0.961466 HA 0.917293
Update 916 TP -145.498971 HP -27.292979 TA 0.962406 HA 0.917293
Update 921 TP -159.307199 HP -28.638735 TA 0.954887 HA 0.917293
Update 926 TP -156.089930 HP -27.468569 TA 0.959586 HA 0.917293
Update 931 TP -148.359705 HP -27.892735 TA 0.965226 HA 0.924812
Update 936 TP -164.892126 HP -29.170329 TA 0.958647 HA 0.917293
Update 941 TP -161.556264 HP -28.584273 TA 0.959586 HA 0.917293
Update 946 TP -157.434058 HP -27.941456 TA 0.960526 HA 0.917293
Update 951 TP -140.535551 HP -23.529732 TA 0.966165 HA 0.954887
Update 956 TP -150.600977 HP -29.336520 TA 0.962406 HA 0.924812
Update 961 TP -146.216163 HP -27.631492 TA 0.964286 HA 0.917293
Update 966 TP -146.450307 HP -27.202680 TA 0.965226 HA 0.924812
Update 971 TP -143.741654 HP -26.406012 TA 0.964286 HA 0.932331
Update 976 TP -142.306727 HP -26.397807 TA 0.962406 HA 0.924812
Update 981 TP -146.722993 HP -27.991503 TA 0.965226 HA 0.924812
Update 986 TP -143.086637 HP -27.803336 TA 0.965226 HA 0.917293
Update 991 TP -144.620445 HP -28.091146 TA 0.964286 HA 0.924812
Update 996 TP -140.363950 HP -28.288579 TA 0.963346 HA 0.917293
Update 1001 TP -139.373096 HP -28.215446 TA 0.965226 HA 0.917293
Update 1006 TP -138.083490 HP -27.983285 TA 0.965226 HA 0.917293
Update 1011 TP -137.176496 HP -29.208996 TA 0.962406 HA 0.909774
Update 1016 TP -136.126425 HP -29.141944 TA 0.962406 HA 0.909774
Update 1021 TP -130.769707 HP -26.930035 TA 0.967105 HA 0.924812
Update 1026 TP -129.979771 HP -26.882370 TA 0.965226 HA 0.917293
Update 1031 TP -129.148996 HP -26.717335 TA 0.965226 HA 0.917293
Update 1036 TP -126.848435 HP -25.697901 TA 0.970865 HA 0.932331
Update 1041 TP -125.353078 HP -25.143287 TA 0.973684 HA 0.932331
Update 1046 TP -125.650765 HP -25.450108 TA 0.969925 HA 0.939850
Update 1051 TP -125.489756 HP -25.138841 TA 0.974624 HA 0.939850
Update 1056 TP -126.303435 HP -24.990931 TA 0.971805 HA 0.939850
Update 1061 TP -123.245353 HP -24.924609 TA 0.974624 HA 0.947368
Update 1064 TP -123.245353 HP -24.924609 TA 0.974624 HA 0.947368